Term Modifier

DSE Version: 6.0

Transcript: 

Hello, and welcome to Term Modifiers.

In this video, we'll be taking a look at some other powerful searches that can be performed. Many of these do not require an exact match, but rather can return results that are close or similar to what you are searching for. We'll also be looking at a search that can prioritize certain terms to show up in the results.

The first search we'll be discussing is the wildcard search. This is a way to find documents that have terms that match a given pattern. If you're coming from a relational database background, this is very similar to the functionality of the LIKE operator. The way it works is by searching for a term, using a wildcard somewhere in the term, either at the front, the end, or anywhere in between. There are two types of wildcards: the question mark, which matches a single character, and the asterisk, which matches zero or more characters.

In the examples below, you have one wildcard search that is looking for terms matching ?ast. This matches terms like last, for The Last Samurai, fast, for Fast & Furious, and past, such as in X-Men: Days of Future Past.

The other example is searching for fr*d, which matches framed, fried, and fred, as seen in the resulting movie titles.
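To make the two wildcards concrete, here is a minimal Python sketch that translates a wildcard pattern into a regular expression and tests it against single terms. It only illustrates the matching rules described above. It is not how DSE Search evaluates wildcards internally, and the function names and the term list are made up for this illustration.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate ? (exactly one character) and * (zero or more) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts))

def wildcard_match(pattern: str, term: str) -> bool:
    # The whole term has to match the pattern, not just a part of it.
    return wildcard_to_regex(pattern).fullmatch(term) is not None

# Hypothetical indexed terms, taken from the movie titles mentioned above.
terms = ["last", "fast", "past", "blast", "framed", "fried", "fred", "friday"]

print([t for t in terms if wildcard_match("?ast", t)])  # ['last', 'fast', 'past']
print([t for t in terms if wildcard_match("fr*d", t)])  # ['framed', 'fried', 'fred']
```

Note that blast is not matched by ?ast, because the question mark stands for exactly one character.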

There are some things to note about wildcard searches. One is that they do not work with phrase searches. Another is that the text analysis used for a field can affect what a wildcard search matches. For example, with a TextField and StandardAnalyzer, the same analysis chain is not applied to the wildcard search query, so search terms that are capitalized will not match indexed terms that have been filtered to be all lower-case. In this kind of situation, the exact same wildcard search we used in the previous example would actually return 0 results, simply because the schema is different for the title field.
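The case-sensitivity issue can be simulated in a few lines of Python. This is only a sketch under the assumption that the index holds lower-cased terms while the wildcard query is compared as typed. The term list and the helper name are hypothetical, and fnmatchcase from the standard library stands in for the search engine's wildcard matching.

```python
from fnmatch import fnmatchcase

# Assumption: index-time analysis lower-cased every term in the title field.
indexed_terms = ["fred", "framed", "fried"]

def matching_terms(pattern: str) -> list:
    # fnmatchcase compares case-sensitively, like an unanalyzed wildcard query.
    return [t for t in indexed_terms if fnmatchcase(t, pattern)]

print(matching_terms("Fr*d"))          # [] because of the capital F
print(matching_terms("Fr*d".lower()))  # ['fred', 'framed', 'fried']
```

Lower-casing the pattern before sending it is one possible workaround on such a field. Whether it is the right one depends on how the field is analyzed in your schema.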

With fuzzy search, you can search for documents that have a term that is similar to the search term. This kind of search can be enabled by adding a tilde at the end of a term, and optionally adding a value of 1 or 2. This value represents the maximum number of single-character edits that would change the matched term into the search term.

The example below shows a fuzzy search that will match terms similar to the term seven. The first result, The Magnificent Seven, is an exact match. The second result is a match because changing the number 7 in its term to a v would turn it into the search term. The last result matches because the term even can be edited to add an s, which would then match our search term. Besides replacing a character and adding a character, the other possible way to edit a term would be to remove a character, which unfortunately we do not have an example of here.
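The idea of counting single-character edits can be sketched with a plain Levenshtein distance in Python. Treat this as an illustration only: the search engine's own fuzzy matching may count edits slightly differently (for example, it may treat swapping two adjacent characters as a single edit), and the function names and the default of 2 edits are assumptions made for this sketch. The term sevens is a made-up example of the "remove a character" case.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character inserts,
    deletes, and replacements needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace ca with cb (or keep)
        prev = curr
    return prev[-1]

def fuzzy_match(search_term: str, indexed_term: str, max_edits: int = 2) -> bool:
    # max_edits plays the role of the number after the tilde, e.g. seven~1.
    return edit_distance(indexed_term, search_term) <= max_edits

print(edit_distance("seven", "seven"))   # 0, exact match
print(edit_distance("se7en", "seven"))   # 1, replace 7 with v
print(edit_distance("even", "seven"))    # 1, add an s
print(edit_distance("sevens", "seven"))  # 1, remove the trailing s
```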

Proximity search is somewhat like fuzzy search except that it searches for similar phrases instead of terms. Like a fuzzy search, a tilde is added at the end of a phrase to enable a proximity search. An optional numeric value can specify the maximum distance the terms in the phrase can be apart from each other. In the example below there is a proximity search with a distance of 3, meaning that there can be at most 3 additional terms in between the original terms in the phrase.
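Here is a deliberately simplified Python sketch of that rule for a two-term phrase: the terms must appear in order, with at most the given number of other terms between them. The real proximity search is more flexible than this (it can, for instance, also match terms in a different order at a higher distance cost), and the function name and the sample title are hypothetical.

```python
def within_proximity(text: str, first: str, second: str, max_between: int) -> bool:
    """True if `second` follows `first` with at most `max_between` terms in between."""
    tokens = text.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [j for j, t in enumerate(tokens) if t == second]
    # j - i - 1 is the number of terms strictly between the two positions.
    return any(0 <= j - i - 1 <= max_between
               for i in first_positions
               for j in second_positions)

title = "the lord of the rings"  # hypothetical document text

# Comparable in spirit to a proximity search such as "lord rings"~3
print(within_proximity(title, "lord", "rings", 3))  # True, 2 terms in between
print(within_proximity(title, "lord", "rings", 1))  # False, 2 is more than 1
```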

If you've already seen the video on document boosting, then you may be aware that it is possible to manually boost the score of documents that contain a specific term. This can be done by adding a caret, followed by a boost value, at the end of a term in your search query. Aside from terms, phrases and sub-queries can also be boosted.

The example below shows a search query that looks for the word Gosling or Affleck in the description field, with the word Affleck being boosted by a factor of 3. This may cause movies that contain the word Affleck in the description to show up in the search results before movies with the word Gosling.
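The effect of a boost on ordering can be shown with a toy scoring function in Python. This is not the real relevance formula, which takes many more factors into account. It only assumes that each matching term adds its boost to the score, and the descriptions and names below are invented for the illustration.

```python
def toy_score(description: str, boosts: dict) -> float:
    """Add up the boost of every query term found in the description."""
    words = set(description.lower().split())
    return sum(boost for term, boost in boosts.items() if term in words)

# Mirrors a query like description:(Gosling OR Affleck^3); unboosted terms count as 1.
boosts = {"gosling": 1.0, "affleck": 3.0}

# Hypothetical descriptions, not taken from the course data set.
movies = {
    "Movie A": "a drama starring ryan gosling",
    "Movie B": "a thriller starring ben affleck",
    "Movie C": "a documentary about penguins",
}

ranked = sorted(movies, key=lambda title: toy_score(movies[title], boosts), reverse=True)
print(ranked)  # ['Movie B', 'Movie A', 'Movie C']
```

In a real search the boost only shifts the score, so a document with Gosling can still come first if its other scoring factors are strong enough. That is why the transcript says the boost "may" change the order.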

The last search that we'll be discussing here is the range search, which matches documents whose value for a field falls within certain bounds. This is done with the syntax shown here, where you use square brackets or curly braces to specify a range search, and then the values to bound by, where the asterisk can be used to leave a bound open-ended. It is important that the TO in the range search is in capital letters, because otherwise your search query will return an error.

In our example, the range search uses the square bracket to include the release year 2000 and the curly brace to exclude 2015. This returns results for movies with release years from 2000 all the way up to 2014.
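The bracket rules can be checked with a small Python sketch that parses a range expression for integer values and tests a year against it. The parser is an assumption made for illustration and covers only the simple form described here. It is not the query parser the search engine uses.

```python
import re

# Opening bracket, lower bound, upper-case TO, upper bound, closing bracket.
RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")

def in_range(expr: str, value: int) -> bool:
    """Square brackets include the bound, curly braces exclude it, * means no bound."""
    m = RANGE.match(expr)
    if m is None:
        raise ValueError(f"not a valid range expression: {expr!r}")
    open_b, low, high, close_b = m.groups()
    if low != "*":
        lo = int(low)
        if value < lo or (open_b == "{" and value == lo):
            return False
    if high != "*":
        hi = int(high)
        if value > hi or (close_b == "}" and value == hi):
            return False
    return True

print(in_range("[2000 TO 2015}", 2000))  # True, the square bracket is inclusive
print(in_range("[2000 TO 2015}", 2014))  # True
print(in_range("[2000 TO 2015}", 2015))  # False, the curly brace is exclusive
print(in_range("[* TO 2015}", 1990))     # True, the asterisk leaves the lower end open
```

A lower-case to does not match the pattern, so this sketch raises a ValueError for it, which loosely mirrors the error mentioned above.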
