Search Queries

DSE Version: 6.0

Video

Transcript: 
Now that we have learned how to setup our search indexes, we are ready to learn how to start using them with Search Queries.



As we know DSE core normally returns results from a partition or from scanning all partitions (even though scanning all partitions isn’t recommended)



We can filter our results using one or more of our clustering columns



We can also filter with non-primary key columns if we enable:

 

  1. Allow filtering
  2. Adding Cassandra secondary indexes or
  3. Adding materialized views
--Many of these ways are not recommended due to some very real performance concerns.



DSE search uses search indexes to filter results in a way that has been optimized to not heavily impact performance.



Any column that is indexed can be used in a search query. Using DSE search we now have the power to get the results we need: by filtering results that are not a part of our partition key or clustering columns.



Let’s remind ourselves how queries work on DSE core with our familiar ring artitecture.



This example shows how we can use a DSE core query to filter based on a partition key. This allows us to narrow down the query to a single replica and a single partition to read.

 
In this case our data is partitioned around the ring by video_id our query:

 

select * from killrvideo.videos where VIDEO_ID = <> will return us one value (since our video id is unique only one value be returned) which will be returned from only one node.



Let’s remind ourselves of how a query would work with DSE core when allow filtering is enabled.

 

Here we have a query that is not using the partition key to filter by (remember our partition key is on video_id).

 

Select * from killrvideo.videos where release_year = 2018.

 

What does this mean in practice. The query will have to be sent to all nodes in the cluster (or  enough nodes to cover the entire token range).

 

Every node will have to go through each and every one of its partitions and rows, and read the value for release_year to determine if it should be included in the results.



Therefore filtering the rows based on the release_year.



Once all the rows has been found, it will finally be sent back to the coordinator and then to the client.  Doing a compare on each row in the table across maybe hundreds of nodes, is probably not the best idea.



Now that we have reminded ourselves of what DSE core queries would look like in this case, let’s review how this would work with DSE search using the same query that is looking for movies based on release year.



While the query will still need to be sent to all nodes, instead of having to read each and every node DSE search will use the search index to do a check.

 

This makes it more efficient to do, with the filtering taking place at the index level and not the query level.

 

Only the rows that match the requested parameters will be read (eventually) and returned. The exact mechanism of this will be explained in the write path module.



Now that we understand why we should use DSE search indexes, how can we use them with CQL.

 

This table shows all the CQL keywords supporting Search.
Equality
Inequality
Range
Match multiples values
Like
Exists
Count
Limit Rows
And Sort



We will discuss a few of these terms in more detail here in just a bit, but first let’s review some examples.



Take a second to review each one of these examples. These will help you get a firm grasp on how to use each of the different column data types and the predicates.

 

Feel free the pause the video here, and walk through these. I will be here when you get back!



Oh good you’re back!

 
Since you are now an expert on CQL search keywords, let’s look at Date/Timestamp.

 

Date values can be queried using string formatted as YYYY-MM-DD



Timestamps are queried using a string literal or a UNIX timestamp integer. Remember Unix time is approximately the number of seconds that have passed since January 1 1970.



For timestamps, any precision beyond milliseconds is not captured in the search index, so you can only effectively query to millisecond precision.



OKAY LIKE, is like great RIGHT?



Actually, LIKE is very useful. It adds the ability to search via simple regular expressions in CQL.

 

It works only on string data types -- ASCII, TEXT and VARCHAR



You can use a percentage sign as the wildcard at the beginning , end of a string, or even both.



It is case sensitive by default but this can ultimately change based on your schema.

 
In the first example shown here, we are looking for movies titles that start with “Terminator” so hopefully we will find the movies Terminator I through Terminator 6.



In the second example we are looking for titles that end with Jedi. So we should be able to find The Last Jedi (get the joke!) and The return of the Jedi.



In the last example we are looking for titles that contain the word legend. We should be able to find “Legends of the Fall” as well as “I am Legend” and the “Anchorman: the Legend of Ron Burgundy”  



There are some limitations with CQL search using LIKE so be aware of these.



Wildcards do not work if they are not as the beginning or the end of the string -- No error will be thrown but just nothing will be returned as a result.

 
By putting at the end or beginning of your string you may end up with more results than you wanted but you will get your results, more filtering will just need to be applied.



The wildcard symbol (the percentage sign) can not itself be escaped. If your string contains a percentage sign, you won’t be able to find it with this method.



Also it should be noted that there are no single character wildcards available.



Multiple Conditions can be added to your queries.



The keyword `AND` can define multiple predicates that must all match to be included in the results.

 
Remember that while AND is available for CQL queries, OR is not available when using CQL and not when using CQL and Search.

 

To use OR you can utilize an alternate query syntax in CQL or utilize IN as it is similar to an OR if you only need to query a single column.

 

With OR luckily you can just run two seperate queries and with DSE’s outstanding performance you will have your results quickly.

 

Here is an example of utilizing AND to find a particular movie

 

Select * from videos where title = “Titanic” and release_year = 1997



Since I know there are many versions of Titanic I want the one that I know I watched over 5 times in theaters as a pre-teen. No judging.



Search allows for ORDER BY on any indexed column, not just clustering columns.



This adds the power of sorting on all our indexed columns. Be aware that Collections, User Defined Types, and tuples do not have ORDER BY support.



Even partition key columns that `do not` have ORDER BY support in standard CQL now can be sorted with DSE Search.



In DSE Core SELECT COUNT(*) can cause all kinds of issues.

 

SELECT COUNT(*) will return the number of rows in the query but without the data itself -- that's not the issue.



The issue is that this results in a FULL table scan (if using select count(*) from table) and even adding a WHERE clause will still need to scan all the rows in the partition where each row must be compared and then added to the count if it meets the condition.



This can be extremely slow, potentially cause performance issues, or just timeout without generating any kind of answer. With DSE Search this operation will utilize the power of the search index to quickly return that answer.



Now it’s time to put on our thinking caps and work on this pop-quiz!

 

Which of these queries is using DSE core vs DSE Search?

 

Hint: Remember the videos table has a PRIMARY KEY of video_id.  

 

Okay pause the video now, and take a look at these.



Okay great your back and ready for the answers!



The first and third query as you can see are using Search, because they are searching on columns title, rating, and release year -- none of those are our partition key video_id.

 
The second query will execute with just DSE core because it is only querying the partition key.

 
Remember: the other queries normally wouldn’t be able to run in DSE Core and would return a warning to use the “ALLOW Filtering” keyword .

 
Also a note to remember here even though there is a Search Index on the video_id column because we are able to search with DSE core on this column we will. The logic states “if able to search via DSE Core then do that, if not try DSE search”



Let’s go back and review some very important points around Cassandra/DSE Core or Search.

 
To Be able to Use Search I need
  1. DSE Search workload enabled.
And 2. an index created for the table being queries (more specifically on the column I am querying)

 

DSE will determine if the CQL query can be fulfilled without using Search at all.

 
If so, then will run without Search

 
Otherwise, use Search Instead.



Search will also have priority in situations where a secondary index is normally used.

 
Be aware you can also force CQL to run a query with search using a different query syntax that will be discussed in the Term Search unit.



This flowchart helps us walk through the logic of DSE Core vs a DSE Search query and which one is chose.

 
The first branching point asks “Is Search WorkLoad Enabled” we know that if it is not, then we will be using Cassandra query.



If it is enabled we come to the next branching point that asks if we have a “Search index created?”

If we don’t we are back to a normal Cassandra query.

If we do have a search index enabled, then travel to the next branching point.



This asks if the columns queried are all the columns in the partition key, if yes and no other columns are added then a regular Cassandra query will be executed.

 
If not all the partition key columns are included OR there are regular columns being queries then we will use a search query!



Whew, I think talking about it was much more complicated than studying this easy to under chart. Feel free to pause the video and take a look.



If you want to verify the path your query is taking, turn tracing on before running your query.

 
It’s as simple as adding “TRACING ON” before running your query. It should be noted here that there will be a lot of information when tracing is enabled, but you can tell if Search was used if you see the message about a response for the Slor URL. If you don’t happen to see this message then the query was ran as a standard CQL query.



Remember with search queries only 2 different consistency levels are supported.

 

One and Local_ONE.

 

If you try to use any other consistency level CQL will return an error.



There are some other CQL functionality that is not compatible with search indexes.

 
Counters -- Need to exclude counter columns when indexing
Static Columns -- These are automatically ignored
Select DISTINCT -- Must be restricted by partition key or to a static column

 

Can’t execute a CQL search query (with the syntax we have been discussing in this unit) and also the solr_query column.

 
Remember when querying the above incompatible types that do not work with search, it is still possible to do these queries with ALLOW Filtering but always make sure you understand the risks and consequences of doing so.



Alright! It’s finally time for another hands on exercise! This 3rd exercise focuses on running and practicing formulating search queries. Again it will be utilizing the awesome DataStax Studio notebooks. You will determine if you are running a DSE Core or DSE search query, run search queries using different predicates and try other CQL clauses that will work with Search!



Good luck! I’ll See you in the next lesson!
No write up.
No Exercises.
No FAQs.
No resources.
Comments are closed.