Selecting Collections and UPTs

DSE Version: 6.0

Video

Transcript: 

Welcome to Searching: Collections and User Defined Types. Let’s talk about how we can combine the power of DSE Search with the power of Composite Data Types!

Collections, UDT’s and tuples, oh my!

Some data types in DSE are just structured differently.

These include:

Collections (set, list, map)

User Defined Types

Tuples

They may use different operators for searching which we will spend some time to review.

Equality on collections will not work and is not supported. If you are trying to find a movie that have only the following genres (Comedy, Western, and Action) it will not work written as shown and will throw an error. way.  You should be able to rewrite this query utilizing the AND clause and get the same result.

A set and a list both represent multiple values but do so differently.

A set doesn’t save ordering and values are sorted in ascending order. No duplicates are allowed.

A list saves ordering where you append or prepend the value into the list. A list allows duplicates.

Contains can be used to search LIST and Set columns.

Contains is used to find matching items in a list or set column, but not maps.

It will return all rows that contain the value, even if it’s not the only value in the column.

This is also case-sensitive.

In this DSE search query, not only will we get movies returned that are strictly Comedies but you will also get Romantic Comedies and Dramedies.

The Map data type represents a collection of key-value pairs. The key for the map must be a string type, and the map values can be most data types.

A specific naming convention must be made for the keys.

  1. Each key must have the column name as a prefix to be able to search for it

  2. The keys without the prefix do not get indexed and therefore are not searchable



    Recommended naming for map column name is column-name_

In this USER table the phone number field is a MAP data type with the KEY being of type TEXT and the VALUE being of type TEXT.

Notice that the column name is phone_numberUNDERSCORE.

When inserting these values into the table they are added as phone_numberUNDERSCOREhome and phone_numberUNDERSCOREmobile

Now if we want to be able to search for a value in a key we can search on specific values for a key in a map column. But the caveat is that you need to know the key you are searching for.

As the example shows you can get all the information about a user (select *) WHERE phone_numberUNDERSCORE[‘phone_number_home’] = ‘555-555-555’ in this case I need to know my key and value to find the rest information I am looking for.

Contains KEY searches for rows that contain the specified key.

This will returns rows that contain a matching key-value pair.

This does not need to be just that one key in the map column.

For example we can do a select * from our user table where phone_numberUNDERSCORE CONTAINS KEY ‘phone_number_mobile”

This will return all USERS that have included their mobile phone number. In this day and age, it’s going to be a big return set!

Equality on User Defined Types and Tuples supports any data type within the UDT or tuple type.

But please be aware you can only do an exact match on all the fields in the UDT or tuple.

The fields not specified for filtering are treated as null, which generally will throw an error.

One exception are string data types, which treats null as an empty string.

An empty string is not the same as a field with no value (or null the absence of a value), so remember this might not find your intended matches.

It is not possible to filter on fields that are null, and it is strongly recommended that all fields be initialized with a default value.

In a separate unit we will discuss how to search by individual fields within a UDT or tuple by using solr_query.

The inequality operator will return all rows that does not match the exact value searched for.

This will show unexpected results when fields are null, again all fields should be initialized with a default values.

Here is an example query to solidify this idea.

Again to be able to search by the UDT in a CQL search query we must be looking for an exact match.

In this example query we are asking to find the names from the user table of the person or people who live at a particular address.

Here is an example of what can happen when we do not initialize our strings.

Here we are searching for the address from the user table where we want the complete address for all users in the city of Santa Clara. We get back two rows, one where the zip code is an empty string and one where the zip code is null (or undefined). Now in this second example, we are trying to match the entire address to find the address where the zip code was null. However, only the user with the zip codes set to a “blank string” is actually returned, because of the way CQL parses the null in the query.

Lesson learned: friends don’t let fiends use null!

Let’s look at an example using Tuples. You can’t see if here but by magic a tuple_column was added to our user table with 4 int values in the tuple.

In our first example query we are able to find the values because we searched for an exact match.

In for second query we will get an error, as you cannot search for null.

In our third query we will also get an error, as the fields that are not specified (the 4th field in this case) are assumed to be null. Which as we saw in the second query this cannot run.

Now you get to try your hand at Searching Collections and User Defined Types in the next exercise!

In Exercise 4 you will look for values using the CONTAINS predicate, find keys and values in a map column, and query individual fields in UDTs and tuples.

Good luck!

No write up.
No Exercises.
No FAQs.
No resources.
Comments are closed.