Getting Started with DSE Search

DSE Version: 6.0

Video

Transcript: 
Hi there, all! I am Amanda from the Developer Advocate team, and I am here today to talk with you about DataStax Enterprise Search. More specifically, we are going to be learning all about search queries. This is what you are going to be doing every day and how you are going to get value from your data, so let's learn together and get this rock-solid so you can become the next Search Super Star.



First, let's walk through the basics of how to get started with DSE Search.
How do you get DSE Search up and running? Honestly, it only takes two steps.
Number 1: Enable the Search workload on all of your DSE nodes.
Number 2: Create a search index for the table you want to use Search with!
Sounds pretty easy, right? But let's talk about each one of these steps in more detail.



Let’s tackle the first to-do on our list.
Let’s get our cluster going with the DSE search workload!



Your installation type will determine the way you configure and enable DSE to run with search.



If you have installed DSE with the package installation type, you will need to change a setting in /etc/default/dse. Open the file and search for SOLR_ENABLED, which is set to 0 by default. Change this to 1. Note that this has to be done on every node in the cluster.
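As a sketch, the relevant line in that defaults file looks like this (the exact path and surrounding contents can vary by platform and DSE version):

```shell
# /etc/default/dse -- DSE package-install defaults file
# 0 = Search workload disabled (the default), 1 = Search enabled
SOLR_ENABLED=1
```

After changing the value, restart DSE on that node so the new workload takes effect.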



This is also a good time to remind you that all nodes in the cluster (or datacenter) must run the same workload. You can't mix them, because a mixed workload may not perform as expected. All nodes will need to be enabled for Search (possibly in combination with Analytics and Graph). It is common for an entire cluster to run the Search workload; in more complex clusters with multiple datacenters, all the nodes in the same datacenter should run the same workload.
Okay, back to installation!
If you have installed DSE with the tarball installation type, you do not need to edit any configuration files. Simply start your cluster with "dse cassandra -s".
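For reference, the start commands for the two install types look roughly like this (assuming a standard install; these of course require a DSE installation to run):

```shell
# Tarball install: start DSE with the Search workload, from the install directory
bin/dse cassandra -s

# Package install: with SOLR_ENABLED=1 set in /etc/default/dse,
# restart the service as usual
sudo service dse restart
```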
Okay, we are up and running!
What actually gets started when we start DSE Search?
Aside from DSE core, the following technologies are included, and they combine to create the DSE Search functionality.
You will have:
Apache Tomcat
Apache Solr
And Apache Lucene 
Let’s take a quick minute to introduce each one of these at very high-level so you really know what’s going on under the hood.
Apache Tomcat: is an open-source Java web server. It gives you access to the Apache Solr administrative web UI, which you can use for monitoring, troubleshooting, configuration management, and even running queries.
Apache Solr: is an open-source enterprise search platform, which provides the core of the search capabilities we will be using with DSE Search.
Apache Lucene: is an open-source information retrieval library. It is the core indexing and search library that Apache Solr extends.



Now that we have configured and started DSE Search on our cluster, let's check to make sure that it is actually enabled.



Running dsetool status on any one of the nodes in our datacenter will show the current running workload. Nodes running Search will display either Workload: Search, or Workload: Search Analytics if you also have Analytics enabled.
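The output looks roughly like the sketch below. This is illustrative only -- the exact columns and header lines vary by DSE version:

```shell
$ dsetool status
# DC: DC1          Workload: Search          Graph: no
# Address      Status   State    Load      Owns    ...
# 10.0.0.1     Up       Normal   1.2 GB    33.3%   ...
```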



This is a good time to note an interesting warning message! If you are running a mixed workload in your cluster, the workload in that case will be shown as mixed, and the actual workloads (be it Search, Analytics, Graph, etc.) will be listed as part of a warning message. For example, a warning might state: Warning: datacenter "DC1" has a heterogeneous workload [Search, Analytics], which may lead to degradation of workload-specific features. If you do see this warning, make sure your cluster is configured correctly.



Alright! We are ready for the second item on our to-do list! Creating a search index on the table you would like to use Search with.
The Search functionality is enabled and ready to use when a search index is created on the table.
You can identify the search index by its keyspace and table name. Using our KillrVideo reference application as an example, that would be killrvideo.videos.
Only one search index needs to be created per table. When a search index is created this way, each column is given its own index. You can also specify columns explicitly and index only one column, or multiple columns. We are no longer tied to filtering data only by our primary key and clustering columns! We have much more flexibility.
A small side note: a search index is called a core in Apache Solr terminology, but throughout this course we will refer to it as a search index.
The search index has two components.
Number 1: The schema
This determines which columns to index and how they are indexed.
Number 2: The config
These are the settings for the search components, configured values, and resource usage. They are found in an XML-style configuration file.
Once the index has been created the existing data in the table will be read and indexed.
The index is immediately usable when created, but may not return the expected results until the full table indexing (potentially millions of rows spanning many nodes!) has completed.



Now that we have a clear understanding of the different parts of a search index, we get to see the actual CQL code to create it!



Using CQL to create the search index kicks off four tasks:
  1. The command runs on all search nodes in the datacenter -- a parallel process is fired up on each node to create the index
  2. The default search schema and configuration settings are set up automatically
  3. The existing table data is indexed
  4. New data is automatically indexed as it is written to the datacenter

  To index all columns in the table we just need a very simple CQL statement:

CREATE SEARCH INDEX ON keyspace_name.table_name;
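Using the course's KillrVideo reference table as a concrete example, indexing every column is a single statement:

```sql
-- Creates a search index over all columns of killrvideo.videos;
-- the default schema and config are generated automatically
CREATE SEARCH INDEX ON killrvideo.videos;
```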



As always, there are tons of options to allow for the exact functionality you want and need. Some other options to be aware of are:

Columns: allows specifying which columns to index, along with schema options. Any column not included in this list is excluded by default, except the primary key columns, which must always be indexed.

Profiles: options that can help minimize the size of the index.

Config: a way to set values for the search index config. These will override values contained in the actual config file.

Options: specific options for the CREATE SEARCH INDEX operation.
To index only certain columns, use the WITH COLUMNS option in your CQL create statement.
An example would be: CREATE SEARCH INDEX ON keyspace_name.table_name WITH COLUMNS <list of columns to include>;
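As a concrete sketch against the KillrVideo table (the column names here are illustrative, not taken from the course schema):

```sql
-- Index only the listed columns; primary key columns are always
-- indexed whether or not they appear in the list
CREATE SEARCH INDEX ON killrvideo.videos WITH COLUMNS name, description;
```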
There is much more documentation on these additional options on the DataStax Enterprise documentation site. More details about these options will also be discussed later in this course.



Okay! It's time for my favorite part of our DataStax Academy courses: the hands-on exercises.
Here in Exercise 2 you will walk through a DataStax Studio notebook that will guide you on how to enable DSE Search on your node or cluster, restart, create a search index on your table, view the table schema, and verify that your index was added correctly by running a sample query!
Good luck! See you in the next lesson!