Search Index Configuration

DSE Version: 6.0

Transcript: 

Hi! Joe Chu here and welcome to Search Index Configuration. In this video we'll be going over this rather mysterious component of the search index, seeing what it looks like and what it is able to do.

The search index configuration is where all of the functionality for Solr is defined. Where the schema defines what to index and how it is indexed, the config pretty much defines everything else. This includes some things that we've already seen, like enabling features such as live indexing, or changing values for different properties like the auto soft commit max time or the filter cache high and low watermarks.

What's also important about the configuration is that this is where the request handlers and components are defined. These are where the functionality for the standard search query, and for more specialized search queries such as facet search, MoreLikeThis suggestions, and highlighting, is set up.

There are two ways that you can make changes to the Solr config. The one that we've been using previously is to use the ALTER command in CQL. It is also possible to manually edit the XML that represents the config and upload that to DSE. Editing might be hard to do if you do not have the XML, but that can easily be retrieved by either using the DESCRIBE command in cqlsh, or by using the dsetool command-line tool. You can see more details about the latter tool in an upcoming video.

Afterwards, the search index will need to be reloaded of course for the changes to take effect. Since the config does not affect the index itself, you would not need to have that rebuilt.

What we see here is just a snippet of the automatically generated configuration XML that is displayed by either the CQL DESCRIBE command or dsetool. It can look a bit intimidating considering that we can't even show the entire output on the screen at once, but hopefully you won't find it so bad after going through the individual parts.

In fact, the configuration output generated by DSE is actually very different from the default configuration file you would see in Apache Solr, which pretty much lists every single setting that can be changed, and also contains comments for the settings that can be fairly verbose.

With the config generated by DSE, only the most often used settings are included. Settings that do not show up in the configuration will just use the default settings. Of course if you change the config to add back missing settings, then DSE will naturally use the settings that you specify, provided that they are still used and compatible with DSE Search.

We'll be quickly taking a look at the elements that you'll see in the generated config.

The first two are the lucene match version and the dse type mapping version, which help specify what version of DSE the configuration XML should be used for. These two elements really should not be changed at all.

The directoryFactory determines how and where indexes are stored, and really should only be modified if enabling or disabling encryption for the search index files saved to disk.

The indexConfig element is an important section in the config since it controls the behavior of the index writers and how they work. The two notable settings in this section are the rt setting, which enables and disables live indexing, and the merge settings. Although the merge settings can be used to tune how index merges work, it is recommended to avoid making changes to these. One other notable setting that used to be here is the ramBufferSizeMB setting, which controlled the size of the RAM buffer. The setting has been deprecated in DSE 6, and the RAM buffer size is now controlled automatically.

The jmx element essentially enables Java Management Extensions for the search index, and makes available the JMX MBeans that allow users to pull statistics about how DSE Search is running.

The updateHandler defines how index updates are done and includes settings like the autoSoftCommit maxTime and maxDocs, both of which can be adjusted to control how often soft commits will take place. If live indexing is enabled for the search index, only maxTime would be relevant, and it will instead determine how soon index updates become visible for reading.
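To make the two triggers a little more concrete, here is a minimal Python sketch of the decision they describe. It is only a conceptual model, not DSE or Solr code: the function name and its parameters are made up for illustration, and it assumes the usual reading that a soft commit fires once either limit is reached.

```python
def should_soft_commit(pending_docs, ms_since_first_pending,
                       max_docs=None, max_time_ms=None):
    """Conceptual model: commit when either configured limit is reached.

    pending_docs           -- updates not yet visible to searches
    ms_since_first_pending -- age of the oldest pending update, in ms
    max_docs / max_time_ms -- illustrative stand-ins for the two settings;
                              None means the limit is not configured
    """
    if pending_docs == 0:
        return False  # nothing to make visible
    if max_docs is not None and pending_docs >= max_docs:
        return True
    if max_time_ms is not None and ms_since_first_pending >= max_time_ms:
        return True
    return False


# Example values are arbitrary, chosen only to exercise both limits.
print(should_soft_commit(5, 12000, max_time_ms=10000))
print(should_soft_commit(5, 2000, max_docs=1000, max_time_ms=10000))
```

Lowering either limit makes updates visible sooner, at the cost of more frequent soft commits.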

The query element section determines how DSE Search processes and responds to queries. Among the settings you see here is the filterCache, which you can adjust to control when the filter cache will start to evict entries and how much of it to remove.
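The idea of a high and a low watermark can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the DSE filter cache: the class name, the oldest-first eviction order, and the size units are all invented for illustration. It only shows the general pattern in which eviction starts above the high watermark and stops at the low one.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy cache: evict down to the low watermark once the high one is crossed."""

    def __init__(self, high_watermark, low_watermark):
        if low_watermark > high_watermark:
            raise ValueError("low watermark must not exceed high watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.entries = OrderedDict()  # key -> size, oldest entry first
        self.used = 0

    def put(self, key, size):
        # Replace an existing entry so its size is not counted twice.
        if key in self.entries:
            self.used -= self.entries.pop(key)
        self.entries[key] = size
        self.used += size
        # Eviction only starts once usage goes above the high watermark...
        if self.used > self.high:
            # ...and continues until usage is at or below the low watermark.
            while self.used > self.low and self.entries:
                _, evicted_size = self.entries.popitem(last=False)
                self.used -= evicted_size


# Arbitrary example numbers: five entries of size 25 against limits 100/60.
cache = WatermarkCache(high_watermark=100, low_watermark=60)
for i in range(5):
    cache.put(f"filter-{i}", 25)
print(cache.used, list(cache.entries))
```

The gap between the two watermarks controls how much is removed in one pass: a wider gap means fewer but larger evictions.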

There is also the setting enableLazyFieldLoading, which will only load fields for searching when needed. Having this enabled can help to boost query performance if only a small subset of the indexed fields is commonly used.

The useColdSearcher setting determines whether search requests will be able to run immediately when an IndexSearcher reopens the index following a soft commit. If false, any search request will block until the searcher has had a chance to warm up its caches.

The setting maxWarmingSearchers specifies the number of searchers that may be warming up at any given time. There will be more searchers warming up as the frequency of soft commits increases, and if the number of searchers trying to warm up exceeds the max, DSE Search will start to return errors.

The requestDispatcher determines how the Solr HTTP API responds to specific requests. Among the settings that are visible in the generated config are the requestParsers attributes enableRemoteStreaming and multipartUploadLimitInKB, which determine whether HTTP streaming can be used to upload data to DSE and how large that data can be.

The attribute never304 is part of the httpCaching element, which determines if DSE Search can use HTTP 304 Not Modified responses. This is useful to set to true during development with DSE Search, when running queries may otherwise return this response instead of the expected query result.

The request handlers are used to receive and process requests for DSE Search. There can be multiple request handlers for a search index, and they can have different functions as determined by the class used by the request handler. The handler will also have a name, which determines how requests can be sent to a specific request handler.

There are several request handlers that are included in the automatically generated search index configuration, which include handlers for search queries using HTTP and CQL, a handler for deleting data via HTTP, and others used in the Solr Admin UI.

The SearchHandler is the request handler used to receive and process search queries coming from the HTTP API. Some behavior of the search handler can be modified by setting values for the default settings, which include defType, rows, q.op, and the default field.

There is also an appends list which can contain filter queries that will always run alongside search queries sent to the search handler.
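The difference between defaults and appends can be illustrated with a small Python sketch. It is a simplified model of the general behavior described here, not the handler's actual code: the function name is hypothetical, and the example values (edismax, title, mpaa_rating:G) are borrowed from the example later in this video.

```python
def effective_params(request, defaults, appends):
    """Combine request parameters with handler-level defaults and appends.

    Every argument maps a parameter name to a list of values.
    """
    merged = {name: list(values) for name, values in request.items()}
    # defaults: used only when the request did not supply the parameter.
    for name, values in defaults.items():
        merged.setdefault(name, list(values))
    # appends: always added on top of whatever is already present.
    for name, values in appends.items():
        merged.setdefault(name, []).extend(values)
    return merged


handler_defaults = {"defType": ["edismax"], "rows": ["10"], "df": ["title"]}
handler_appends = {"fq": ["mpaa_rating:G"]}

# The request overrides rows but cannot opt out of the appended filter query.
params = effective_params({"q": ["space"], "rows": ["5"]},
                          handler_defaults, handler_appends)
print(params)
```

In this model a client can override a default by sending its own value, while an appended filter query is applied regardless of what the client sends.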

Another request handler is the CqlSearchHandler, which supports search queries that are sent using CQL. The CqlSearchHandler has the same defaults and appends properties as the search handler, with the exception of the rows property, which can be simulated using the CQL LIMIT clause. The CqlSearchHandler is also registered with the following components: the CqlQueryComponent, the FacetComponent, and the DebugComponent, which are needed to support the search functionality available in CQL.

The update request handler accepts and processes updates to the index through the HTTP API. This historically would allow for any type of write request, which would be able to make changes to data in the Cassandra table just like an insert, update or delete. However, in DSE 6, only deletes are supported, and inserting or updating data should be done exclusively through CQL.

Components define how the search handler processes search queries, with each component providing different functionality. The components listed here are registered by default with the search handler in the auto-generated config, and can be used without any further configuration changes. Of course the relevant parameters used by each component would need to be provided in the search query in order to receive a response.

The cql search handler does not necessarily use all of the components here, as functionality depends on what is available and implemented on the CQL side.

In addition to the default components, there are additional components that can be useful. These include the SpellCheckComponent, the QueryElevationComponent, and the SuggestComponent. As mentioned before about some of the other components, these components here are only usable with a search handler using the HTTP API.

Having gone through the configuration and the different elements, it should now be easier to visualize how to make changes to the configuration when using the ALTER SEARCH INDEX command. The element path should be used to help traverse elements nested within each other so that you can make changes to the appropriate settings. It can be hard to work with initially, but it does get easier with practice. Don't forget that there are also shortcuts you can use to directly edit settings without having to map out the element path as well.
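One way to get a feel for element paths is to walk a small config by hand. The Python sketch below does this with the standard library's ElementTree. It is only an illustration: the XML is a trimmed, made-up fragment, the values are arbitrary, the helper name is hypothetical, and the dotted path notation is an assumption of this sketch rather than a statement of the exact CQL syntax.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative config; element names follow the sections covered
# above, but the values are arbitrary examples.
CONFIG_XML = """
<config>
  <indexConfig>
    <rt>false</rt>
  </indexConfig>
  <updateHandler>
    <autoSoftCommit>
      <maxTime>10000</maxTime>
    </autoSoftCommit>
  </updateHandler>
  <query>
    <useColdSearcher>true</useColdSearcher>
  </query>
</config>
"""


def set_by_path(root, element_path, value):
    """Set the text of the element reached by a dotted element path."""
    node = root.find(element_path.replace(".", "/"))
    if node is None:
        raise KeyError(f"no element at path {element_path!r}")
    node.text = str(value)
    return node


root = ET.fromstring(CONFIG_XML)
# Each step in the path names one level of nesting below <config>.
set_by_path(root, "updateHandler.autoSoftCommit.maxTime", 5000)
set_by_path(root, "indexConfig.rt", "true")
print(ET.tostring(root, encoding="unicode"))
```

Reading the path left to right mirrors reading the XML from the outermost element inward, which is the habit that makes the ALTER commands easier to write.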

If nothing else, remember that you can always edit the XML directly and upload it to DSE using dsetool.

Let's take a look at some examples. At the top here are the ALTER commands to add some default parameters to the CqlSearchHandler. Due to the complexity of editing the various parts, it is often easier to drop the entire request handler and then make your changes all at once with one command, rather than altering individual settings one by one.

Here we dropped the CqlSearchHandler, and then recreated it with new defaults using a JSON map. The default parser is changed to edismax, the default field used for searching is set to the title field, and we also add a filter query to the appends list so that all search queries will only search in documents where the mpaa_rating is set to G.

The snippet below shows the modifications made to the XML.
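The on-screen snippet is not captured in this transcript. As a rough, unofficial sketch of what a handler with these defaults and appends might look like in XML, the Python below builds one with ElementTree. The handler name solr_query and the class name are assumptions for illustration, as is the helper function; only the edismax, title, and mpaa_rating:G values come from the example described above.

```python
import xml.etree.ElementTree as ET


def build_handler(name, handler_class, defaults, appends):
    """Build a <requestHandler> element with defaults and appends lists."""
    handler = ET.Element("requestHandler",
                         {"class": handler_class, "name": name})
    for list_name, params in (("defaults", defaults), ("appends", appends)):
        lst = ET.SubElement(handler, "lst", {"name": list_name})
        for param, value in params.items():
            ET.SubElement(lst, "str", {"name": param}).text = value
    return handler


handler = build_handler(
    name="solr_query",  # assumed handler name, for illustration only
    handler_class=(
        # assumed class name, for illustration only
        "com.datastax.bdp.search.solr.handler.component.CqlSearchHandler"
    ),
    defaults={"defType": "edismax", "df": "title"},
    appends={"fq": "mpaa_rating:G"},
)
print(ET.tostring(handler, encoding="unicode"))
```

Comparing the printed element with the output of DESCRIBE on a real index is a quick way to check whether a change landed where you expected.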

In this example, we'll define a new search component which uses the SuggestComponent class. This particular component does require certain properties to be configured in order for the component to function, which are all added and shown here. You'll want to take a look at the Solr reference guide if you'd like to learn more about these components and how to configure them.

Finally we need to define a request handler to make use of our new component. Here we create a new request handler called suggest, and also add some defaults to be used for suggest queries. Basically we set the suggest parameter to true, so that users do not need to specify that every time they send a query, and we limit the number of suggestions to just 10.
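To show how the component and the handler fit together, here is a hedged Python sketch that parses an illustrative XML fragment and checks that the handler only refers to components that are actually declared. The fragment is modeled loosely on the kind of suggester setup found in the Solr reference guide; the suggester name, the lookup and dictionary choices, the field type, and the /suggest handler name are all assumptions, not the exact values used in the video.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; every name and value inside the suggester
# list is an assumed example.
SUGGEST_XML = """
<config>
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">TextField</str>
    </lst>
  </searchComponent>
  <requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>
</config>
"""

root = ET.fromstring(SUGGEST_XML)
handler = next(h for h in root.findall("requestHandler")
               if h.get("name") == "/suggest")

# Defaults the handler applies when a request does not set them.
defaults = {s.get("name"): s.text
            for s in handler.find("lst[@name='defaults']")}
# Components the handler expects to find declared elsewhere in the config.
components = [s.text for s in handler.find("arr[@name='components']")]
declared = {c.get("name") for c in root.findall("searchComponent")}
missing = [c for c in components if c not in declared]

print(defaults, components, missing)
```

An empty missing list means every component the handler names has a matching searchComponent definition, which is the relationship the two steps above set up.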

Wow, that was a lot of information to cover. Try digesting some of the concepts that we've covered in the video by doing this exercise.

Well, I think that's enough lecturing for now. Try out this exercise before you move on to the next video.