Using dsetool

DSE Version: 6.0

Transcript: 

Hi, I'm Joe Chu, and welcome to Using dsetool. In this video we'll be taking a look at one of the most important command-line tools in DSE, and what it can do with DSE Search.

Dsetool actually mirrors a lot of the same functions that we've seen earlier with the CQL index management commands. Well, actually it's more accurate to say that the CQL commands duplicate dsetool, since dsetool has been around and used since the early days of DSE.

While I will admit that CQL is in many ways much more convenient to use, there are still plenty of things that only dsetool can do, aside from managing your search index schema and config.

In fact dsetool is the only way that you can run certain operations on individual nodes, something that you wouldn't be able to do with the CQL commands, which work on all of the nodes in a datacenter. If you are restoring data on a node and need to rebuild indexes, do you want to do so on all of your nodes, or just that one? That's where dsetool comes in.

So here is a table with all of the dsetool commands that are related to DSE Search. There are quite a few of them, but let's take a look at them one by one to see how they work.

The classic way to create a search index is by using the create_core command. This method can still automatically generate your schema and config for you, if you set the generateResources flag to true. Otherwise you can provide your own resource files and set the paths to those files with the schema and solrconfig flags. Running the command will automatically upload and store those files in DSE, and create the search index with them. That's it, really: either set the flag to generate the schema and config automatically, or provide your own. Dsetool will create the search index and index existing data on all of the search nodes in the current datacenter. In special cases when you don't want to reindex right away, or if you only want to reindex on the node you're running dsetool on, you can use the reindex and distributed flags to change the behavior.

Here are some examples of using the create_core command. The first example is one of the simplest, which is just to create a search index on the killrvideo videos table and automatically generate the schema and config. The command will also reindex data across all search nodes in the datacenter.

The second example can be used in a situation where you don't want to reindex the existing data. It will just create the search index for killrvideo videos using the automatically generated resource files.

The third example uses the distributed flag to only reindex data on the local node.

And the last example uses the schema and solrconfig flags to specify which files to use for the two search index resources. Since only a file name is provided and not the absolute path, dsetool will expect those files to be in the current working directory.
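The slides with these examples aren't part of the transcript, so here is a rough sketch of what the four create_core variations might look like. It assumes the key=value argument style that dsetool uses in DSE 6.0. The file names schema.xml and solrconfig.xml are placeholders, and the function names and the DSETOOL dry-run variable are just there for illustration. By default the script only prints the commands.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# 1. Generate the schema and config, then reindex existing data
create_generated() { $DSETOOL create_core killrvideo.videos generateResources=true reindex=true; }

# 2. Generate the schema and config, but don't reindex existing data
create_no_reindex() { $DSETOOL create_core killrvideo.videos generateResources=true reindex=false; }

# 3. Same as 1, but limited to the local node (as described above)
create_local() { $DSETOOL create_core killrvideo.videos generateResources=true reindex=true distributed=false; }

# 4. Use your own resource files; bare file names (placeholders here)
#    are looked up in the current working directory
create_from_files() { $DSETOOL create_core killrvideo.videos schema=schema.xml solrconfig=solrconfig.xml; }

create_generated   # call whichever variation you need
```

The reindex flag is spelled out explicitly here rather than relying on the default, so each variation says exactly what it does.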

The next command we'll take a look at is the reload_core command, which tells DSE to reload the requested search index. Reload_core doesn't require any parameters other than the search index name, but usually a modified schema and/or config is provided, which will automatically be used following the reload. Use the schema and solrconfig flags that we've seen previously to set the paths to the modified files. Reload_core still has the reindex and distributed flags, which work the same way as we've seen before. There is also a new flag, deleteAll, that can delete the previous index prior to reindexing. If set to true, search queries may return no results or partial results while table data is reindexed. The default behavior, though, is to continue to use the previous index after the reload, which is based on the schema used previously. This could potentially return "incorrect" results depending on how the schema was changed. Of course, if the schema has not changed at all, there would be no need to reindex or to delete the previous index.

The first example here shows the killrvideo.videos index being reloaded using a new schema and solrconfig. The data for the table will be reindexed, but search queries will continue to use the old index while that is happening.

The second example reloads the search index using only a new config file. No reindexing is needed.

Finally the last example shows a situation where a new schema is provided and will be used following the reload. The search index will also drop the previous indexes and reindex data again using the new schema.
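As a hedged sketch, the three reload_core situations just described could be written along these lines. The file names new_schema.xml and new_solrconfig.xml are made up for illustration, as are the function names and the DSETOOL dry-run variable. The flags are the ones named in this video.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# 1. New schema and config; reindex while queries keep using the old index
reload_and_reindex() { $DSETOOL reload_core killrvideo.videos schema=new_schema.xml solrconfig=new_solrconfig.xml reindex=true; }

# 2. New config only; no reindex requested
reload_config_only() { $DSETOOL reload_core killrvideo.videos solrconfig=new_solrconfig.xml; }

# 3. New schema; drop the previous index first, then reindex
reload_delete_all() { $DSETOOL reload_core killrvideo.videos schema=new_schema.xml reindex=true deleteAll=true; }

reload_config_only   # call whichever variation you need
```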

The last command when it comes to modifying the search index is the unload_core command. The command will effectively delete the search index, remove the solr_query pseudo column, and disable search queries on the specified table.

However, the resource files that you have saved for the search index are still kept, unless you use the flag deleteResources. Likewise the existing index segment files are saved as well by default, but can be deleted automatically with the deleteDataDir flag. If for some reason you only want to delete the index segments on the local node, you can add the distributed parameter and set it to false to do so.

We'll only show one quick example here to delete a search index. Here the killrvideo.videos search index is being removed, with both the resource files and index segments being deleted across all search nodes in the local datacenter.
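A command matching that description might look like the first function below. The second is an assumed variation that limits the cleanup to the local node with distributed=false. The function names and the DSETOOL dry-run variable are illustrative only.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# Remove the index, its resource files, and its index segments
# on all search nodes in the local datacenter
unload_everywhere() { $DSETOOL unload_core killrvideo.videos deleteResources=true deleteDataDir=true; }

# Same cleanup, but only on the node where dsetool is run
unload_local_only() { $DSETOOL unload_core killrvideo.videos deleteResources=true deleteDataDir=true distributed=false; }

unload_everywhere
```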

Another useful command is the write_resource command, which will upload a file and save it into the solr_admin.solr_resources table in the database. Afterwards, the file can be referenced by name in the specified search index. These resource files can be used in different search components, such as the text analysis filters. For example, the StopFilter, which prevents certain words from being indexed, actually requires a list of the stop words to ignore. That list can be uploaded as a resource file and then used to set up the StopFilter in the schema.

In fact, our example at the bottom is doing exactly that. It is uploading the file given by the file parameter, located in the home directory under resources slash stopwords.txt, and the file will be given the name stopwords.txt in the database, due to the name parameter. Afterwards you can set up the stop words list for a StopFilter used in the schema for the killrvideo.videos table by using that name, stopwords.txt.
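Going by that description, the upload could be sketched as follows. The name and file parameters come from the video. The function name and the DSETOOL dry-run variable are assumptions added so the snippet can be run without a cluster.

```shell
# Dry run by default: the command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# Upload ~/resources/stopwords.txt and store it under the name
# stopwords.txt for the killrvideo.videos search index
upload_stopwords() {
  $DSETOOL write_resource killrvideo.videos name=stopwords.txt file="$HOME/resources/stopwords.txt"
}

upload_stopwords
```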

The search index schema and config are also resources, and you can technically upload these as well. However, the naming of these files can be confusing since the database keeps two versions, one for the active schema or config being used, and one for the pending version that will be applied after a reload. To avoid using the wrong name, it is better to just upload them when running the reload_core command.

The read_resource command is our way of retrieving the contents of a resource from the database. You can specify the search index name, and the name of the resource you want to read in the name parameter, and the contents will be written to standard output, like the first example you see here. You can also save the contents of the resource by redirecting the output to a file, like in the second example.
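Here is a small sketch of both ways of reading a resource, reusing the stopwords.txt name from the earlier upload. The function names, the output path argument, and the DSETOOL dry-run variable are illustrative assumptions.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# 1. Print the resource to standard output
show_stopwords() { $DSETOOL read_resource killrvideo.videos name=stopwords.txt; }

# 2. Save the resource to a local file (path given as the first argument)
save_stopwords() { $DSETOOL read_resource killrvideo.videos name=stopwords.txt > "$1"; }

show_stopwords
# save_stopwords stopwords_copy.txt
```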

The commands here show ways to view the search index schema and config. With get_core_config you can read the contents of the pending configuration, and with get_core_schema, you read the contents of the pending schema. The current parameter can be used to read the active resource instead, if set to true. Like the read_resource command, these two commands will write the contents to standard output, which can be redirected to a file as needed.
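To make the pending versus active distinction concrete, here is a hedged sketch of the four combinations. The current parameter is the one mentioned above. The function names, the output path argument, and the DSETOOL dry-run variable are placeholders.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# Pending config and schema (applied after the next reload)
pending_config() { $DSETOOL get_core_config killrvideo.videos; }
pending_schema() { $DSETOOL get_core_schema killrvideo.videos; }

# Active config, the one currently in use
active_config() { $DSETOOL get_core_config killrvideo.videos current=true; }

# Active schema, saved to a file (path given as the first argument)
save_active_schema() { $DSETOOL get_core_schema killrvideo.videos current=true > "$1"; }

pending_schema
```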

Finally, there is the infer_solr_schema command, which will display what the automatically generated schema would look like for a table. Of course the table needs to be defined, but the command will work regardless of whether the table currently has a search index or not.
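For instance, previewing the generated schema for the videos table might look like this. The function name and the DSETOOL dry-run variable are assumptions for illustration.

```shell
# Dry run by default: the command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# Show the schema DSE would generate for killrvideo.videos,
# without creating or changing any search index
preview_schema() { $DSETOOL infer_solr_schema killrvideo.videos; }

preview_schema
```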

Some useful commands when it comes to indexing existing data are the stop_core_reindex and rebuild_indexes commands. The stop_core_reindex command will stop any reindexing that is currently taking place for the specified search index on the local node. However, this will cause the indexing status to switch to FAILED, and the table data will need to be reindexed again from the beginning. You would need to run this on each node that you want to stop reindexing on.

The rebuild_indexes command is more of a general DSE command that rebuilds all defined secondary indexes for a CQL table. Since the search index is technically a secondary index, you can use this command to initiate reindexing for a table on the local node. Note that if for some reason you have multiple secondary indexes defined for a table, all of them will be reindexed. If that's the case, using the CQL REBUILD SEARCH INDEX command would be a better solution.

The examples here are fairly straightforward. The first example stops the reindexing for the killrvideo.videos search index on the local node, and the second example starts a reindex for the killrvideo.videos search index on the local node as well.

The examples will also show what sort of response to expect when these commands are executed. The stop_core_reindex command does return a message indicating that the reindex was stopped. However, the rebuild_indexes command only returns to the prompt and does not respond with any useful messages.
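A sketch of the two commands is shown below. One thing to watch: as far as I recall from the DSE documentation, rebuild_indexes takes the keyspace and table as two separate arguments instead of keyspace.table, so treat that form as an assumption and check the command help. The function names and the DSETOOL dry-run variable are illustrative, and no response text is shown because it isn't captured in this transcript.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# Stop the reindex in progress for killrvideo.videos on the local node
stop_reindex() { $DSETOOL stop_core_reindex killrvideo.videos; }

# Rebuild all secondary indexes (including the search index)
# for the videos table on the local node
start_reindex() { $DSETOOL rebuild_indexes killrvideo videos; }

stop_reindex
```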

Talking about reindexing, it would also be useful to know if a search index on a node is currently reindexing, and what the progress is. This is what the core_indexing_status command is for. The output of the command will indicate that the search index is currently indexing, or that it has failed, or that it has finished successfully. With the progress flag, the command will also show the progress of the reindexing as a percentage, as well as an estimated time before completion. By default the status on the local node is shown, but the all flag can be used to check the status across all search nodes.

In our first example, the command here is checking the status of the killrvideo.videos search index. The response says that reindexing is finished, which means that it has completed successfully.

The second command uses the progress flag to also return the percent complete and the estimated time left, if the search index happens to be currently reindexing. The response here shows that the killrvideo.videos search index is indexing, at 21% complete, and there is still about 7 minutes before the reindexing is done.

The final example shows the all flag being used, and the command returning a list of nodes, identified by IP address, and the reindexing status for each one.
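The three status checks could be sketched like this, assuming the progress and all flags are written as --progress and --all. The function names and the DSETOOL dry-run variable are illustrative, and the responses are left out because they aren't reproduced in this transcript.

```shell
# Dry run by default: each command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# 1. Indexing status on the local node
status_local() { $DSETOOL core_indexing_status killrvideo.videos; }

# 2. Status plus percent complete and estimated time remaining
status_progress() { $DSETOOL core_indexing_status killrvideo.videos --progress; }

# 3. Status for every search node
status_all() { $DSETOOL core_indexing_status killrvideo.videos --all; }

status_local
```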

The index_checks command is a pretty interesting one, as it returns statistics and diagnostic information about a search index. This may help to answer certain burning questions, like whether data for a particular search index and node has been reindexed before, or how many documents are currently indexed locally for a search index. However, most of the statistics provided may not be particularly useful for anyone but an advanced user of DSE Search or Solr.

The example here shows the index_checks command running on the killrvideo.videos search index, with the following example output.

Here the doc statistics show the current number of documents indexed for the search index on the node, the all-time max number of documents, and the count of deleted documents.

The version starts out at 2 if existing data has not been indexed yet. However, the version number will increase while indexing, either from reindexing or from CQL writes. This value is not synced across search nodes, so eventually you'll start to see different versions on different nodes.

Finally, the last statistic that I'll describe here is current, which will be false while reindexing is taking place or has failed, and true when it has finished.
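The command behind those statistics, in its simplest assumed form with only the search index name, might be run like this. The function name and the DSETOOL dry-run variable are illustrative, and the output is not reproduced here since it isn't part of the transcript.

```shell
# Dry run by default: the command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# Print statistics and diagnostics for the killrvideo.videos
# search index on the local node
check_index() { $DSETOOL index_checks killrvideo.videos; }

check_index
```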

The list_index_files command is a handy way to see the current index segment files written to disk for a search index, without having to navigate through the file system. The command will also display the disk usage and encryption statistics for each of the index segment files.
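A minimal sketch, assuming the command only needs the search index name. The function name and the DSETOOL dry-run variable are illustrative.

```shell
# Dry run by default: the command is printed, not executed.
# Set DSETOOL=dsetool on a DSE search node to run for real.
DSETOOL="${DSETOOL:-echo dsetool}"

# List the index segment files for killrvideo.videos on the local node
list_segments() { $DSETOOL list_index_files killrvideo.videos; }

list_segments
```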

Last but not least is the perf command, which actually has quite a lot of subcommands and options. I'm going to defer explaining this command for now, and do so instead in a later video.

We're finally done going over all of the great, useful dsetool commands. How many of them do you remember right now? Don't worry, go do this exercise and you'll definitely be able to remember more of dsetool.

And now it's time for an exercise!

Ok, I'm done talking. Do this exercise while I catch my breath.
