Search Indexes

DSE Version: 6.0


Transcript: 

Now that we know a little bit about what the DSE Search index does (tokenizing descriptive values, applying complex query predicates, and producing a set of row identifiers, that is, primary key values, that are returned to DSE Core for further processing), we turn to a brief discussion of a number of further DSE Search metadata objects.
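To make that recap concrete, here is a minimal sketch of the idea in Python. It is a toy model only, not how DSE Search or Apache Lucene is implemented; the table contents and the function names (tokenize, build_index, search_all) are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(rows, column):
    """Map each token to the set of primary keys whose column contains it."""
    index = defaultdict(set)
    for primary_key, row in rows.items():
        for token in tokenize(row[column]):
            index[token].add(primary_key)
    return index

def search_all(index, query):
    """Return the primary keys of rows that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical table: primary key -> row
rows = {
    1: {"title": "The Quick Brown Fox"},
    2: {"title": "A quick guide to search"},
    3: {"title": "Brown bears in winter"},
}
title_index = build_index(rows, "title")
matching_keys = search_all(title_index, "quick brown")
```

Here matching_keys holds only the primary key 1, the one row whose title contains both words. The index hands back keys, not rows; fetching the rows themselves is left to the table.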

Regardless of the number of columns in a DSE table that are given DSE Search indexing, the DSE Search index is a table-wide object. That is, given a DSE table with 10 columns, and DSE Search indexing added to only two of them, there is still only one DSE Search index on that table.

As such, the DSE Search index name is the same as the name of the DSE table on which the index was created.

All DSE tables, and the DSE Search index placed on each, are uniquely identified by the combination of keyspace name and table name.

While there is no technical limit to the number of DSE Search indexes created, or the number of columns given DSE Search ability, there is a cost to indexing.

All indexes are, in effect, vertical slices of the tables they support; the index contains all rows, and a subset of columns (the indexed columns). Thus, inserting into, updating, or deleting from a table with many indexed columns incurs the additional operational load of changing not only the row proper, but also any index entries.

As such, index only those columns that you know are referenced frequently by query predicates.
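A rough way to picture that cost is to count what a single insert has to touch. The sketch below is a toy cost model under simple assumptions (one write for the row, plus one index entry per distinct token in each indexed column); the row and the write_cost function are hypothetical, and real indexing costs depend on much more than this.

```python
def write_cost(row, indexed_columns):
    """Return (row_writes, index_entries) for inserting one row.

    Toy cost model: one write for the row itself, plus one index
    entry per distinct token in each indexed column.
    """
    index_entries = 0
    for column in indexed_columns:
        tokens = set(str(row.get(column, "")).lower().split())
        index_entries += len(tokens)
    return 1, index_entries

# Hypothetical row
row = {
    "id": 7,
    "title": "Quick brown fox",
    "body": "the fox jumps over the dog",
    "price": 10,
}
cost_one = write_cost(row, ["title"])
cost_many = write_cost(row, ["title", "body"])
```

With only title indexed, the insert touches 3 index entries; adding body raises that to 8, for the same single row write.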

DSE Search indexes can be created and managed using CQL, and/or the dsetool command line utility.

DSE Search uses Apache Lucene libraries to generate and maintain index data files.

Similar to DSE Core, these index data files are immutable; new index data is written to new files, and the files are compacted on a scheduled basis. Index data is stored one index per directory, under the parent directory specified by the solr_data_dir setting in dse.yaml.
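The write-new-files-then-compact pattern can be sketched in a few lines. This is a toy model of the general idea only; it does not reflect how Lucene actually lays out or merges its files, and the SegmentStore class is invented for illustration.

```python
class SegmentStore:
    """Toy store of immutable segments."""

    def __init__(self):
        self.segments = []  # each segment is a tuple of (key, value) pairs

    def flush(self, entries):
        """Write a new immutable segment; existing segments are never modified."""
        self.segments.append(tuple(sorted(entries.items())))

    def get(self, key):
        """Read newest-first so later segments shadow earlier ones."""
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all segments into one, keeping the newest value per key."""
        merged = {}
        for segment in self.segments:  # oldest to newest
            merged.update(dict(segment))
        self.segments = [tuple(sorted(merged.items()))]

store = SegmentStore()
store.flush({"fox": frozenset({1}), "quick": frozenset({1, 2})})
store.flush({"fox": frozenset({1, 3})})  # newer data goes to a new segment
segments_before = len(store.segments)
store.compact()
segments_after = len(store.segments)
```

Before compaction there are two segments, afterwards one, and a lookup returns the same answer either way.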

DSE Search index data is updated automatically as the DSE table proper is updated. To ensure that these updates occur, a specially designated commit log is assigned to each DSE Search index.

Lastly, DSE Search index data is co-located, on the same node, with the data in the DSE table proper that the index data refers to.

When you create a DSE Search index, you specify a number of index configuration details. These details are organized into two groups: schema, and config (configuration).

Schema is what you might expect: column names, column types, the treatment a column value receives as it is indexed, and so on.

Config offers a number of settings related to memory use, among others.

Each of these two groups of metadata is stored in the solr_admin.solr_resources system table as a blob.

Each of these two groups, schema and config, is associated with distinct CQL commands (commands for just schema, and then just config), as well as distinct, legacy Apache Lucene XML encoded data files (one for just schema, and one for just config).

Ultimately, you may need to execute one CQL CREATE SEARCH INDEX statement, followed by a number of CQL ALTER SEARCH INDEX statements, to get exactly the DSE Search index ability your end user application requires.

As such, the metadata you define is first PENDING, meaning it has been received by DSE and is in the DSE system catalogs, and then ACTIVE after you deploy (build) the DSE Search index.

DSE Search indexes are always available. That is, after you create (build) the initial DSE Search index, any subsequent re-creation or re-configuration of the index is done in full multi-user mode, and the new index, once complete, is made ACTIVE with no interruption in service or processing.

Once any changes to the DSE Search index are made ACTIVE, the PENDING and ACTIVE metadata are the same until the next change in DSE Search metadata. At that time, the new changes appear only as PENDING, and not ACTIVE.
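The PENDING and ACTIVE life cycle just described can be sketched as a small state model. This is an illustration only: the SearchIndexMetadata class, its method names, and the setting name are all hypothetical and are not part of any DSE API.

```python
import copy

class SearchIndexMetadata:
    """Toy model of PENDING versus ACTIVE index metadata."""

    def __init__(self, schema, config):
        # Newly defined metadata starts out as PENDING only.
        self.pending = {"schema": dict(schema), "config": dict(config)}
        self.active = None  # nothing is ACTIVE until the first build

    def alter(self, group, key, value):
        """Change the PENDING metadata; ACTIVE is left untouched."""
        self.pending[group][key] = value

    def deploy(self):
        """Build the index: the PENDING metadata becomes ACTIVE."""
        self.active = copy.deepcopy(self.pending)

    def in_sync(self):
        """True when PENDING and ACTIVE metadata are the same."""
        return self.active == self.pending

meta = SearchIndexMetadata({"title": "text"}, {"hypothetical_setting": 1})
synced_before_build = meta.in_sync()   # only PENDING exists so far
meta.deploy()
synced_after_build = meta.in_sync()    # PENDING and ACTIVE now match
meta.alter("schema", "body", "text")
synced_after_alter = meta.in_sync()    # the change is PENDING only
meta.deploy()
synced_after_redeploy = meta.in_sync()
```

The four flags come out False, True, False, True, which mirrors the sequence of create, build, alter, and re-build.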

Consider that these are all design-phase activities; once a DSE Search index is in place, you may need to change it only when your end user application requirements change.

And then the last of the DSE Search index metadata objects: resource files, also known as resources.

DSE Search indexes have the ability to remove stop words: words like a, and, the, and others. DSE Search indexes can also support synonyms, for example: mobile phone, cell phone, mobile; movie, moving picture, picture show; Elizabeth, Liz, Beth, Tina; and more.
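As a rough illustration of what stop words and synonyms do to a value as it is indexed, here is a toy analyzer. It is a sketch under simplifying assumptions: the word lists are tiny, only single-word synonyms are handled (so "cell phone" versus "mobile phone" is approximated by mapping "cell" to "mobile"), and the analyze function is not the analysis chain DSE Search actually uses.

```python
# Hypothetical word lists, standing in for uploaded resource files
STOP_WORDS = {"a", "and", "the"}
SYNONYMS = {"cell": "mobile", "liz": "elizabeth", "beth": "elizabeth"}

def analyze(text):
    """Lowercase, drop stop words, then map each token to its canonical synonym."""
    tokens = text.lower().split()
    kept = [token for token in tokens if token not in STOP_WORDS]
    return [SYNONYMS.get(token, token) for token in kept]

indexed_tokens = analyze("The cell phone and Liz")
```

Here indexed_tokens is ["mobile", "phone", "elizabeth"]. Because the same analysis is applied to the search terms, a search for Beth and a search for Elizabeth both reduce to the same token.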

By convention, these lists of data are submitted to DSE Search using the dsetool command line utility, and are stored in the solr_admin.solr_resources table.

These resource files are later referred to by an identifier specified at the time of submission, are table-wide in scope, and are dropped when the DSE Search index is dropped. Versioning (updating) is performed by uploading a new version of the resource file, and then re-building the index, as outlined earlier when we discussed ACTIVE and PENDING.
