Solr and Lucene Fundamentals

DSE Version: 6.0

Transcript: 

Let’s look at some of the fundamentals of Solr/Lucene and how they map to the Cassandra concepts that enable DSE Search.

Core == Table

  • What we call the search index in DataStax Enterprise.

  • Indexes are created for a specific DSE table and identified by keyspace.table, for example, killrvideo.videos.

  • Indexes are defined and configured with two resource files: schema.xml and solrconfig.xml.

Document == Row

  • This is a 1:1 mapping.

Field == Column

  • Normally you define some, or all, of your CQL table columns to be fields.

  • It’s also possible to define fields in your search index that don't have a corresponding CQL table column. This is used to duplicate data in your index for different kinds of analysis while avoiding storing the raw data multiple times.

  • Every field has a type which represents the type of data being indexed or stored.

Shard == Token Range

  • The index is split up and stored on different nodes as shards, which correspond to the data stored on those nodes.

  • Data is indexed locally, so each node's shard maps to the token ranges and replicas that the node holds.

  • DSE Search determines the minimal set of shards that make up the entire document set for a core.
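To make the idea of a "minimal set of shards" more concrete, here is a toy sketch in Python. It is not how DSE Search actually selects shards; it uses a simple greedy cover as a stand-in, and the node names, range labels, and the function pick_shards are all hypothetical. Each node holds its own token ranges plus replicas, and the goal is to query as few nodes as possible while still covering every range.

```python
def pick_shards(ranges_by_node):
    """Greedy cover (illustrative only): keep picking the node that adds
    the most token ranges not yet covered, until every range is covered."""
    uncovered = set().union(*ranges_by_node.values())
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name wins).
        node = max(sorted(ranges_by_node),
                   key=lambda n: len(ranges_by_node[n] & uncovered))
        chosen.append(node)
        uncovered -= ranges_by_node[node]
    return chosen


# Hypothetical 4-node ring with replication factor 2:
# every token range (A-D) lives on two nodes.
ring = {
    "node1": {"A", "D"},
    "node2": {"A", "B"},
    "node3": {"B", "C"},
    "node4": {"C", "D"},
}
shards = pick_shards(ring)
```

In this made-up ring, two of the four nodes are enough to see every document once, which is the point of the bullet above: because replicas overlap, a search does not need to touch every node.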

Index == Data structure used in search queries

  • The index is written as data is inserted, updated, or deleted in DSE.

  • Has similar properties to SSTables: a core's index is made up of one or more index segments, and new and updated index entries are written to a new segment.

  • Similar disk operations: commit (flush) and merge (compact).

  • An index segment is not available to read until the segment is committed.
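The segment behaviour described above can be sketched with a small toy model in Python. This is only an illustration of the SSTable analogy, not Lucene's real data structures or API; the class ToyIndex and its methods are hypothetical, and "newest segment wins" is a simplification of how updates are resolved.

```python
class ToyIndex:
    """Toy model of a segmented index (illustrative, not Lucene's API)."""

    def __init__(self):
        self.pending = {}    # segment being written, not yet readable
        self.segments = []   # committed segments, oldest first

    def write(self, doc_id, text):
        # New and updated entries always go to the new (pending) segment.
        self.pending[doc_id] = text

    def commit(self):
        # Commit (like a flush): the pending segment becomes readable.
        if self.pending:
            self.segments.append(self.pending)
            self.pending = {}

    def merge(self):
        # Merge (like a compaction): fold all segments into one,
        # letting entries from newer segments overwrite older ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [merged] if merged else []

    def get(self, doc_id):
        # Reads only see committed segments, newest first.
        for segment in reversed(self.segments):
            if doc_id in segment:
                return segment[doc_id]
        return None


idx = ToyIndex()
idx.write("v1", "intro to cassandra")
before_commit = idx.get("v1")      # not visible yet
idx.commit()
after_commit = idx.get("v1")       # visible after commit
idx.write("v1", "intro to dse search")
idx.commit()                       # second segment holds the update
segments_before_merge = len(idx.segments)
idx.merge()
```

The usage lines walk through the same lifecycle as the bullets: a write is invisible until commit, an update lands in a new segment, and a merge collapses the segments back into one.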

Term ==

  • Both your text search input and the tokenized data, or words, produced by text analysis.

Phrase ==

  • Consists of multiple terms where the position of each term in the phrase is important.

  • Phrases must be enclosed in double quotes.
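The difference between a term and a phrase is easier to see with a tiny positional inverted index. The Python sketch below is a simplified illustration, not Solr's analysis chain: the analyze function (lowercase, split on non-alphanumerics) is an assumed stand-in for real text analysis, and all function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Stand-in analyzer: lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map term -> doc_id -> list of positions where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term][doc_id].append(position)
    return index


def term_search(index, term):
    """Documents containing the term anywhere."""
    return sorted(index.get(term.lower(), {}))


def phrase_search(index, phrase):
    """Documents where the terms appear consecutively, in order."""
    terms = analyze(phrase)
    if not terms:
        return []
    hits = []
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "Cassandra Summit keynote",
    2: "Keynote at the Cassandra meetup",
    3: "Summit of Cassandra users",
}
index = build_index(docs)
term_hits = term_search(index, "cassandra")
phrase_hits = phrase_search(index, "cassandra summit")
```

A term search for cassandra matches all three sample documents, while the phrase "cassandra summit" matches only the first one, because only there do the two terms sit next to each other in that order.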

solr_query

  • solr_query is a pseudo-column that is automatically added to a table with a search index. It acts as the pass-through column into Solr/Lucene and is used to explicitly run a search query with DSE Search, so it uses the Solr query syntax. Historically, it was needed for any text search powered by DSE Search.

  • The results are still returned as CQL rows. By default they are ordered by each row's relevancy to the search query. Results can also be sorted by a specific field or set of fields.
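As a rough illustration of how solr_query is used, the Python sketch below builds the CQL text for a search against killrvideo.videos. It assumes the JSON form of solr_query with "q" and "sort" keys; the helper solr_query_cql and the field name "name" are hypothetical and chosen only for the example. It produces a string and does not connect to a cluster.

```python
import json


def solr_query_cql(keyspace, table, q, sort=None, limit=None):
    """Build a CQL SELECT that passes a Solr query through solr_query.
    Hypothetical helper for illustration; it only returns a string."""
    query = {"q": q}
    if sort:
        query["sort"] = sort
    # Single quotes inside a CQL string literal are escaped by doubling.
    payload = json.dumps(query).replace("'", "''")
    cql = f"SELECT * FROM {keyspace}.{table} WHERE solr_query = '{payload}'"
    if limit is not None:
        cql += f" LIMIT {int(limit)}"
    return cql


# Term search on an assumed "name" field, default relevancy ordering.
by_relevancy = solr_query_cql("killrvideo", "videos", "name:cassandra")

# Phrase search (double quotes) sorted by a field instead of relevancy.
by_field = solr_query_cql("killrvideo", "videos",
                          'name:"cassandra summit"',
                          sort="name asc", limit=10)
```

In application code it is usually safer to pass the JSON payload as a bound parameter rather than formatting it into the statement; the string form is used here only to show what the final query looks like.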
