Field Input Transformer

DSE Version: 6.0

Transcript: 

Hey there, I'm Joe Chu and in this video we'll be talking about the Field Input Transformer.

One of the great things about DataStax Enterprise is that it can store both structured and unstructured data. Searching is fine for structured data, but when you have unstructured or semi-structured data, how is DSE Search supposed to index and search it? One thing that can help with this is the Field Input Transformer, or FIT. This is a function that exists in the DSE Search write path and can be overridden with a new field input transformer that you program and install. The field input transformer will generally take input data and manipulate it in some way before it gets indexed.

One use you can consider for the field input transformer is mapping data from one CQL column to multiple search fields. This is especially valuable for columns that hold JSON data, for example. With the FIT it is possible to extract key values from a JSON string and then add and index those key values in their own fields so they can be searched individually. This also makes it a viable alternative to using dynamic fields, which can be challenging to work with.

Another common use case is converting or transforming data prior to indexing. This could be something like transforming values into a format more suitable for searching, or even rewriting the data using a different encoding, as in the case of text, which can be written in many different encoding standards.
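As a rough illustration of that kind of pre-indexing transformation, here is a minimal stand-alone Java sketch. It does not use any DSE classes; the class name TextNormalizerSketch and its two methods are hypothetical, and the choice of Latin-1 input and NFC normalization is just an assumption for the example.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;

/** Stand-alone sketch of a pre-indexing text transformation (no DSE classes involved). */
final class TextNormalizerSketch {

    /** Decodes bytes that were written as ISO-8859-1 (Latin-1) into a Java string. */
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Puts text into a search-friendly form: NFC-composed, trimmed, lower-cased. */
    static String normalizeForSearch(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        return composed.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Cafe" followed by a combining acute accent is composed into a single e-acute.
        System.out.println(normalizeForSearch("  Cafe\u0301 "));
        // The byte 0xE9 is e-acute in Latin-1.
        System.out.println(decodeLatin1(new byte[] {(byte) 0xE9}));
    }
}
```

In a real field input transformer, logic like normalizeForSearch would run on the field value just before it is added to the document.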

Basically, you can do anything you want with the data, as long as you can write the code for it.

To write the program you want to use as the field input transformer, you'll need to include some libraries in the classpath for your application: the dse-search jar and the solr-uber-with-auth jar. The version numbers of these two jars differ depending on the version of DataStax Enterprise you are using. Generally they can be copied from the Solr library path in your DSE installation, which would be /lib for tarball installations. For package installs you can find the two jar files in /usr/share/dse and /usr/share/dse/solr/lib. Unfortunately these jar files are not available in a public Maven repository, so you'll need to copy them over to your local .m2 repository and add the dependencies accordingly in your pom.xml.

To implement the new field input transformer, you will need to extend the class com.datastax.bdp.search.solr.FieldInputTransformer and override two methods: evaluate and addFieldToDocument.

Evaluate is the function used to find the field whose data you want to transform, which you can do by comparing the field name with the field.equals method. When there is input data for that field, that will trigger the addFieldToDocument method.

addFieldToDocument is where you write the code that does the transformation magic.

Here is some sample code for a field input transformer that converts data from a column called money_json, and extracts values from that data to be indexed for three separate fields: budget, opening_weekend, and worldwide_gross. You can see that the evaluate method is overridden to return true for the field whose name equals money_json. The addFieldToDocument method does several things. It reads the value from the input field, instantiates objects that describe our three output fields, and then extracts and saves the appropriate values into those fields as strings. Note that there are POJOs used here that are defined separately and not shown. We also make use of the Jackson Object Mapper to help extract JSON fields, but you do not need to use this if you have other ways to do so.
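Since the sample code itself is not part of this transcript, here is a minimal stand-alone Java sketch of the two steps just described. It is not the code from the video and does not extend the real FieldInputTransformer class, because that needs the DSE jars on the classpath. The class name MoneyJsonSketch and the extract method are hypothetical, and a simple regular expression stands in for the Jackson Object Mapper, under the assumption that the JSON is a flat object with no nesting or escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Stand-alone sketch of the logic described above. It does NOT extend the real
 * FieldInputTransformer class; it only mirrors the field check and the extraction.
 */
final class MoneyJsonSketch {

    // Output fields named in the transcript.
    static final String[] TARGET_FIELDS = {"budget", "opening_weekend", "worldwide_gross"};

    // Matches "key": "string" or "key": number in a flat JSON object.
    // Assumption: no nesting and no escaped quotes; a real JSON parser handles those.
    private static final Pattern PAIR =
        Pattern.compile("\"([^\"]+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?[0-9.]+))");

    /** Mirrors evaluate: only the money_json column triggers the transformation. */
    static boolean evaluate(String field) {
        return field.equals("money_json");
    }

    /** Mirrors the extraction step: pulls the three target values out as strings. */
    static Map<String, String> extract(String json) {
        Map<String, String> all = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            all.put(m.group(1), value);
        }
        // Keep only the fields we intend to index, in a fixed order.
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : TARGET_FIELDS) {
            if (all.containsKey(field)) {
                out.put(field, all.get(field));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String json = "{\"budget\": 1000, \"opening_weekend\": 250, \"worldwide_gross\": 9000}";
        if (evaluate("money_json")) {
            System.out.println(extract(json));
        }
    }
}
```

In an actual transformer, each entry of the returned map would be added to the document under its own schema field instead of being printed.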

With the code written, the new field input transformer needs to be compiled and exported as a jar file, which then needs to be installed in DSE. It should be saved in the DSE lib location, which is /usr/share/dse/solr/lib for the DSE package install, and /resources/solr/lib for the tarball install.

The example below shows some possible commands to compile the FIT, export to a jar file, and then copy to the Solr library.

If you are transforming data and saving it somewhere other than the original field, then make sure that you actually declare those fields in your search index schema. Here we show the field declarations for the three fields used in our field input transformer.

There is also a configuration change needed for the search index to make use of your field input transformer. Add a fieldInputTransformer element with the name specifically set to dse, and set the class attribute to the name of your class that overrides the field input transformer. Once you're done, reload the search index, and rebuild it if the schema was also modified.

You're done! Everything else should now just happen automatically. However, you might want to see an example of what it does. Here is an insert statement we can run that will add a row with a JSON string in the money_json column. The JSON string contains the keys budget, opening_weekend, and worldwide_gross.

Although we won't be able to actually see the values after the field input transformer changes the data, we should be able to search on that data in the corresponding fields. Here are some search queries that search the new fields using the value for each of the three JSON keys. Although not shown, rest assured that each one of these queries will return the row that we've previously inserted.
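The queries from the video are not reproduced in this transcript, so here is a small Java sketch of what such searches could look like when built as CQL strings that use the solr_query column. The keyspace and table name demo_ks.movies, the class SearchQuerySketch, and the sample values are all hypothetical.

```java
/** Sketch that builds CQL search statements for the three extracted fields. */
final class SearchQuerySketch {

    // Hypothetical keyspace and table; the transcript does not name them.
    static final String TABLE = "demo_ks.movies";

    /** Builds a CQL statement that searches one field through the solr_query column. */
    static String searchStatement(String field, String value) {
        // Quote the value as a Solr phrase and escape characters that would break it.
        String solrValue = value.replace("\\", "\\\\").replace("\"", "\\\"");
        String solrQuery = field + ":\"" + solrValue + "\"";
        // A single quote inside a CQL string literal is escaped by doubling it.
        String cqlLiteral = solrQuery.replace("'", "''");
        return "SELECT * FROM " + TABLE + " WHERE solr_query = '" + cqlLiteral + "';";
    }

    public static void main(String[] args) {
        // Made-up values, one query per field written by the transformer.
        System.out.println(searchStatement("budget", "1000"));
        System.out.println(searchStatement("opening_weekend", "250"));
        System.out.println(searchStatement("worldwide_gross", "9000"));
    }
}
```

String building keeps the sketch free of driver dependencies; in application code you would more likely pass the solr_query value as a bound parameter.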
