DSE Version: 6.0


User Defined Functions and User Defined Aggregates are a little bit complex. We tend to use these in conjunction with one another to get the functionality we need in our table. Learn more about UDFs and UDAs in this unit.


Alright, let’s get into something a little more complex here, User Defined Functions and User Defined Aggregates.  The reason that these are lumped together is because you will likely use them in conjunction to get the functionality you need in your table.  

Let’s break it down into pieces, starting with User Defined Functions.  What is a UDF? It is the ability for users like you to create your own functions in JAVA or JavaScript.  These functions can then be used in SELECT, INSERT and UPDATE statements. Functions that you create are only available in the keyspace in which you created it.  

There is a little set up that needs to be done before you create your UDF. You have to first jump over to your .yaml file and enable a couple things. If you plan to use JAVA, set  enable_user_defined_functions to true and for Javascript and other custom languages: Set enable_scripted_user_defined_functions  to true.

Okay, so now what is a User Defined Aggregate?  Well, a UDA is where you use the user defined functions you already created in some interesting or creative way to get the results you need from your cluster.  These functions and the user defined aggregate must be created before you use them in a query. That was probably obvious, huh? Anyway, one more thing: The query must only include the aggregate function itself—no additional columns.

Let’s see a real life example in action!

In order to create our user-defined aggregate, we have to create some new functions: there's the required one called the state function, which is executed and processes the value in every row. There is also an optional final function that is executed once all the rows have been processes, which we'll need to create our average.

For an average, the simplest way to do so would be to take sum of all the values and then divide by the count of the values. Our state function therefore will be saving the running sum and count for the rows being aggregated. We define a function which will have two arguments, a tuple that contains the running aggregates, and a float that is the column value of the current row. The tuple has two fields being int and float, with int representing the running count, and the float representing the sum.  The return value will be the tuple with the running aggregates.

The state function will then increment the int field in the tuple by one, and add the column value to the float field, so long as that value is null.

The final function uses the tuple as an argument and does the processing to calculate the average. If the count, saved in the int field in the tuple, is 0, the function will return null. Otherwise, the function will get the average by dividing the sum of the values, represented by the float field in the tuple, by the count of the number of valid rows, which is represented by the int field in the tuple.

With our two functions we can now create our aggregate, defining the data type that will be returned by the aggregate as float. It uses our state function as the SFUNC, the tuple as the STYPE, or the CQL type of the parameter returned by the state function, and the final function as FINALFUNC. We also set the initial values for our STYPE to be 0 for the count and 0 for the sum.

Okay, once the two functions and the user defined aggregate are created, you can now query them.  Take a look at the bottom of this slide for for some example select statements using our newly created UDA.  A couple things to note: First, the state function is called once for each row and the value returned by the state function becomes the new state.  After all rows are processed, the optional final function is executed with the last state value as its argument. Finally, the aggregation itself is performed by the coordinator.

Phew!  You might want to rewatch the example slides a couple of times to make sure it sunk in!

No write up.
No Exercises.
No FAQs.
No resources.
Comments are closed.