Gremlin Language Introduction

DSE Version: 6.0

Gremlin is a graph traversal language defined by the Apache TinkerPop project. It is the language used in DataStax Enterprise Graph and in many other graph databases. In this unit, we will introduce you to Gremlin.

Transcript: 

Hi, I am Artem Chebotko and this is Gremlin Language Introduction.

Gremlin is a graph traversal language defined by the Apache TinkerPop project.

It is the language we use in DataStax Enterprise Graph but you will also find it in many other graph databases.

Gremlin is a very expressive and functional language with a fluent syntax, which makes it easy to use and learn.

It has bindings in many programming languages and we will be using Gremlin-Groovy for all our examples.

Let’s take a look at an example of a relatively simple traversal with what we call a linear motif.

Our traversal starts with a couple of constructs. “g” is called a traversal source, and it knows about the graph we want to traverse and about the traversal engine we want to use. “V” is a method that selects all of the vertices in the graph.

Starting with all the vertices, we are defining a sequence of steps that are going to be executed by traversers. The illustration of a small graph with one movie-vertex and three person-vertices will help us to visualize how a traverser (this green Gremlin creature) moves in the graph.

Let’s look at those steps more closely.

The first step keeps only vertices that have the title Alice in Wonderland. That is the line that says has("title", "Alice in Wonderland").

The second step is also “has” and we use it to only find those vertices where “year” is 2010. Our Gremlin should end up on the vertex with title Alice in Wonderland and year 2010.

Next, the step called “out” instructs the traverser to traverse any outgoing edge with the label “director”. The traverser should end up on the person-vertex as shown in the illustration.

Finally, the “values” step tells the traverser to get to the name-property of that vertex and return its value. And as you can see, the result is Tim Burton.

This is a wordy explanation of a really simple question: “Who directed the 2010 Alice in Wonderland?” The answer is “Tim Burton”. But it is a good way to get an idea of what a linear traversal motif in Gremlin looks like.
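As a rough sketch of the steps just described: going by the narration (not copied from the slide), the traversal presumably reads g.V().has("title", "Alice in Wonderland").has("year", 2010).out("director").values("name") in Gremlin-Groovy. The Python below is not Gremlin. It is a small stand-alone model, assuming a made-up in-memory graph and hypothetical helper functions named after the Gremlin steps, that mimics what each step does to the stream of traversers.

```python
# Toy property graph (an assumption for illustration, not DSE Graph data).
VERTICES = {
    1: {"label": "movie", "title": "Alice in Wonderland", "year": 2010},
    2: {"label": "movie", "title": "Alice in Wonderland", "year": 1951},  # extra vertex, not in the illustration
    3: {"label": "person", "name": "Tim Burton"},
    4: {"label": "person", "name": "Linda Woolverton"},
}
EDGES = [(1, "director", 3), (1, "screenwriter", 4)]  # (from, label, to)


def V():
    """Start the stream with every vertex id in the graph."""
    yield from VERTICES


def has(stream, key, value):
    """Keep only vertices whose property `key` equals `value`."""
    return (v for v in stream if VERTICES[v].get(key) == value)


def out(stream, label):
    """Move each traverser across outgoing edges that carry `label`."""
    return (dst for v in stream for (src, lbl, dst) in EDGES
            if src == v and lbl == label)


def values(stream, key):
    """Replace each vertex with the value of its property `key`."""
    return (VERTICES[v][key] for v in stream if key in VERTICES[v])


movies = has(has(V(), "title", "Alice in Wonderland"), "year", 2010)
result = list(values(out(movies, "director"), "name"))
print(result)  # ['Tim Burton']
```

Each helper wraps the previous one, so the nesting order in Python mirrors the left-to-right order of steps in the Gremlin traversal.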

Here is an example of a traversal with what we call a nested motif. Traversal steps are not just a sequence. They form a tree with branches. More specifically, the union step in this example has two nested traversals with one out step each.

As you can see in the illustration, we are still looking for the 2010 Alice in Wonderland movie, but then we want to traverse outgoing edges with labels “director” and “screenwriter” and take a union of all vertices that we find. Finally, we are interested in the names of all directors and screenwriters of the 2010 Alice in Wonderland movie.

The result is Tim Burton, who is a director, and Linda Woolverton, who is a screenwriter.
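The nested motif can be sketched the same way. From the narration, the Gremlin-Groovy traversal is presumably g.V().has("title", "Alice in Wonderland").has("year", 2010).union(out("director"), out("screenwriter")).values("name"). The Python model below is an assumption-laden illustration, not Gremlin: the graph and the helpers out and union are hypothetical stand-ins showing how each incoming traverser is sent down every nested branch.

```python
# Toy property graph (an assumption for illustration).
VERTICES = {
    1: {"title": "Alice in Wonderland", "year": 2010},
    2: {"name": "Tim Burton"},
    3: {"name": "Linda Woolverton"},
}
EDGES = [(1, "director", 2), (1, "screenwriter", 3)]  # (from, label, to)


def out(label):
    """Build a nested traversal: vertex id -> ids reached over `label` edges."""
    def step(v):
        return (dst for (src, lbl, dst) in EDGES if src == v and lbl == label)
    return step


def union(stream, *branches):
    """For each incoming traverser, run every branch and merge the results."""
    for v in stream:
        for branch in branches:
            yield from branch(v)


start = (v for v, props in VERTICES.items()
         if props.get("title") == "Alice in Wonderland" and props.get("year") == 2010)
names = [VERTICES[v]["name"]
         for v in union(start, out("director"), out("screenwriter"))]
print(names)  # ['Tim Burton', 'Linda Woolverton']
```

Here out(...) returns a function rather than a stream, which is what makes it a nested traversal: union decides when, and on which traverser, each branch runs.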

The main ingredients that we see in a Gremlin traversal are the traversal source, the traversal steps, and the traverser.

In this example, the traversal source is created for a property graph object using the standard OLTP traversal engine. In DataStax Enterprise Graph, both OLTP and OLAP traversal sources for each graph are already predefined and available to you. However, you may also define a new traversal source, for example for a subgraph that you extract dynamically. The first line of code shows how to do so.

Traversal steps define instructions for the traversal, and we will be able to learn a couple of dozen of them.

A traverser is an internal object that propagates through the traversal to unify the graph and the traversal steps. In other words, it finds a solution, that is, a traversal result.

To become proficient in Gremlin, we can study individual traversal steps. There are a few dozen of them.

Lambda steps are the foundational constructs of the Gremlin language. They are the most general steps and can be used to implement most other steps. However, they are not the easiest traversal steps to use, and they are usually the least efficient because they are hard to optimize. As a result, lambda steps are disabled by default in DataStax Enterprise Graph. Later we can take a closer look at these steps, mostly for educational purposes.

Derived steps, on the other hand, are useful in practice. Most traversals typically rely on around 10 of the most commonly used steps, so learning all of them is not necessary to start using Gremlin.
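To illustrate how general steps can implement other steps, here is a minimal Python model. It is not Gremlin: filter_step and map_step are hypothetical stand-ins for the general filter and map steps, and has and values are rebuilt on top of them. It also hints at why such steps are hard to optimize, since the engine sees only an opaque function.

```python
# Hypothetical stand-ins for two general ("lambda") steps.
def filter_step(stream, predicate):
    """Keep the traversers for which the supplied function returns True."""
    return (t for t in stream if predicate(t))


def map_step(stream, function):
    """Transform each traverser with the supplied function."""
    return (function(t) for t in stream)


# Derived steps expressed through the general ones.
def has(stream, key, value):
    return filter_step(stream, lambda vertex: vertex.get(key) == value)


def values(stream, key):
    with_key = filter_step(stream, lambda vertex: key in vertex)
    return map_step(with_key, lambda vertex: vertex[key])


vertices = [
    {"title": "Alice in Wonderland", "year": 2010},
    {"name": "Tim Burton"},
    {"name": "Linda Woolverton"},
]
movie_years = list(values(has(vertices, "title", "Alice in Wonderland"), "year"))
person_names = list(values(vertices, "name"))
print(movie_years)   # [2010]
print(person_names)  # ['Tim Burton', 'Linda Woolverton']
```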

There are also a few steps that do not fall under the lambda/derived classification.

There are step modulators that are used to manipulate or pass additional parameters to a traversal step.

Finally, there are predicates that are used to relate values and objects.
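As a hedged illustration of predicates and step modulators: Gremlin has predicates such as gt and within, and a modulator called by that parameterizes steps such as order. The Python below only models the idea. The functions gt, within, has, and order_by are hypothetical helpers written for this sketch, and the 1951 movie is an extra data point added for illustration.

```python
def gt(bound):
    """Predicate: true when the tested value is greater than `bound`."""
    return lambda value: value > bound


def within(*options):
    """Predicate: true when the tested value is one of `options`."""
    return lambda value: value in options


def has(stream, key, predicate):
    """Keep items whose property `key` satisfies `predicate`."""
    return (item for item in stream if key in item and predicate(item[key]))


def order_by(stream, key):
    """Model of an order step whose sort key is supplied by a modulator."""
    return iter(sorted(stream, key=lambda item: item[key]))


movies = [
    {"title": "Alice in Wonderland", "year": 2010},
    {"title": "Alice in Wonderland", "year": 1951},  # added for illustration
]
recent_years = [m["year"] for m in has(movies, "year", gt(2000))]
sorted_years = [m["year"] for m in order_by(has(movies, "year", within(1951, 2010)), "year")]
print(recent_years)  # [2010]
print(sorted_years)  # [1951, 2010]
```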

Gremlin is defined as a lazily evaluated stream processing language. This means that as a traversal gets evaluated, work may begin on one step of the traversal and if possible, even before that step is complete, work can begin on the next step. It is not necessary to wait for all preceding traversal steps to complete to begin the next one.
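Lazy evaluation of this kind can be modeled with Python generators. The sketch below is not Gremlin and makes no claim about how TinkerPop is implemented. The step functions and the log list are hypothetical, and they only show that a later step can start working on the first object before the earlier steps have seen the rest of the stream.

```python
log = []


def source(items):
    """Emit objects one at a time and record each emission."""
    for item in items:
        log.append(f"emit {item}")
        yield item


def keep_even(stream):
    """Filter step: pass along only even numbers."""
    for item in stream:
        if item % 2 == 0:
            log.append(f"filter passes {item}")
            yield item


def double(stream):
    """Map step: double each object it receives."""
    for item in stream:
        log.append(f"map {item}")
        yield item * 2


pipeline = double(keep_even(source([1, 2, 3, 4])))
first = next(pipeline)  # pull exactly one result through all three steps
print(first)  # 4
print(log)    # ['emit 1', 'emit 2', 'filter passes 2', 'map 2']
```

When the first result is produced, the objects 3 and 4 have not been emitted yet. They are only processed if more results are requested.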

Lazy evaluation is not always possible or practical. When so-called barrier steps are used, all previous steps in a traversal must complete before a barrier step is evaluated.

For example, to order objects in a stream, we have to collect all the objects first … therefore “order” is a collecting barrier step.

Similarly, to reduce a stream to a single number representing the count of all objects in the stream, we need to have all of the objects available to us first … therefore “count” is a reducing barrier step.

There is also “cap”, the supplying barrier step, that can be used when previous steps have side effects and we want to supply the fully-computed side effect to the stream. An example of a side effect would be a subgraph that we may extract from a large graph.
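The three kinds of barrier can be sketched with the same generator model. This is Python, not Gremlin, and every function here is a hypothetical stand-in: order collects, count reduces, and cap supplies a side effect that a lazy store helper (modeled loosely on a store-style side-effect step) has been building.

```python
log = []


def source(items):
    """Emit objects one at a time and record each emission."""
    for item in items:
        log.append(f"emit {item}")
        yield item


def order(stream):
    """Collecting barrier: gathers every object before releasing any."""
    collected = sorted(stream)
    log.append("order releases")
    yield from collected


def count(stream):
    """Reducing barrier: folds the whole stream into a single number."""
    yield sum(1 for _ in stream)


def store(stream, side_effects, key):
    """Lazy side-effect step: remember each object, then pass it along."""
    for item in stream:
        side_effects.setdefault(key, []).append(item)
        yield item


def cap(stream, side_effects, key):
    """Supplying barrier: drain the stream, then emit the finished side effect."""
    for _ in stream:
        pass
    yield side_effects.get(key, [])


names = ["Tim Burton", "Linda Woolverton"]

first = next(order(source(names)))
print(first)  # Linda Woolverton
print(log)    # ['emit Tim Burton', 'emit Linda Woolverton', 'order releases']

total = next(count(iter(names)))
print(total)  # 2

side_effects = {}
capped = next(cap(store(iter(names), side_effects, "seen"), side_effects, "seen"))
print(capped)  # ['Tim Burton', 'Linda Woolverton']
```

Unlike the lazy pipeline, asking order for a single result forces both names to be emitted first, which is exactly what makes it a barrier.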

Finally, combining individual steps into traversals of different types is another good approach to learning the language. We will take this approach in this training. We will look at simple, branching, recursive, path, projecting, declarative, subgraph, statistical, and many other useful traversals.
