Projecting Traversals

DSE Version: 6.0

Video

In this unit, you will learn about projecting traversals. Projecting traversals are steps that take traversers labeled earlier in a query and bring them forward to be used again. A projecting traversal generates new traversers that correspond to the previously labeled steps.

Transcript: 

Hey everyone, I am Denise Gosnell and this is projecting traversals.

When we are building Gremlin queries, I like to think of the process as loosely the reverse of the traditional “select from where”. In my head, I see Gremlin queries as generally “where from select”. Projecting traversals are one of the ways we write those select statements and pick the data we want to see at that point in our query.

A bit more concretely, projecting traversals are steps that use previously labeled traversers in your query and bring them forward to be used again. They generate new traversers that correspond to the previously labeled steps.

A projecting traversal can also be a simple, branching, recursive, or path traversal, and so on.

Let’s visualize this.

In this diagram, we can see six traverser states in total: a start, a finish, and four states in between. The small, colored triangles above the gremlins at the start and at step 3 represent labeled steps. Imagine that the step at the finish is a projecting step that specifically asks to bring forward the traversers from the start and from step 3. That is why the marked gremlins are duplicated and brought forward to the right of the finishing step.

In the Gremlin language, we have two examples of projecting traversals. The most common one is the select step, which grabs previously labeled traversers so we can make use of them later on. Its parameters are step labels defined earlier in the query with the as step modulator.
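To make the select mechanics concrete, here is a minimal Python sketch. It is not Gremlin and not how any Gremlin engine is implemented: each traverser is modeled as a pair of (current object, dictionary of labeled objects), and the names `step`, `as_`, and `select` are hypothetical helpers. The chain of states mirrors the diagram: the start and step 3 are labeled, and the final select brings both forward as new traversers.

```python
# Hypothetical toy model of Gremlin's as()/select(); not a real Gremlin API.
# A traverser is a tuple: (current object, {label: object labeled earlier}).

def step(traversers, move):
    """Advance every traverser to each object move(current) returns, keeping its labels."""
    return [(nxt, labels) for cur, labels in traversers for nxt in move(cur)]

def as_(traversers, label):
    """Like .as(label): remember the current object under `label`."""
    return [(cur, {**labels, label: cur}) for cur, labels in traversers]

def select(traversers, *keys):
    """Like .select(...): spawn new traversers holding previously labeled objects.
    One key yields the object itself; several keys yield a {key: object} map."""
    if len(keys) == 1:
        return [(labels[keys[0]], labels) for _, labels in traversers]
    return [({k: labels[k] for k in keys}, labels) for _, labels in traversers]

# The six traverser states from the diagram, modeled as a simple chain.
chain = {"start": ["step1"], "step1": ["step2"], "step2": ["step3"],
         "step3": ["step4"], "step4": ["finish"]}

def forward(state):
    return chain.get(state, [])

t = as_([("start", {})], "a")          # label the start
for _ in range(3):
    t = step(t, forward)               # walk to step3
t = as_(t, "b")                        # label step 3
t = step(step(t, forward), forward)    # walk on to the finish
print([cur for cur, _ in t])                    # ['finish']
print([cur for cur, _ in select(t, "a", "b")])  # [{'a': 'start', 'b': 'step3'}]
```

The traverser itself ends at the finish; select does not move it back, it creates new traversers whose objects are the ones recorded under the labels.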

The next step to talk about is where. There are multiple ways to use the where step in the Gremlin language, but the use most like a projecting step is when where recalls a previously labeled step and evaluates some predicate on those traversers. Unlike select, where does not return or project new traversers; instead, it refers to previously labeled steps to evaluate some condition.
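The difference between select and where can be sketched with the same kind of toy model (hypothetical helpers, not a Gremlin API). The `where` helper below plays the role of a Gremlin step such as `where(neq('me'))`: it compares each traverser's current object with the object labeled "me" and drops the ones that fail, but it never replaces the current object the way select does. The small `knows` graph is made-up example data.

```python
import operator

# Made-up example data: who knows whom.
knows = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}

def out_knows(person):
    return knows.get(person, [])

def step(traversers, move):
    """Advance every traverser along move(current), keeping its labels."""
    return [(nxt, labels) for cur, labels in traversers for nxt in move(cur)]

def as_(traversers, label):
    """Like .as(label): remember the current object under `label`."""
    return [(cur, {**labels, label: cur}) for cur, labels in traversers]

def where(traversers, predicate, label):
    """Hypothetical stand-in for .where(P(label)): keep traversers whose current
    object satisfies predicate(current, labeled object). Nothing new is created."""
    return [(cur, labels) for cur, labels in traversers
            if predicate(cur, labels[label])]

t = as_([("alice", {})], "me")
t = step(step(t, out_knows), out_knows)   # friends of friends
print([cur for cur, _ in t])              # ['alice', 'carol']
t = where(t, operator.ne, "me")           # roughly .where(neq('me'))
print([cur for cur, _ in t])              # ['carol']
```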

We are going to use both select and where in some examples coming up, and the difference between them will become much clearer.

There are two step modulators that are frequently used with projecting traversals: as and by. The as step is how we label a part of a Gremlin query, and it is the must-have step for a projecting traversal: we have to label something with as in order to use it again later in the query. In practice, we use as to assign a meaningful name to part of a traversal so it can be used later on.

By is also very useful with projecting traversals. It is like saying that we now have some object and want to use it in a particular way. The by step modulator is frequently used in projecting traversals to specify what to project, such as a specific property value, an id, or even the result of an inner traversal.
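The role of by can be sketched as a function applied to each selected object. In this hypothetical Python model, a string argument stands for a property key (like `by('name')`), and a callable stands for an inner traversal or something like `by(T.id)`. If there are fewer modulators than selected keys, the sketch reuses them in order, which matches TinkerPop's round-robin application of by() modulators to select. The vertex dictionaries are made-up example data.

```python
from itertools import cycle

def modulator(mod):
    """Hypothetical helper: a string means 'this property's value';
    a callable (standing in for an inner traversal) is applied as-is."""
    if isinstance(mod, str):
        return lambda element: element[mod]
    return mod

def select_by(labels, keys, *mods):
    """Like .select(*keys).by(m1).by(m2)...: build a map, applying the i-th
    modulator to the i-th key and cycling through the modulators if there are
    fewer of them than keys. With no modulators the objects pass through."""
    funcs = cycle([modulator(m) for m in mods] or [lambda element: element])
    return {key: f(labels[key]) for key, f in zip(keys, funcs)}

# Made-up vertices standing in for graph elements.
actor = {"id": 1, "name": "Johnny Depp"}
movie = {"id": 7, "title": "Example Movie", "year": 2000}
labels = {"actor": actor, "movie": movie}

print(select_by(labels, ["actor", "movie"], "name", "title"))
# {'actor': 'Johnny Depp', 'movie': 'Example Movie'}
print(select_by(labels, ["actor", "movie"], lambda element: element["id"]))
# {'actor': 1, 'movie': 7}
```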

Neither the as nor the by step modulator is specific to projecting traversals; both can be used with many other steps and traversals. Here, though, we are going to walk through some examples that show how to use them for projecting traversals.

As always, a reminder of our KillrVideo movie schema.

Movies are connected to people: the people responsible for creating the movie. Users rate movies and know each other, and movies belong to genres.

So, let’s dig in and go through an example. Let’s say that we want to find the movies that Johnny Depp has acted in, and we want to create a payload that has the keys: actor and movie.

To do that, remember that we are loosely going in reverse from traditional querying. Here, we want to follow a traversal pattern of “where, from, select”.

To do this, we start by finding where our Gremlin traversers are going to start their journey: the Johnny Depp person vertex. Once we have found our starting point, let’s give this step a name by using the “as” step. Here, we label the Johnny Depp person vertex as “actor”.

From our starting point, we need to get to the movie vertices by walking through the incoming “actor” edges. Let’s label this step as “movie”.

Lastly, we want to select the data to be returned from our query: the actor, Johnny Depp, and the movies he has acted in. We also want to shape the view of that data, so we use by modulators, one for each item we select. The first by modulator applies to the first item selected, the actor. We could select any property from this vertex, but here we select the actor’s name. The second by modulator applies to the second item in the select step, the movie. Again, we could select any property from the movie vertex, but here we select the movie’s title.
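In Gremlin, the query described above would look something like `g.V().has('person','name','Johnny Depp').as('actor').in('actor').as('movie').select('actor','movie').by('name').by('title')`, assuming the KillrVideo `actor` edges point from movie vertices to person vertices and the properties are called `name` and `title`. The Python sketch below replays those steps over a tiny made-up graph (the movie titles are placeholders, not KillrVideo data) so you can see which labels each traverser carries.

```python
# Made-up toy graph: vertex id -> properties, plus (out_vertex, label, in_vertex) edges.
V = {
    "p1": {"label": "person", "name": "Johnny Depp"},
    "p2": {"label": "person", "name": "Another Actor"},
    "m1": {"label": "movie", "title": "Movie A"},
    "m2": {"label": "movie", "title": "Movie B"},
    "m3": {"label": "movie", "title": "Movie C"},
}
E = [("m1", "actor", "p1"), ("m2", "actor", "p1"), ("m3", "actor", "p2")]

def in_(vid, label):
    """Like .in(label): vertices whose `label` edge points into vid."""
    return [o for o, lbl, i in E if i == vid and lbl == label]

# g.V().has('person','name','Johnny Depp').as('actor')
t = [(vid, {"actor": vid}) for vid, p in V.items()
     if p["label"] == "person" and p["name"] == "Johnny Depp"]
# .in('actor').as('movie')
t = [(m, {**labels, "movie": m}) for cur, labels in t for m in in_(cur, "actor")]
# .select('actor','movie').by('name').by('title')
rows = [{"actor": V[labels["actor"]]["name"], "movie": V[labels["movie"]]["title"]}
        for _, labels in t]
print(rows)
# [{'actor': 'Johnny Depp', 'movie': 'Movie A'}, {'actor': 'Johnny Depp', 'movie': 'Movie B'}]
```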

And finally, let’s take a look at only one of those results.

Maybe our application needs more than just the actor’s name and the movie title. So, in this example, we are going to add in the year and the genre.

To do that, remember that we are loosely going in reverse from traditional querying. Here, we want to follow a traversal pattern of “where, from, select”.

Let’s start with where: we need to identify where our traversers are going to start their walk through the graph data. As before, we query for the Johnny Depp person vertex and label it “a” for actor.

From this starting point, we need to walk to movies and genres to get access to the remaining 3 elements for our payload. We walk through the in-edge “actor” to get to the movie vertices. We know the movie vertices have the title and year for the film, so we give these vertices two labels: t for title and y for year.

From the movies, we walk over to the genre vertices using the “belongs to” edge. Let’s label this with a g for genre.

Lastly, we use the projecting step, select, to access our labeled data. For each item we select, we also provide a by modulator that indicates exactly which piece of data goes into our payload: the actor’s name from “a”, the movie’s title from “t”, the movie’s year from “y”, and the genre name from “g”.
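Putting the four labels together, the Gremlin query would look something like `g.V().has('person','name','Johnny Depp').as('a').in('actor').as('t','y').out('belongs_to').as('g').select('a','t','y','g').by('name').by('title').by('year').by('name')`; the `belongs_to` edge label and the genre's `name` property are assumptions about the schema. A movie in two genres yields two traversers and therefore two payloads, which the made-up data in this Python sketch demonstrates. The last line imitates `sample(2)` with Python's `random.sample`.

```python
import random

# Made-up toy graph (placeholder titles, not KillrVideo data).
V = {
    "p1": {"label": "person", "name": "Johnny Depp"},
    "m1": {"label": "movie", "title": "Movie A", "year": 2001},
    "m2": {"label": "movie", "title": "Movie B", "year": 2005},
    "g1": {"label": "genre", "name": "Drama"},
    "g2": {"label": "genre", "name": "Comedy"},
}
E = [("m1", "actor", "p1"), ("m2", "actor", "p1"),
     ("m1", "belongs_to", "g1"), ("m1", "belongs_to", "g2"),
     ("m2", "belongs_to", "g1")]

def in_(vid, label):
    return [o for o, lbl, i in E if i == vid and lbl == label]

def out(vid, label):
    return [i for o, lbl, i in E if o == vid and lbl == label]

def walk(traversers, move, label, *names):
    """Hypothetical helper: follow one edge label, then tag the new objects
    with every name in `names` (like .in(label).as(*names))."""
    return [(nxt, {**labs, **{n: nxt for n in names}})
            for cur, labs in traversers for nxt in move(cur, label)]

# g.V().has('person','name','Johnny Depp').as('a')
t = [(v, {"a": v}) for v, p in V.items()
     if p["label"] == "person" and p["name"] == "Johnny Depp"]
t = walk(t, in_, "actor", "t", "y")       # .in('actor').as('t','y')
t = walk(t, out, "belongs_to", "g")       # .out('belongs_to').as('g')

# .select('a','t','y','g').by('name').by('title').by('year').by('name')
keys, props = ["a", "t", "y", "g"], ["name", "title", "year", "name"]
rows = [{k: V[labs[k]][p] for k, p in zip(keys, props)} for _, labs in t]
for row in rows:
    print(row)
print(random.sample(rows, 2))  # roughly like .sample(2): two results at random
```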

And finally, let’s take a look at two of those results with the sample step.

One last example of the “where from select” pattern.

In this example, we label the same step three times ("t","y","r"). We use a different by modulator for each projected label, including a last by modulator with an inner traversal that accesses the movie’s ratings and computes their average.
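Here is a hedged sketch of that idea. The Gremlin form is assumed to be roughly `g.V().has('person','name','Johnny Depp').in('actor').as('t','y','r').select('t','y','r').by('title').by('year').by(inE('rated').values('rating').mean())`, with `rated` edges from users to movies carrying a `rating` property. In the Python sketch, the third modulator is a function standing in for that inner traversal; the graph and the ratings are made up.

```python
from statistics import mean

# Made-up toy graph: edges are (out_vertex, label, in_vertex, properties).
V = {
    "p1": {"name": "Johnny Depp"},
    "m1": {"title": "Movie A", "year": 2001},
    "m2": {"title": "Movie B", "year": 2005},
    "u1": {}, "u2": {},
}
E = [("m1", "actor", "p1", {}), ("m2", "actor", "p1", {}),
     ("u1", "rated", "m1", {"rating": 8}), ("u2", "rated", "m1", {"rating": 6}),
     ("u1", "rated", "m2", {"rating": 9})]

def in_e(vid, label):
    """Like .inE(label): edges with this label pointing into vid."""
    return [e for e in E if e[2] == vid and e[1] == label]

def avg_rating(vid):
    """Stand-in for the inner traversal inE('rated').values('rating').mean().
    Every toy movie has ratings; statistics.mean raises on an empty list."""
    return mean([props["rating"] for _, _, _, props in in_e(vid, "rated")])

# .in('actor').as('t','y','r'): one movie object stored under three labels
movies = [e[0] for e in in_e("p1", "actor")]
t = [(m, {"t": m, "y": m, "r": m}) for m in movies]

# .select('t','y','r').by('title').by('year').by(<inner traversal>)
mods = [lambda v: V[v]["title"], lambda v: V[v]["year"], avg_rating]
rows = [{k: f(labs[k]) for k, f in zip(["t", "y", "r"], mods)} for _, labs in t]
print(rows)
```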

The last three examples have demonstrated how we use as, select, and by in projecting traversals. Next, let’s bring the step “where” into our query as a filter.

In this example, we want to find directors who appeared in their own movies.

Following our general “where”, “from”, “select” pattern, we need to figure out where we are starting.

In this traversal, we want to start at our person vertices, and we want to keep only those people who both directed and acted in the same movie.

Now, this query has to be run in OLAP mode because it requires a full scan of all the person vertices.

To do this, we start by labeling all person vertices as “d”, for director. But, we have all persons, not just directors. With this data model, we have to walk to the movie vertex through the “director” edge to filter out all persons who are not directors. We eventually will need data from the movie vertex, so while we are here, let’s label this step with “t” and “y” since we know we will be getting the title and year from the movie vertex later on.  Next, we will walk back to the person vertices through the actor edge and label these traversers as “a” for actor.

Now here is the trick. We want to select the persons from the first step, the traversers labeled “d”, and only let those traversers live on if their corresponding actor is the same person. We do this by selecting “d” and applying a filter with the where step. Specifically, a “d” person passes our where filter only if it is the same person as the actor “a”. In other words, if a director was also an actor in the movie, we want those traversers to pass the filter.
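Assuming `director` and `actor` edges both point from movies to people, the traversal described above might be written as `g.V().hasLabel('person').as('d').in('director').as('t','y').out('actor').as('a').select('d').where(eq('a')).select('d','t','y').by('name').by('title').by('year')`. The Python sketch replays it over a small made-up graph; apart from the Pulp Fiction row mentioned in this unit, the people and movies are placeholders.

```python
# Made-up toy graph: vertex id -> properties, plus (out_vertex, label, in_vertex) edges.
V = {
    "qt": {"label": "person", "name": "Quentin Tarantino"},
    "p2": {"label": "person", "name": "Director B"},
    "p3": {"label": "person", "name": "Actor C"},
    "p4": {"label": "person", "name": "Actor D"},
    "m1": {"label": "movie", "title": "Pulp Fiction", "year": 1994},
    "m2": {"label": "movie", "title": "Movie B", "year": 2000},
}
E = [("m1", "director", "qt"), ("m1", "actor", "qt"), ("m1", "actor", "p4"),
     ("m2", "director", "p2"), ("m2", "actor", "p3")]

def in_(vid, label):
    return [o for o, lbl, i in E if i == vid and lbl == label]

def out(vid, label):
    return [i for o, lbl, i in E if o == vid and lbl == label]

# g.V().hasLabel('person').as('d')   (a scan over every person vertex)
t = [(v, {"d": v}) for v, p in V.items() if p["label"] == "person"]
# .in('director').as('t','y')        (people with no director edge drop out here)
t = [(m, {**labs, "t": m, "y": m}) for cur, labs in t for m in in_(cur, "director")]
# .out('actor').as('a')
t = [(a, {**labs, "a": a}) for cur, labs in t for a in out(cur, "actor")]
# .select('d')                        (new traversers holding the labeled director)
t = [(labs["d"], labs) for _, labs in t]
# .where(eq('a'))                     (keep them only if the director is also the actor)
t = [(cur, labs) for cur, labs in t if cur == labs["a"]]
# .select('d','t','y').by('name').by('title').by('year')
rows = [{"d": V[labs["d"]]["name"], "t": V[labs["t"]]["title"], "y": V[labs["y"]]["year"]}
        for _, labs in t]
print(rows)  # [{'d': 'Quentin Tarantino', 't': 'Pulp Fiction', 'y': 1994}]
```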

The next select builds the payload we want from the remaining labeled values in the traversal: the director, and the title and year of the movie.

And hopefully this was one of the examples you imagined would come up for this – Quentin Tarantino from Pulp Fiction in 1994.

One last example, to show you another way to use the select… where pattern. We are going to stick with the same concept, finding the directors who appeared in their own movies, but this time we use a different form of the where step.

The query is exactly the same as before, with one change: we eliminate the first select and use only where and the second select. This is possible because of the additional where parameter that specifies the labeled step "d".
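A sketch of the two forms side by side, under the same assumptions about edge direction: the first form corresponds to `select('d').where(eq('a'))` and the second to `where('d', eq('a'))`. In this hypothetical Python model both filters keep exactly the same traversers; the only difference is whether the comparison starts from a newly selected traverser or directly from the label "d". The graph is made up apart from the Pulp Fiction row.

```python
# Made-up toy graph: vertex id -> properties, plus (out_vertex, label, in_vertex) edges.
V = {
    "qt": {"label": "person", "name": "Quentin Tarantino"},
    "p2": {"label": "person", "name": "Director B"},
    "p3": {"label": "person", "name": "Actor C"},
    "m1": {"label": "movie", "title": "Pulp Fiction", "year": 1994},
    "m2": {"label": "movie", "title": "Movie B", "year": 2000},
}
E = [("m1", "director", "qt"), ("m1", "actor", "qt"),
     ("m2", "director", "p2"), ("m2", "actor", "p3")]

def in_(vid, label):
    return [o for o, lbl, i in E if i == vid and lbl == label]

def out(vid, label):
    return [i for o, lbl, i in E if o == vid and lbl == label]

# Shared prefix: hasLabel('person').as('d').in('director').as('t','y').out('actor').as('a')
t = [(v, {"d": v}) for v, p in V.items() if p["label"] == "person"]
t = [(m, {**labs, "t": m, "y": m}) for cur, labs in t for m in in_(cur, "director")]
t = [(a, {**labs, "a": a}) for cur, labs in t for a in out(cur, "actor")]

# Form 1: .select('d').where(eq('a')): compare the newly selected object with 'a'
form1 = [(labs["d"], labs) for _, labs in t]
form1 = [(cur, labs) for cur, labs in form1 if cur == labs["a"]]

# Form 2: .where('d', eq('a')): compare label 'd' with label 'a' directly
form2 = [(cur, labs) for cur, labs in t if labs["d"] == labs["a"]]

def payload(traversers):
    """.select('d','t','y').by('name').by('title').by('year')"""
    return [{"d": V[l["d"]]["name"], "t": V[l["t"]]["title"], "y": V[l["y"]]["year"]}
            for _, l in traversers]

print(payload(form1) == payload(form2))  # True
print(payload(form2))  # [{'d': 'Quentin Tarantino', 't': 'Pulp Fiction', 'y': 1994}]
```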

Both solutions are semantically equivalent. It is a matter of which syntax seems more readable to you.

Projecting traversals are one of the most important tools in our Gremlin toolbox, and they can be a good starting point for those who are comfortable with the “where”, “from”, “select” thinking pattern.

Now it is your turn to give this a try. Give this next exercise a go and get your hands dirty with projecting traversals.  
