Recursive Traversals

DSE Version: 6.0

Video

A recursive traversal is a traversal that contains one or more internal, nested traversals that can be repeatedly evaluated again and again until a certain stopping criterion is met. In this unit, you will learn about recursive traversals.

Transcript: 

Hi, I am Artem Chebotko and this is Recursive Traversals.

A recursive traversal is a traversal that contains one or more internal, nested traversals that can be repeatedly evaluated again and again until a certain stopping criterion is met.

For example, in some cases, it is useful to be able to instruct Gremlin to hop between vertices in your graph until the traverser finds a vertex with interesting property values.

Recursion in Gremlin is supported by the repeat step.

“repeat” takes a traversal as a parameter and evaluates it again and again. To stop repetition, we also need to use step modulators, such as times and until.

“times” specifies the exact number of iterations we want.

“until” specifies a stopping condition, in the form of either an internal traversal that should yield some result or a predicate that should evaluate to true for the recursion to break.

Another useful step modulator is “emit”. It has nothing to do with stopping recursion however.

Sometimes when you have repetition, you want not only the result of the last iteration but also all or some intermediate iteration results. That is what “emit” is for … you can send copies of traversers from intermediate iterations past repetition. This can be done unconditionally or based on a condition defined by either a traversal or a predicate.

We will soon see all of these constructs in concrete examples. Please hold your questions.

Here is our KillrVideo graph schema.

We have four types of vertices ... movies, users, genres and people. They can be related to each other by different types of relationships.

Users may know each other and can rate movies. Movies belong to genres. Movies can have actors, directors, composers, screenwriters, and cinematographers.

And of course we have various vertex properties and one edge property in this schema. User-defined vertex IDs are underlined. We have no meta-properties but we do have one multi-property … that is the production property.

Please take a moment to study this schema as it will help you to follow our traversal examples.

Specifically, in this presentation, our focus will be on users and knows-edges. Those form a social subgraph that we will traverse recursively.

This is more of a toy example but it will help us to understand how repeat and times play together.

We have a simple task of counting how many other users a given user knows. We can, of course, find an answer by traversing "knows" edges, which can be done using a single out("knows") step.

And this is exactly what we do in our first non-recursive traversal. It  returns 4 immediate friends. BTW, we chose to use out instead of both in this example just to keep things simple.

The next two traversals are recursive traversals. The difference between them is that times is specified after or before repeat, which affects the final counts.

repeat.times results in the do while semantics, meaning that the internal traversal will be evaluated before checking the stopping condition. In the example, even though times(0) sets the number of repetitions to zero, one out step is evaluated to get to 4 friend vertices.

times.repeat will first evaluate the stopping condition and only then do the repetition, similarly to a while do loop. In the example, times(0) instructs Gremlin to never perform repeat, and thus the traverser stays at the vertex representing user "u1". Counting this single vertex returns 1.

The last two recursive traversals give identical results and you should be able to figure out why.

Now let’s dig a little deeper into “u1”’s social network and count friends of friends of friends of friends of friends of this user. In other words, we are going to count users who are 5 out steps away from user u1. This becomes a truly recursive traversal.

We get the same result of 715 whether we use repeat.times or times.repeat.

And since the same users might have been found multiple times by traversing different paths in our little social network, we show how to use dedup() to remove duplicates. We get 397 distinct users.

In this next example, we want to find connections of our user u1 who are 17 years old.

Since we do not really know how many repetitions we may need (could be 2, 100, or some other arbitrary number), we cannot use the times modulator to define a stopping condition.

Instead, we are going to use the modulator “until” and the new “limit” step.

In our first traversal, we start with user u1 and keep traversing knows-edges in any direction until we see a user-vertex with age 17. We only need to find 3 such users because the limit step caps the number of results to 3. For each found user, we get a value map and indeed ... we can see that all users in the output are 17 years old.

Something that may not be very obvious is that if we remove the “limit” step from this traversal, it will become an infinite traversal .... because after each iteration there will always be some traversers that moved to user-vertices with age other than 17.

Furthermore, even with the limit step, our traversal can still run forever if it cannot find at least 3 results. It will keep iterating and looking for users but our graph may simply not have enough results to satisfy the limit.

Just think about it for now and we will soon learn about another step that will help us to deal with those extremes.

In the second traversal, we put the until modulator before the repeat step and we only get one user in the result. But notice that the user in the output is user u1.

What happened here is that because user u1 is 17 years old, the until condition was satisfied and the repeat step was never even evaluated.

So we just introduced the limit step as a way to cap the number of traversers that can move onto the next step in the traversal.

We can also use the range step to set the low and high inclusive range for a number of traversers that can pass to the next step in the traversal.

The tail step is analogous to limit, except that tail emits the last n objects instead of the first n objects.

Both limit and range are helpful in preventing an infinite recursion but tail is not … because it requires the result to be calculated completely so that it can take something from the end of that result.

The ultimate step that guarantees that your recursive traversal will not run forever is timeLimit.

The timeLimit step can also be used together with the other steps on this slide … as soon as a traversal runs for a specified amount of time or yields a certain number of object in the result, computation stops.

Let’s look at some examples.

Here we are going to start again with user u1 and repeat for at most 2000 ms the traversal that looks for a user who has more than 20 knows-edges. That is a user with more than 20 direct connections and who is also connected directly or indirectly to user u1.

Next we dedup our result and output a value map … only to find a very well connected user u874.

Increasing the time limit may yield more results.

Reducing the time limit may yield no results at all.

And removing the time limit altogether will result in infinite recursion.

Remember that timeLimit is your best friend when it comes to recursion.

Now let’s talk about something slightly different.

If we want to return a result of each iteration of the repeat step, we can use “emit”.

This example will help us to understand how the emit step modulator works.

The first traversal starts with user u1, traverses outgoing knows-edges one time and counts the adjacent vertices. We see that user u1 directly knows 4 other users.

The second traversal does two iterations and effectively counts friends of friends of u1. Based on the output, user u1 has 13 second-degree friends.

So far we did not see anything new … the next two traversals will fix that.

What if we wanted to count both friends and friends of friends at the same time?

We will have to send the results of the first and second iterations to the count step.

And this is what the third traversal does using the “emit” modulator after “repeat”. So there will be 4 1st-degree friends and 13 2nd-friends counted in this traversal resulting in 17 total.

In our final traversal, putting “emit” before the repeat step, also counts the user u1 himself. In this case, we emit user u1 and his 4 friends before the first and second iterations start. The repeat step also outputs 13 2nd-degree friends ... and so our total count becomes 18.

What is we want to know if two users are connected? Say if user u1 is connected to user u2.

You are going to have to look for an answer recursively because you do not know beforehand what that path might be or if it even exists.

We start with user u1 and repeatedly traverse knows-edges in any direction. We have a time limit of 200 ms and this is important because we do not want to have a potentially infinite recursion.

After each iteration, we only emit a vertex if it belongs to user with id "u2". Based on the sample output, there exist multiple paths between users "u1" and "u2".

Now it is your turn to roll up your sleeves and get your hands dirty!

No write up.
No Exercises.
No FAQs.
No resources.
Comments are closed.