Last update Jul 23, 2019

Inside Sequences: Create Your Own Sequence Operations

Sequences are a fantastic way to process collections of data in a way that can perform better than the standard collection operations, as we saw in the previous article, Kotlin Sequences: An Illustrated Guide. Today, we’re going even deeper! We’re going to look under the hood - inside Kotlin sequences - in order to understand how they work like never before!

In fact, by the end of this article, we’ll even create our very own Sequence class and plug it into an operation chain!

If you’re new to sequences, you’ll definitely want to check out that previous article first, as it explains what sequences are, and provides some simple illustrations and diagrams to get you up to speed quickly.

Ready to go? All right, let’s do this!

Sequences, Iterables, and Iterators

As you recall from the previous article, the magic that can give sequences a performance edge over normal collection operations is the order in which the individual operations are applied.

In collection operation chains, each operation is applied to all elements before moving onto the next operation. In other words, one operation at a time.
In sequence operation chains, each element receives all applicable operations before moving onto the next element. In other words, one element at a time.

Here’s the diagram from last time, which shows this difference visually:

Diagrams comparing the order of operations for a collection operation chain versus a sequence operation chain.

The key takeaway here is that Sequences work on one item at a time.

One item at a time…

That sure sounds a lot like an Iterable, doesn’t it? And in fact, if you were to look at the definitions of Iterable and Sequence side-by-side, you would hardly notice a difference!

public interface Iterable<out T> {
    public operator fun iterator(): Iterator<T>
}

public interface Sequence<out T> {
    public operator fun iterator(): Iterator<T>
}

That’s right! Other than their names, their definitions are identical! They both simply contain an iterator() function, which, when paired with the operator keyword, makes it so that instances of these types can be used as the subject of a for loop.

So, Iterable and Sequence look quite similar.

But…

Whereas subtypes of Iterable are often named for the way the items are structured (e.g., an ArrayList is backed by an array) or the way the items relate to each other (e.g., a List is ordered, items in a Set are distinct, etc.), subtypes of Sequence are named for the operations that they perform on those items.

For example, in order to perform a filter operation, there’s a sequence class called FilteringSequence. And in order to perform a map operation, there’s a sequence class called TransformingSequence.

UML class diagram showing the difference in naming between subtypes of Iterable and subtypes of Sequence.

By wrapping one kind of Sequence instance in another, we can compose operations that can be applied to each item of the Iterator in turn.

For example, in the last article, we modeled a sequence that processed crayons. Here’s what it looked like:

val fireSet = BigBoxOfCrayons
    .asSequence()
    .filter { it.color in includedColors }
    .map { it.copy(label = "New!") }
    .take(5)
    .toSet()

As we call each operation like filter(), map, and take(), we’re really just wrapping one sequence instance in another. To help visualize this, let’s walk through the operation chain one step at a time, and see what it looks like!

First, we call asSequence, which creates a sequence by using BigBoxOfCrayons’s iterator as the source data. This gives us a single Sequence instance:

val fireSet = BigBoxOfCrayons
    .asSequence()                         
    .filter { it.color in includedColors }
    .map { it.copy(label = "New!") }
    .take(5)

Next, we call .filter(). This wraps our last Sequence with a new instance of FilteringSequence:

val fireSet = BigBoxOfCrayons
    .asSequence()
    .filter { it.color in includedColors }
    .map { it.copy(label = "New!") }
    .take(5)

Concentric circles with FilteringSequence wrapping Sequence.

Then, we call .map(). This wraps those two sequences in a third, called TransformingSequence:

val fireSet = BigBoxOfCrayons
    .asSequence()
    .filter { it.color in includedColors }
    .map { it.copy(label = "New!") }      
    .take(5)

Concentric circles with TransformingSequence wrapping FilteringSequence wrapping Sequence.

Finally, we call take(), which wraps those three sequence instances in a fourth, called TakeSequence:

val fireSet = BigBoxOfCrayons
    .asSequence()
    .filter { it.color in includedColors }
    .map { it.copy(label = "New!") }
    .take(5)

Concentric circles with TakeSequence wrapping TransformingSequence wrapping FilteringSequence wrapping Sequence.

As you can see, every time we call a new operation, all we’re really doing is wrapping the previous sequence in another sequence!

In fact, the exact same operation chain above could be built by just calling the constructors for each sequence subtype instead…

TakeSequence(
    TransformingSequence(
        FilteringSequence(
            BigBoxOfCrayons.asSequence(),
            true,
            { it.color in includedColors }
        ),
        { it.copy(label = "New!") }
    ),
    5
)

… except that these classes are all marked internal in the standard library, so you can’t actually call their constructors. But c’mon, who wants to call a bunch of constructors when that dot notation (.filter { }) looks sooo much nicer?!

Now, if each operation is just wrapping more and more sequence instances, when do the operations actually execute? In other words, when does the filtering, mapping, and taking actually happen?

Intermediate and Terminal Operations

Sequence operation chains have two phases:

The building phase, where the sequence is built by wrapping a sequence in another sequence, probably multiple times.
The execution phase, where the iterator() of the Sequence is finally traversed.

If the building phase is like setting up a row of standing dominoes, then the execution phase is where you tip the first one over and watch it all play out.

Let’s look at our code from earlier, and see which calls are a part of which phase:

val fireSet = BigBoxOfCrayons
    .asSequence()                            // Building...
    .filter { it.color in includedColors }   // Building...
    .map { it.copy(label = "New!") }         // Building...
    .take(5)                                 // Building...
    .toSet()                                 // Executing!

In this code, the building phase consists of the first four operations that we perform the BigBoxOfCrayons - asSequence(), filter(), map(), and take(). We’ve wrapped the sequence a few times, but haven’t actually looped over the items yet.

Then, when we call toSet(), the iterator is finally kicked off, and the whole chain of operations gets going!

The functions that build out the sequence chain are called intermediate operations.
The function that executes the sequence chain is called a terminal operation.

As the name implies, once you invoke a terminal operation, you can’t build that sequence any further - that operation chain is done.

It’s worth noting, though, that in many cases you can store off the Sequence to a variable, and fire off multiple terminal operations on it.

    val fireSequence = BigBoxOfCrayons
        .asSequence()                            // Building...
        .filter { it.color in includedColors }   // Building...
        .map { it.copy(label = "New!") }         // Building...
        .take(5)                                 // Building...

    val fireSet = fireSequence.toSet()           // Executing!
    val crayonCount = fireSequence.count()       // Executing!

How can you tell the difference between an intermediate and terminal operation?

As a general rule, you can recognize an intermediate operation because it returns another Sequence.

The terminal operations, on the other hand, generally return something that’s not a Sequence, such as:

Some other kind of collection, like a List or Set.
A single value from the sequence, such as the first String from a Sequence<String>.
A primitive class like Boolean, Int, Double, and so on. Often these are the result of an aggregate operation, like sum() or average().

The distinction between intermediate and terminal operations is important to keep in mind, because it can mean the difference between whether your code executes or not. For example, what will the following code print?

sequenceOf("Hello", "Kotlin", "World")
    .onEach { println("I spy: $it") }

Nothing at all!

Why?

Because onEach() is an intermediate operation. Since we never finished it off with a terminal operation, the iterator was never traversed, and the println() function was never invoked!

To get it to actually call println(), we’d have to cap it off with a terminal operation, such as toSet():

sequenceOf("Hello", "Kotlin", "World")
    .onEach { println("I spy: $it") }
    .toSet()

But more likely, you didn’t want to use the intermediate operation onEach(); you wanted to use the terminal operation forEach():

sequenceOf("Hello", "Kotlin", "World")
    .forEach { println("I spy: $it") }

Creating Our Own Sequence

Now that we’ve got a good grasp on what sequence classes are, and the difference between intermediate and terminal operations, we can use that knowledge to create our very own sequence operation!

Instead of using onEach { println("I spy: $it") } as we did above, let’s create our very own custom sequence operation that can print out the element for us!

When we’re done, we’ll be able to write this:

sequenceOf("Hello", "Kotlin", "World")
    .spy()                            
    .toSet()

… which will print each item to the console.

Step 1: Implementing the `Sequence` Interface

Naturally, if we want to make our own sequence, we’ll need to start with a class that implements the Sequence interface. In order to do that, we also need to include the iterator() function. As a placeholder for the implementation of that function, let’s start with just TODO():

class SpyingSequence<T> : Sequence<T> {
    override fun iterator() = TODO()
}

Step 2: Wrapping Another Sequence

Now, we know that sequence classes wrap other sequences, so let’s go ahead and pass the underlying sequence into the constructor. Remember - the underlying sequence is just the previous sequence in the operation chain.

class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> {
    override fun iterator() = TODO()
}

When one sequence wraps another, the iterator also has to wrap the underlying sequence’s iterator. As a first pass at that, we can just do some simple delegation like this:

class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> {
    override fun iterator() = underlyingSequence.iterator()
}

Step 3: Adding the Functionality

The code above compiles, and would simply serve as a no-op. That is, it’s got “no operation”; it just a passes through to the iterator of the underlying sequence. We want to spy on the elements, so we’ll need to make a custom iterator where we can put our println().

Let’s take a small step in that direction by making a custom Iterator that delegates to the underlying iterator for each of its two functions - hasNext() and next():

class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> {
    override fun iterator() = object : Iterator<T> {
        val underlyingIterator = underlyingSequence.iterator()

        override fun hasNext() = underlyingIterator.hasNext()
        override fun next(): T = underlyingIterator.next()
    }
}

We’re almost there! All that’s left to do is to print out the item. We’ll expand the next() function a little bit. Instead of just returning the result of the underlying iterator.next() call, we’ll print out the item first.

class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> {
    override fun iterator() = object : Iterator<T> {
        val iterator = underlyingSequence.iterator()

        override fun hasNext() = iterator.hasNext()
        
        override fun next(): T {
            val item = iterator.next()
            println("I spy: $item")
            return item
        }
    }
}

And now our custom Sequence class is done!

Step 4: Creating the Extension Function

But, in order to use it from the middle of a sequence chain, we’ve got one more step: we need to create an extension function!

fun <T> Sequence<T>.spy() = SpyingSequence(this)

We’re done! Now we can include it like any other sequence operation!

val fireSet = BigBoxOfCrayons
    .asSequence()
    .filter { it.color in includedColors }
    .map { it.copy(label = "New!") }
    .spy()  // <---- here's our sequence!   
    .take(5)
    .toSet()

This is, in fact, exactly how sequences are included in Kotlin’s standard library! They’re included by:

Creating a class that implements Sequence, and then…
Creating an extension function that calls that class’ constructor.

There are dozens of intermediate sequence operations included in the standard library - .filter(), .map(), withIndex(), distinct(), and chunked(), just to name a few. So you might expect that this could result in a lot of different Sequence classes. But in fact, just a handful of well-crafted, dynamic Sequence classes power almost all of those different operations!

Here are a few examples. Notice how some of the same sequence classes are used for multiple operations.

Sequence Extension Function	Sequence Class(es)
`.map()`	`TransformingSequence()`
`.mapIndexed()`	`TransformingIndexedSequence()`
`.mapIndexedNotNull()`	`FilteringSequence(TransformingIndexedSequence())`
`.filter()`	`FilteringSequence()`
`.filterNot()`	`FilteringSequence()`
`.filterNotNull()`	`FilteringSequence()`
`.filterIsInstance()`	`FilteringSequence()`
`.filterIndexed()`	`TransformingSequence(FilteringSequence(IndexingSequence()))`

Different extension functions can invoke the constructor of the same sequence class with different arguments to achieve different results, and some extension functions combine multiple sequence classes.

Summary

Cartoon of a British detective from the late 1800s.

Congratulations! You’ve gone way beyond just understanding what Kotlin’s sequences are - you’ve now got a solid understanding of how Kotlin’s sequences work on the inside! You’ve seen how sequence classes wrap other sequence instances, and understand the difference between intermediate and terminal operations.

Before we finish up this series, though, we still need to answer the big question - “how do I know when I should use sequences instead of normal collection operations?” Performance is always a complex issue, and for that reason, we’re going to cover this in the third and final article in this series.

And for more fun, you can also check out the episode of the Talking Kotlin podcast where Hadi Hariri and I discuss sequences!

Thanks to Kartik Gujarati, Mitch Ware, and Krzysztof Zienkiewicz for reviewing this article.