Sequences are a fantastic way to process collections of data in a way that can perform better than the standard collection operations, as we saw in the previous article, Kotlin Sequences: An Illustrated Guide. Today, we’re going even deeper! We’re going to look under the hood - inside Kotlin sequences - in order to understand how they work like never before!
In fact, by the end of this article, we’ll even create our very own Sequence
class and plug it into an operation chain!
If you’re new to sequences, you’ll definitely want to check out that previous article first, as it explains what sequences are, and provides some simple illustrations and diagrams to get you up to speed quickly.
Ready to go? All right, let’s do this!
Sequences, Iterables, and Iterators
As you recall from the previous article, the magic that can give sequences a performance edge over normal collection operations is the order in which the individual operations are applied.
- In collection operation chains, each operation is applied to all elements before moving onto the next operation. In other words, one operation at a time.
- In sequence operation chains, each element receives all applicable operations before moving onto the next element. In other words, one element at a time.
Here’s the diagram from last time, which shows this difference visually:
The key takeaway here is that Sequences work on one item at a time.
One item at a time…
That sure sounds a lot like an Iterable
, doesn’t it? And in fact, if you were to look at the definitions of Iterable
and Sequence
side-by-side, you would hardly notice a difference!
|
|
That’s right! Other than their names, their definitions are identical! They both simply contain an iterator()
function, which, when paired with the operator
keyword, makes it so that instances of these types can be used as the subject of a for
loop.
So, Iterable
and Sequence
look quite similar.
But…
Whereas subtypes of Iterable
are often named for the way the items are structured (e.g., an ArrayList
is backed by an array) or the way the items relate to each other (e.g., a List
is ordered, items in a Set
are distinct, etc.), subtypes of Sequence
are named for the operations that they perform on those items.
For example, in order to perform a filter
operation, there’s a sequence class called FilteringSequence
. And in order to perform a map
operation, there’s a sequence class called TransformingSequence
.
By wrapping one kind of Sequence
instance in another, we can compose operations that can be applied to each item of the Iterator
in turn.
For example, in the last article, we modeled a sequence that processed crayons. Here’s what it looked like:
val fireSet = BigBoxOfCrayons
.asSequence()
.filter { it.color in includedColors }
.map { it.copy(label = "New!") }
.take(5)
.toSet()
As we call each operation like filter()
, map
, and take()
, we’re really just wrapping one sequence instance in another. To help visualize this, let’s walk through the operation chain one step at a time, and see what it looks like!
First, we call asSequence
, which creates a sequence by using BigBoxOfCrayons
’s iterator as the source data. This gives us a single Sequence
instance:
|
Next, we call .filter()
. This wraps our last Sequence
with a new instance of FilteringSequence
:
|
Then, we call .map()
. This wraps those two sequences in a third, called TransformingSequence
:
|
Finally, we call take()
, which wraps those three sequence instances in a fourth, called TakeSequence
:
|
As you can see, every time we call a new operation, all we’re really doing is wrapping the previous sequence in another sequence!
In fact, the exact same operation chain above could be built by just calling the constructors for each sequence subtype instead…
TakeSequence(
TransformingSequence(
FilteringSequence(
BigBoxOfCrayons.asSequence(),
true,
{ it.color in includedColors }
),
{ it.copy(label = "New!") }
),
5
)
… except that these classes are all marked internal
in the standard library, so you can’t actually call their constructors. But c’mon, who wants to call a bunch of constructors when that dot notation (.filter { }
) looks sooo much nicer?!
Now, if each operation is just wrapping more and more sequence instances, when do the operations actually execute? In other words, when does the filtering, mapping, and taking actually happen?
Intermediate and Terminal Operations
Sequence operation chains have two phases:
- The building phase, where the sequence is built by wrapping a sequence in another sequence, probably multiple times.
- The execution phase, where the
iterator()
of theSequence
is finally traversed.
If the building phase is like setting up a row of standing dominoes, then the execution phase is where you tip the first one over and watch it all play out.
Let’s look at our code from earlier, and see which calls are a part of which phase:
val fireSet = BigBoxOfCrayons
.asSequence() // Building...
.filter { it.color in includedColors } // Building...
.map { it.copy(label = "New!") } // Building...
.take(5) // Building...
.toSet() // Executing!
In this code, the building phase consists of the first four operations that we perform the BigBoxOfCrayons
- asSequence()
, filter()
, map()
, and take()
. We’ve wrapped the sequence a few times, but haven’t actually looped over the items yet.
Then, when we call toSet()
, the iterator is finally kicked off, and the whole chain of operations gets going!
- The functions that build out the sequence chain are called intermediate operations.
- The function that executes the sequence chain is called a terminal operation.
As the name implies, once you invoke a terminal operation, you can’t build that sequence any further - that operation chain is done.
It’s worth noting, though, that in many cases you can store off the Sequence
to a variable, and fire off multiple terminal operations on it.
val fireSequence = BigBoxOfCrayons
.asSequence() // Building...
.filter { it.color in includedColors } // Building...
.map { it.copy(label = "New!") } // Building...
.take(5) // Building...
val fireSet = fireSequence.toSet() // Executing!
val crayonCount = fireSequence.count() // Executing!
How can you tell the difference between an intermediate and terminal operation?
As a general rule, you can recognize an intermediate operation because it returns another Sequence
.
The terminal operations, on the other hand, generally return something that’s not a Sequence
, such as:
- Some other kind of collection, like a
List
orSet
. - A single value from the sequence, such as the first
String
from aSequence<String>
. - A primitive class like
Boolean
,Int
,Double
, and so on. Often these are the result of an aggregate operation, likesum()
oraverage()
.
The distinction between intermediate and terminal operations is important to keep in mind, because it can mean the difference between whether your code executes or not. For example, what will the following code print?
sequenceOf("Hello", "Kotlin", "World")
.onEach { println("I spy: $it") }
Nothing at all!
Why?
Because onEach()
is an intermediate operation. Since we never finished it off with a terminal operation, the iterator was never traversed, and the println()
function was never invoked!
To get it to actually call println()
, we’d have to cap it off with a terminal operation, such as toSet()
:
sequenceOf("Hello", "Kotlin", "World")
.onEach { println("I spy: $it") }
.toSet()
But more likely, you didn’t want to use the intermediate operation onEach()
; you wanted to use the terminal operation forEach()
:
sequenceOf("Hello", "Kotlin", "World")
.forEach { println("I spy: $it") }
Creating Our Own Sequence
Now that we’ve got a good grasp on what sequence classes are, and the difference between intermediate and terminal operations, we can use that knowledge to create our very own sequence operation!
Instead of using onEach { println("I spy: $it") }
as we did above, let’s create our very own custom sequence operation that can print out the element for us!
When we’re done, we’ll be able to write this:
sequenceOf("Hello", "Kotlin", "World") .spy() .toSet()
… which will print each item to the console.
Step 1: Implementing the
Sequence
InterfaceNaturally, if we want to make our own sequence, we’ll need to start with a class that implements the
Sequence
interface. In order to do that, we also need to include theiterator()
function. As a placeholder for the implementation of that function, let’s start with justTODO()
:class SpyingSequence<T> : Sequence<T> { override fun iterator() = TODO() }
Step 2: Wrapping Another Sequence
Now, we know that sequence classes wrap other sequences, so let’s go ahead and pass the underlying sequence into the constructor. Remember - the underlying sequence is just the previous sequence in the operation chain.
class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> { override fun iterator() = TODO() }
When one sequence wraps another, the iterator also has to wrap the underlying sequence’s iterator. As a first pass at that, we can just do some simple delegation like this:
class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> { override fun iterator() = underlyingSequence.iterator() }
Step 3: Adding the Functionality
The code above compiles, and would simply serve as a no-op. That is, it’s got “no operation”; it just a passes through to the iterator of the underlying sequence. We want to spy on the elements, so we’ll need to make a custom iterator where we can put our
println()
.Let’s take a small step in that direction by making a custom
Iterator
that delegates to the underlying iterator for each of its two functions -hasNext()
andnext()
:class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> { override fun iterator() = object : Iterator<T> { val underlyingIterator = underlyingSequence.iterator() override fun hasNext() = underlyingIterator.hasNext() override fun next(): T = underlyingIterator.next() } }
We’re almost there! All that’s left to do is to print out the item. We’ll expand the
next()
function a little bit. Instead of just returning the result of the underlyingiterator.next()
call, we’ll print out the item first.class SpyingSequence<T>(private val underlyingSequence: Sequence<T>) : Sequence<T> { override fun iterator() = object : Iterator<T> { val iterator = underlyingSequence.iterator() override fun hasNext() = iterator.hasNext() override fun next(): T { val item = iterator.next() println("I spy: $item") return item } } }
And now our custom
Sequence
class is done!Step 4: Creating the Extension Function
But, in order to use it from the middle of a sequence chain, we’ve got one more step: we need to create an extension function!
fun <T> Sequence<T>.spy() = SpyingSequence(this)
We’re done! Now we can include it like any other sequence operation!
val fireSet = BigBoxOfCrayons .asSequence() .filter { it.color in includedColors } .map { it.copy(label = "New!") } .spy() // <---- here's our sequence! .take(5) .toSet()
This is, in fact, exactly how sequences are included in Kotlin’s standard library! They’re included by:
- Creating a class that implements
Sequence
, and then…- Creating an extension function that calls that class’ constructor.
There are dozens of intermediate sequence operations included in the standard library -
.filter()
,.map()
,withIndex()
,distinct()
, andchunked()
, just to name a few. So you might expect that this could result in a lot of differentSequence
classes. But in fact, just a handful of well-crafted, dynamicSequence
classes power almost all of those different operations!Here are a few examples. Notice how some of the same sequence classes are used for multiple operations.
Sequence Extension Function Sequence Class(es) .map()
TransformingSequence()
.mapIndexed()
TransformingIndexedSequence()
.mapIndexedNotNull()
FilteringSequence(TransformingIndexedSequence())
.filter()
FilteringSequence()
.filterNot()
FilteringSequence()
.filterNotNull()
FilteringSequence()
.filterIsInstance()
FilteringSequence()
.filterIndexed()
TransformingSequence(FilteringSequence(IndexingSequence()))
Different extension functions can invoke the constructor of the same sequence class with different arguments to achieve different results, and some extension functions combine multiple sequence classes.
Summary
Congratulations! You’ve gone way beyond just understanding what Kotlin’s sequences are - you’ve now got a solid understanding of how Kotlin’s sequences work on the inside! You’ve seen how sequence classes wrap other sequence instances, and understand the difference between intermediate and terminal operations.
Before we finish up this series, though, we still need to answer the big question - “how do I know when I should use sequences instead of normal collection operations?” Performance is always a complex issue, and for that reason, we’re going to cover this in the third and final article in this series.
And for more fun, you can also check out the episode of the Talking Kotlin podcast where Hadi Hariri and I discuss sequences!
Thanks to Kartik Gujarati, Mitch Ware, and Krzysztof Zienkiewicz for reviewing this article.