I’ve been learning Rust for the past twenty days or so, working through the Blandy & Orendorff book, coding up things as I go. Once I got into playing with Rust traits and closures and associated types, the similarities to programming in Haskell with typeclasses, data structures, closure passing and associated types was pretty obvious.
As a warm up I thought I’d try porting the stream fusion core from Haskell to Rust. This was code I was working on more than a decade ago. How much of it would work or even make sense in today’s Rust?
Footnote: this is the first code I’ve attempted to seriously write for more than 2 years, as I’m diving back into software engineering after an extended sojourn in team building and eng management. I was feeling a bit .. rusty . Let me know if I got anything confused.
Fun things to discover:
As a meta point, it’s amazing to wander into a fresh programming language, 15 years after the original “ associated types with class ” paper, and find basically all the original concepts available, and extended in some interesting ways. There’s a lot of feeling of deja vu for someone who worked on GHC optimizations/unboxing/streams when writing in Rust.
First, a direct port of the original paper . A data type for stepping through elements of stream, including ‘Skip’ so we can filter things. And a struct holding the stream stepper function, and the state. Compared to the Haskell version (in the comment) there’s more rituals to declare what goes on the heap, and linking the lifetime of objects together. My reward for doing this is not needing a garbage collector. There’s quite an enjoyable serotonin reward when a borrow checking program type checks :-)
You can probably infer from the syntax this will be operationally expensive: a closure boxed up onto the heap. Rust’s explicit control of where things are stored and for how long feels a lot like a type system for scripting allocators, and compared to Haskell you’re certainly exactly aware of what you’re asking the machine to do.
Overall, its a fairly pleasing translation and even though I’m putting the closure on the heap I’m still having it de-allocated when the stream is dropped. Look mum, closure passing without a GC.
We can create empty streams, or streams with a single element in them. Remember a stream is just a function from going from one value to the next, and a state to kick it off:
The lambda syntax feels a bit heavy at first, but you get used to it. The hardest part of these definitions were:
I can generate a stream of values, a la replicate:
As long as I’m careful threading lifetime parameters around I can box values and generate streams without using a GC. This is sort of amazing. (And the unique lifetime token-passing discipline feels very like the ST monad and its extensions into regions/nesting). Again , you can sort of “feel” how expensive this is going to be, with the capturing and boxing to the heap explict. That boxed closure dynamically invoked will have a cost.
Let’s consume a stream, via a fold:
Not too bad. The lack of tail recursion shows up here, so while I’d normally write this as a ‘go’ local work function with a stack parameter, to get a loop, instead in Rust we just write a loop and peek and poke the memory directly via a ‘mut’ binding. Sigh, fine, but I promise I’m still thinking in recursion.
Now I can do real functional programming:
What about something a bit more generic: enumFromTo to fill a range with consecutive integer values, of any type supporting addition?
The trait parameters feel a lot like a super layered version of the Haskell Num class, where I’m really picking and choosing which methods I want to dispatch to. The numeric overloading is also a bit different (A::one()) instead of an overloaded literal. Again, is almost identical to the Haskell version, but with explicit memory annotations, and more structured type class/trait hierarchy. Other operations, like map, filter, etc all fall out fairly straight forward. Nice: starting to feel like I can definitely be productive in this language.
Now I can even write a functional pipeline — equivalent to:
<strong> sum . map (*2) . filter (\n -> n`mod`2 == 1) $ [1..1000000::Int] </strong>=> 500000000000
As barebones Rust:
$ cargo run
Another moment of deja vu, installing Criterion to benchmark the thing. “cargo bench” integration is pretty sweet:
So about 7 ms for four logical loops over 1M i64 elements. That’s sort of plausible and actually not to bad considering I don’t know what I’m doing.
The overhead of the dynamic dispatch to the boxed closure is almost certainly going to dominate, and and then likely breaks inlining and arithmetic optimization, so while we do get a fused loop, we get all the steps of the loop in sequence. I fiddled a bit with the inlining the consuming loop, which shaved 1ms off, but that’s about it.
A quick peek at the assembly f–release mode, which I assume does good things, and yeah, this isn’t going to be fast. Tons of registers, allocs and dispatching everywhere. Urk.
But it works! The first thing directly translated works, and it has basically the behavior you’d expect with explict closure calls. Not bad!
That boxed closure bothers me a bit. Dispatching to something that’s known statically. The usual trick for resolving things statically is to move the work to the type system. In this case, we want to lookup the right ‘step’ function by type. So I’ll need a type for each generator and transformer function in the stream API. We can take this approach in Rust too.
I banged my head against this a few different ways, and settled on putting the state data into the ‘API key’ type. This actually looks really like something we already knew how to do – streams as Rust iterators – Snoyman already wrote about it 3 years ago! — I’ve basically adapted his approach here after a bit of n00b trial and error.
The ‘Step’ type is almost the same, and the polymorphic ‘Stream’ type with its existential seed becomes a trait definition:
What’s a bit different now is how we’re going to resolve the function to generate each step of the stream. That’s now a trait method associated with some instance and element type.
So e.g. if I want to generate an empty stream, I need a type, and instance and a wrapper:
I can convert all the generator functions like this. E.g. to replicate a stream I need to know how many elements, and what the element is. Instead of capturing the element in a closure, it’s in an explicit data type:
The step function is still basically the same as in the Haskell version, but to get the nice method syntax we have it all talk to ‘self’.
We also need a type for each stream transformer. E.g. a ‘map’ is now a struct with the mapper function, and an underlying stream object it operates on.
This part is a bit more involved — when a map is applied to a stream element, we return f(x) of the element, and lift the stream state into a Map stream state for the next step.
I can implement Stream-generic folds now — again, since I have no tail recursion to consume the stream I’m looping explicitly. This is our real ‘driver’ of work , the actual loop pulling on a chain of ‘stream.next()’s we’ve built up.
Ok so with the method syntax this looks pretty nice:
I had to write out the types here to understand how the method resolving works. We build up a nice chain of type information about exactly what function we want to use at what type. The whole pipeline is a Map<Filter<Range < … type> , all an instance of Stream.
So this should do ok right? No boxing of closures, there could be some lookups and dispatch but there’s enough type information here to know all calls staticallyI don’t have much intuition for how Rust will optimize the chain of nested Yield/Skip constructors.. but I’m hopeful.
288 micro seconds to collapse a 1_000_000 element stream. Or abou t 25x faster. Nice!
So the type information and commitment to not allocating to the heap does a lot of work for us here. I ask cargo rustc –bin stream_test –release — –emit asm for fun. And this is basically what I want to see: a single loop, no allocation, a bit of math. Great.
It’s converted the %2 / *2 body into adding a straight i64 addition loop with strides. I suspect with a bit of prodding it could resolve this statically to a constant but that’s just a toy anyway. All the intermediate data structures are gone.
So that’s a pretty satisfying result. With minimal effort I got a fusing iterator/stream API that performs well out of the box. The Rust defaults nudge code towards low overhead by default. That can feel quite satisfying.