Enumerator Goodness Even With Legacy Code
Our codebase has a lot of fairly old Ruby code that isn’t quite up to snuff on the current best practices. One area in which this is particularly true is in how we deal with blocks. It’s not uncommon to see some code like this in our codebase:
def find_each(row_id, start: nil)
current = start
loop do
res = client.get_chunk(row_id, start: current)
break if res.empty?
yield res
current = res.last
end
end
This is a pretty common thing to see in Ruby. This code could be doing something like walking a Cassandra row. Yielding records this way is really important so we don’t blow up either our databases or our workers as it gives Ruby time to GC the results between yields.
Unforuntately however, this still isn’t great. Our jobs are more or less generic, so it makes more sense to instead
pass around different streams. One approach you could take is to duck-type on some method (say :find_each
), but that
can still be pretty limiting and difficult to maintain as our application rapidly grows.
Also, what if we only want the first item of a stream or something? With the above approach we’d have to do something along the lines of:
def find_first(row_id)
find_each(row_id) do |slice|
return slice.first
end
end
This is a border-line GOTO
statement if you ask me. A better approach would be to use an Enumerator
. In case you’re not familiar with
using enumerators, I think it’s easiest to just walk through an example:
def find_each(row_id, start: nil)
Enumerator.new do |yielder|
current = start
loop do
res = client.get_chunk(row_id, start: current)
break if res.empty?
yielder.yield res
current = res.last
end
end
end
Enumerator.new
takes a block that yields a yielder
. This is just an object that you can call yield
on (if you’re curious, Ruby
uses Fibers
under the hood to switch execution contexts). Just like above, you can simply pass it the thing you want to yield
.
Why do we do this? This makes our stream much more flexible and allows us to pass it around as an object and tack things on (and
make it lazy
which I won’t cover here but you should play around with). Here’s what our find_first
looks like now:
def find_first(row_id)
find_each(row_id).first
end
Can’t get much simpler than that. However we still have one more slight problem. Our API which worked with blocks before is now broken:
find_each(row_id) { |slice| puts slice }
#=> Doesn't work! :(
Digging through the Rails codebase I found a few examples of the use of to_enum
(or enum_for
, they are aliased). This method is
cool because it lets you write your methods with just yield
but then allows you to wrap them in enumerators very easily.
Implementing with it looks like:
def find_each(row_id, start: nil)
return to_enum(:find_each, row_id, start: start) unless block_given?
current = start
loop do
res = client.get_chunk(row_id, start: current)
break if res.empty?
yield res
current = res.last
end
end
This first checks if a block is given to the method. If so, continue and use it. If not, create an Enumerator
using this method.
This is equivalent to wrapping the rest of the body in the Enumerator.new {...}
syntax we used previously. The first argument to to_enum
is the method name, and the rest are arguments that will get passed to that method. The end result of this approach is a solution that has
all of the conveniences of both the Enumerator
and the direct yield
approach:
def find_first(row_id)
find_each(row_id).first
end
find_each(row_id) { |slice| puts slice }
#=> will print each slice
to_enum
can also be used to help me with our legacy code; I don’t have to go around touching a bunch of old code to have enumerators if
I don’t want to. I can simply use to_enum
with the first implementation:
to_enum(:find_each, row_id).first
Enumerators are one of the most powerful Ruby tools out there. They can help you write code that is not only elegant and flexible but also highly performant (e.g. batching requests to reduce bandwidth). Moreover, you can do all of this transparently without the consumer of your Enumerator having to know the details of how the records are fetched. Ultimately it’s well worth the investment to take the time to master them.