Building a better Turing tarpit
In Issue 3.3, I presented a proof-of-concept Ruby implementation of the Brainfuck programming language and challenged Practicing Ruby readers to improve upon it. After receiving several patches that helped move things along, I sat down once again to clean up the code even further. What I came to realize as I worked on my revisions is that the refactoring process is very similar to climbing a spiral staircase. Each structural change to the code simultaneously left the project back where it started along one vector while moving it forward along another.
Because we often look at the merits of a given refactoring technique within the context of a single transition from worse code to better code, it’s easy to mistakenly assume that the refactoring process is much more linear than it actually is. In this article, I’ve tried to capture a much wider-angle view of how refactoring really works in the wild. The end result is a story which I hope will spark some good discussions about how we can improve our code quality over time.
Prologue. Everything has to start somewhere
I decided to name my interpreter Turing Tarpit, because that term is perfectly apt for describing languages like Brainfuck. In a nutshell, the term refers to any language which is infinitely flexible, yet nearly impossible to use for anything practical. It turns out that building this sort of mind trap for programmers is quite easy to do.
My first iteration was easy enough to build, and consisted of three objects: a Tape, an Interpreter, and a Scanner. The rough breakdown of responsibilities was something like this:
- The Tape object implemented something similar to the storage mechanism in a Turing machine. It provided mechanisms for accessing and modifying numeric values in cells, as well as a way to increment and decrement the pointer that determined which cell to operate on.
- The Interpreter object served as a mapping between Brainfuck’s symbolic operators and the operations provided by the Tape object. It also implemented the I/O functionality required by Brainfuck.
- The Scanner object was responsible for taking a Brainfuck source file as input and transforming it into a stream of operations that could be handled by the Interpreter object. For the most part this simply meant reading the source file one character at a time, but this object also needed to account for Brainfuck’s forward and backward jump operations.
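To make the Tape’s responsibilities concrete, here is a minimal sketch of what such an object might look like. The method names follow the description above, but the body is my own illustration (using an unbounded, hash-backed tape) rather than the project’s actual implementation:

```ruby
# Illustrative sketch only -- not the project's real Tape class.
class Tape
  def initialize
    @cells   = Hash.new(0) # unbounded tape; untouched cells read as zero
    @pointer = 0
  end

  # Read the numeric value stored in the current cell
  def cell_value
    @cells[@pointer]
  end

  def increment_cell_value
    @cells[@pointer] += 1
  end

  def decrement_cell_value
    @cells[@pointer] -= 1
  end

  # Move the pointer that decides which cell we operate on
  def increment_pointer
    @pointer += 1
  end

  def decrement_pointer
    @pointer -= 1
  end
end

tape = Tape.new
tape.increment_cell_value
tape.increment_pointer
tape.cell_value # => 0 (a fresh cell)
```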
While my initial implementation was reasonably clean for a proof-of-concept, it definitely had room for improvement. I decided to ask for feedback early, in the hopes that folks would find and fix the things I knew were problematic while simultaneously checking my blind spots for issues that I hadn’t noticed myself.
Act I. Getting a fresh perspective on the problem
Some of the issues brought up by contributors were fairly obvious housekeeping chores, but nonetheless made the project nicer to work with:
- Steve Klabnik requested a way to run the whole test suite at once instead of file by file. He had provided a patch with a Rakefile, but since the project didn’t have any immediate need for other rake tasks, we ended up deciding that a simple test/suite.rb file would be sufficient. Notes were added to the README on how to run the tests.
- Renato Riccieri broke the classes out into individual files. The original implementation had everything in lib/turing_tarpit.rb, simply for convenience while spiking. Breaking the classes into individual files brought the project more in line with standard Ruby packaging conventions.
- Benoit Daloze refactored some ugly output code to use putc(char) instead of print("" << char). Since the latter was obviously a hack due to my lack of awareness of the putc method, this was a welcome contribution.
After this initial round of cleanup, we ended up thinking through a pair of more substantial problems: the inconsistent use of private accessors, and a proposed refactoring to break up the Scanner object into two separate objects, a Tokenizer and a Scanner.
The story behind my recent private accessor experiments
Ryan LeCompte was the one to bring up the question about private accessors, and was curious about why I had used them in some places but referenced instance variables directly in others. The main reason for this was simply that the use of private accessors is a new experiment for me, and so in my haste of getting a first version out the door, I remembered to use them in some places but not in others.
This project in particular posed certain challenges for using private accessors conveniently. A specific example of where I ran into some weird edge cases can easily be seen in the Tape object:
module TuringTarpit
  class Tape
    def initialize
      self.pointer_position = 0
      # ...
    end

    def increment_pointer
      self.pointer_position = pointer_position + 1
    end

    # ...

    private

    attr_writer :pointer_position
  end
end
If you just glance quickly at this class definition, it is very tempting to refactor increment_pointer so that it uses the convenient += syntax, resulting in something like the code below:
def increment_pointer
  self.pointer_position += 1
end
In most cases, this refactoring would be a good one because it makes the code slightly less verbose without sacrificing readability. However, it turns out that Ruby does not extend the same private method special casing to self.foo += something as it does to self.foo = something. This means that if you attempt to refactor this code to use +=, it ends up raising a NoMethodError. Because this is definitely a downside of using private accessors, it’s reasonable to ask why you’d bother to use them in the first place rather than using public accessors or simply referring to instance variables directly.
The best reason I can find for making use of accessors in general vs. instance variables is simply that the former are much more flexible. New behavior such as validations or transformations can be added later by changing what used to be vanilla accessors into ordinary method definitions. Additionally, if you accidentally introduce a typo into your code, you will get a NoMethodError right away, rather than having to track down why your attribute is nil when you didn’t expect it to be in some completely different place in your code.
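To illustrate the difference in failure modes, here is a small contrived example; both classes and their deliberate typos are invented for this sketch:

```ruby
# Contrast of failure modes when a name is misspelled. Both classes
# and their deliberate typos are invented for this illustration.

class IvarCounter
  def initialize
    @count = 0
  end

  # Typo: @cuont was never assigned, so Ruby silently returns nil here,
  # and the bug only surfaces later, somewhere else in the program.
  def value
    @cuont
  end
end

class AccessorCounter
  def initialize
    self.count = 0
  end

  # Typo: there is no `cuont=` writer, so this raises NoMethodError
  # the moment it runs, right at the site of the mistake.
  def bump
    self.cuont = count + 1
  end

  def value
    count
  end

  private

  attr_accessor :count
end

IvarCounter.new.value # nil -- the bug hides

begin
  AccessorCounter.new.bump
rescue NoMethodError
  # fails fast, pointing straight at the typo
end
```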
The problem with making accessors public is that it hints to the consumer that it is meant to be touched and used, which is often not the case at all, especially for writers. While Ruby makes it trivial to circumvent privacy protections, a private method communicates to the user that it is meant to be treated as an implementation detail and should not be depended on. So the reason for using a private accessor is the same as the reason for using a private method: to mark the accessor as part of the internals of the object.
The interesting thing I stumbled across in this particular project is that if you take this technique to the extreme, it is possible to build entire applications without ever explicitly referencing an instance variable. It comes at the cost of the occasional weird edge case when calling private methods internally, but makes it possible to treat instance variables as a whole as a language implementation detail, rather than an application implementation detail. Faced with the opportunity to at least experiment with that idea, I decided to make the entire Turing Tarpit codebase completely free of instance variables, which ended up taking very little effort.
The jury is still out on whether or not this is a good idea, but I plan to keep trying the idea out in my projects and see whether I run into any more issues. If I don’t experience problems, I’d say this technique is well worth it because it emphasizes message-passing rather than state manipulation in our objects.
Splitting up the Scanner object
After helping out with a few of the general housekeeping chores, Steve Klabnik then turned his attention to one of the weakest spots in the code: the Scanner object. He pointed out that having an object with dependencies on a whole lot of private methods is a bit of a code smell, and focused specifically on the Scanner#next method. The original implementation looked like this:
module TuringTarpit
  class Scanner
    # ...

    def next(cell_value)
      validate_index
      element = @chars[@index]

      case element
      when "["
        jump_forward if cell_value.zero?
        consume
        element = @chars[@index]
      when "]"
        if cell_value.zero?
          while element == "]"
            consume
            element = @chars[@index]
            validate_index
          end
        else
          jump_back
          consume
          element = @chars[@index]
        end
      end

      consume
      element
    end
  end
end
Steve pointed out that the Scanner#next method was really doing more of a tokenizing operation, and that most of the scanning work was actually being done by the various private methods that were being used to traverse the underlying string. He prepared a patch which made this relationship explicit by introducing a Tokenizer object which would provide a method to replace Scanner#next. His newly introduced object re-purposed the Scanner object, allowing its methods to become public:
module TuringTarpit
  class Tokenizer
    # ...

    def next(cell_value)
      scanner.validate_index
      element = scanner.current_char

      case element
      when "["
        scanner.jump_forward if cell_value.zero?
        scanner.consume
        element = scanner.current_char
      when "]"
        if cell_value.zero?
          while element == "]"
            scanner.consume
            element = scanner.current_char
            scanner.validate_index
          end
        else
          scanner.jump_back
          scanner.consume
          element = scanner.current_char
        end
      end

      scanner.consume
      element
    end
  end
end
What I particularly liked about this patch is that it abstracted away some of the tedious index operations that were originally present in Scanner#next. As much as possible, I prefer to isolate anything that can cause off-by-one errors or other such nonsense, and this refactoring did a good job of addressing that issue.
The interesting thing about this refactoring is that while I intended to work on the same area of the code if no one else patched it, I had planned to approach it in a very different way. My original idea was to implement some sort of generic stream data structure and reuse it in both Scanner and Tape. However, seeing that Steve’s patch at least partly addressed my concerns while possibly opening some new avenues as well, I abandoned that idea and merged his work instead.
Act II. Building a better horse
After applying the various patches from the folks who participated in this challenge, the code was in a much better place than where it started. However, much work was still left to be done!
In particular, the code responsible for turning Brainfuck syntax into a stream of operations still needed a lot of work. The Tokenizer class that Steve introduced was an improvement, but without further revisions it would simply serve as a layer of indirection rather than as an abstraction. Zed Shaw describes the difference between these two concepts very eloquently in his essay Indirection Is Not Abstraction by stating that “Abstraction is used to reduce complexity. Indirection is used to reduce coupling or dependence.”
As far as the Tokenizer object goes, Steve’s patch reduced coupling somewhat by pushing some of the implementation details down into the Scanner object. However, the procedure is pretty much identical apart from the lack of explicit indexing code, and so the baseline complexity actually increases, because what was once done by one object is now split across two objects.
To address this problem, the dividing lines between the two objects needed to be leveraged so that they could interact with each other at a higher level. It took me a while to think through the problem, but in doing so I realized that I could now push more functionality down into the Scanner object so that Tokenizer#next ended up with fewer moving parts. After some major gutting and re-arranging, I ended up with a method that looked like this:
module TuringTarpit
  class Tokenizer
    # ...

    def next(cell_value)
      case scanner.next_char
      when Scanner::FORWARD_JUMP
        if cell_value.zero?
          scanner.jump_forward
        else
          scanner.next_char
        end
      when Scanner::BACKWARD_JUMP
        if cell_value.zero?
          scanner.skip_while(Scanner::BACKWARD_JUMP)
        else
          scanner.jump_back
        end
      end

      scanner.current_char
    end
  end
end
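The Scanner methods used above (next_char, skip_while, and friends) are not shown in the article. As a rough illustration, skip_while presumably advances an index past a run of identical characters; the class below is a hypothetical sketch of that one behavior, not the project’s real Scanner:

```ruby
# Hypothetical sketch of a skip_while-style scanner method; the
# project's actual Scanner surely differs in its details.
class SkipDemo
  def initialize(chars)
    @chars = chars
    @index = 0
  end

  def current_char
    @chars[@index]
  end

  # Advance the index past a run of identical characters.
  def skip_while(char)
    @index += 1 while current_char == char
  end
end

demo = SkipDemo.new("]]]+".chars)
demo.skip_while("]")
demo.current_char # => "+"
```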
After this refactoring, the Tokenizer#next method was a good deal more abstract in a number of ways:
- It expected the Scanner to handle validations itself rather than telling it when to check the index.
- It no longer referenced Brainfuck syntax, and instead used constants provided by the Scanner.
- It eliminated a lot of cumbersome assignments by reworking its algorithm so that Scanner#current_char always referenced the right character at the end of the scanning routine.
- It expected the Scanner to remain internally consistent, rather than handling edge cases itself.
These reductions in complexity made a hugely positive impact on the readability and understandability of the Tokenizer#next method. While all of these changes could technically have been made before the split between the Scanner and Tokenizer happened, cutting the knot into two pieces certainly made untangling things easier. This is why indirection and abstraction often go hand in hand, despite the fact that they are very different concepts.
Act III. Mountains are once again merely mountains
After building on top of Steve’s work to simplify the syntax-processing code even further, I finally felt like that part of the project was in decent shape. I then decided to turn my attention back to the Interpreter object, since it had not received any love from the challenge participants. The original code for it looked something like this:
module TuringTarpit
  class Interpreter
    # NOTE: STDIN.getch requires the "io/console" standard library
    def run
      loop do
        case tokenizer.next(tape.cell_value)
        when "+"
          tape.increment_cell_value
        when "-"
          tape.decrement_cell_value
        when ">"
          tape.increment_pointer
        when "<"
          tape.decrement_pointer
        when "."
          putc(tape.cell_value)
        when ","
          value = STDIN.getch.bytes.first
          next if value.zero?
          tape.cell_value = value
        end
      end
    end
  end
end
While this implementation wasn’t too bad, there were two things I didn’t like about it. The first issue was that it directly referenced Brainfuck syntax, which sort of defeated the purpose of having the tokenizer be syntax independent. The second problem was that I found the case statement to feel a bit brittle and limiting. What I really wanted was a dynamic dispatcher similar to the following method:
def run
  loop do
    if operation = tokenizer.next(tape.cell_value)
      tape.send(operation)
    end
  end
end
In order to introduce this kind of functionality, I’d need to find a place to introduce a simple mapping from Brainfuck syntax to operation names. I already had the keys and values in mind, I just needed to find a place to put them:
OPERATIONS = { "+" => :increment_cell_value,
"-" => :decrement_cell_value,
">" => :increment_pointer,
"<" => :decrement_pointer,
"." => :output_cell_value,
"," => :input_cell_value }
Figuring out how to make this work was surprisingly challenging. I found that the extra layers of indirection between the Tape and the Scanner meant that any change made too far down the chain would need to be echoed all the way up it, and that changes made towards the top felt tacked on and out of place. This eventually led me to question what the separation between Scanner and Tokenizer was really gaining me, as well as the separation between Interpreter and Tape.
After a fair amount of ruminating, I decided to take my four objects and join them together at the seams so that only two remained. The Scanner and Tokenizer ended up getting joined back together to form a new Interpreter class. The job of the Interpreter is to take Brainfuck syntax and turn it into a stream of operations. You can get a rough idea of how it all came together by checking out the following code:
module TuringTarpit
  class Interpreter
    FORWARD_JUMP  = "["
    BACKWARD_JUMP = "]"

    OPERATIONS = { "+" => :increment_cell_value,
                   "-" => :decrement_cell_value,
                   ">" => :increment_pointer,
                   "<" => :decrement_pointer,
                   "." => :output_cell_value,
                   "," => :input_cell_value }

    def next_operation(cell_value)
      case next_char
      when FORWARD_JUMP
        if cell_value.zero?
          jump_forward
        else
          skip_while(FORWARD_JUMP)
        end
      when BACKWARD_JUMP
        if cell_value.zero?
          skip_while(BACKWARD_JUMP)
        else
          jump_back
        end
      end

      OPERATIONS[current_char]
    end

    # ... lots of private methods are back, but now fine-tuned.
  end
end
The old Interpreter object and Tape object were also merged together, forming a single object I ended up calling Evaluator. The job of the Evaluator object is to take a stream of operations provided by the newly defined Interpreter object and then execute them against a Turing-machine-like data structure. In essence, the Evaluator object is nothing more than the original Tape object I implemented, along with a few extra methods which account for the things the original Interpreter object was meant to do:
module TuringTarpit
  class Evaluator
    def self.run(interpreter)
      evaluator = new

      loop do
        if operation = interpreter.next_operation(evaluator.cell_value)
          evaluator.send(operation)
        end
      end
    end

    def output_cell_value
      putc(cell_value)
    end

    # NOTE: $stdin.getch requires the "io/console" standard library
    def input_cell_value
      value = $stdin.getch.ord
      return if value.zero?

      self.cell_value = value
    end

    # other methods same as original Tape methods
  end
end
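To make the final two-object flow concrete, here is a toy, self-contained version with no jump handling (enough for loop-free programs only). Both classes are simplified stand-ins I wrote for illustration; output is collected into an array instead of printed, and the loop stops at end of input rather than running forever:

```ruby
# Toy end-to-end sketch: a jump-free interpreter feeding an evaluator.
# Both classes are simplified stand-ins for the article's final design.

class ToyInterpreter
  OPERATIONS = { "+" => :increment_cell_value,
                 "-" => :decrement_cell_value,
                 ">" => :increment_pointer,
                 "<" => :decrement_pointer,
                 "." => :output_cell_value }

  def initialize(source)
    @chars = source.chars
    @index = 0
  end

  # No jump handling here -- enough for loop-free programs only.
  def next_operation(_cell_value)
    char = @chars[@index]
    @index += 1
    char && OPERATIONS[char]
  end
end

class ToyEvaluator
  attr_reader :output

  def initialize
    @cells   = Hash.new(0)
    @pointer = 0
    @output  = []
  end

  def cell_value;           @cells[@pointer];      end
  def increment_cell_value; @cells[@pointer] += 1; end
  def decrement_cell_value; @cells[@pointer] -= 1; end
  def increment_pointer;    @pointer += 1;         end
  def decrement_pointer;    @pointer -= 1;         end
  def output_cell_value;    @output << cell_value; end

  # Pull operations from the interpreter until the source is exhausted
  def run(interpreter)
    while (op = interpreter.next_operation(cell_value))
      send(op)
    end
  end
end

evaluator = ToyEvaluator.new
evaluator.run(ToyInterpreter.new("+++.>++."))
evaluator.output # => [3, 2]
```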
I had mixed feelings about recombining these objects, because to some extent it felt like a step backwards to me. In particular, I think this refactoring resulted in some minor violations of the Single Responsibility Principle, and increased the overall coupling of the system somewhat. However, the independence of the four different objects the system previously consisted of seemed artificial at best. To the extent that they could be changed easily or swapped out for one another, I could not think of a single practical reason why I’d actually want that kind of flexibility. In this particular situation it turned out that recombining the objects greatly reduced their communications overhead, and so was worth the loss in generality.
Epilogue. Sending the ship out to sea
I was really tempted to keep noodling on the design of this project, because even in my final version of the code I still felt that I could have done better. But at a certain point I decided that I could end up getting caught in this trap forever, and the only way to free myself from it was to wrap up my work and just ship the damn thing. This ultimately meant that I had to take care of several chores that neither I nor the various participants in this challenge bothered to work on earlier:
- I added a set of integration tests which ran the Evaluator against a couple of sample Brainfuck programs, to make sure we had some decent end-to-end testing support. I found a couple of bugs that way.
- I set up and ran simplecov to check whether my tests were at least running all the implementation code, and ended up spotting a faulty test which wasn’t actually getting run.
- I added a bin/turing_tarpit file so that you can execute Brainfuck programs without building a Ruby shim first.
- I did the usual gemspec + Gemfile dance and pushed a 1.0.0 gem to rubygems.org. Typically I’d call a project in its early stages a 0.1.0 release, but I honestly don’t see myself working on this much more, so I might as well call it ‘production ready’.
After I wrapped up all these chores, I decided to go back and check what my Flog complexity scores were for each stage in this process. It turns out that the final version was the least complex, with the lowest overall score, lowest average score, and lowest top score by method. The original implementation came in second place, and the other two iterations were in a distant third and fourth place. While that gave me some reassurance, it doesn’t mean much, except that Flog seems to really hate external method calls.
Reflections
This has been one of my favorite articles to write for Practicing Ruby so far. It forced me to look at the refactoring process in a much more introspective way than I typically have in the past, and gave me a chance to interact with some of our awesome readers. I do think it ended up raising more questions and challenges in my mind than it gave me answers and reassurances, but I suppose that’s a sign that learning happened.
While I found it very hard to summarize the refactoring lifecycle for this project, my hope is that I’ve at least given you a glimpse of the spiral staircase metaphor I chose to name this article after. If it didn’t end up making you feel too dizzy, I’d love to hear your thoughts about this exercise as well as what your own process is like when it comes to refactoring code.
Practicing Ruby is a Practicing Developer project.
All articles on this website are independently published, open source, and advertising-free.