NOTE: This is one of four lessons learned from my 90 day self-study on test-driven development. If this topic interests you, be sure to check out the other lessons!

To maintain a productive TDD workflow, you need to understand the difference between refactoring and redesign. These two activities are distinct from one another, but because they are often done in lockstep, it can be challenging to mentally separate them.

The problem I noticed in my own work is that seemingly simple changes often spiral into much more complex modifications. Whenever that happens, it is easy to make bad decisions that can cause progress to grind to a halt. Having a good way to distinguish between what can be accomplished via simple refactorings and what requires careful design consideration seems to be the key to preventing this problem.

My hope is that by reading what I have learned from my own experiences, you will be able to avoid some of these obstacles along your own path. These lessons are not fun to learn the hard way!

What is refactoring?

Refactoring in the traditional sense has to do with making small and safe transformations to a codebase without altering its external behavior. Because refactorings are designed to be atomic and almost trivial, you can apply them whenever you feel that they will make life easier for you down the road. For example, it is rarely a bad idea to clean up messy code by introducing a couple helper methods:

       def belongs_to(parent, params)
-        mapper.record_class.send(:define_method, parent) do
-          Object.const_get(params[:class]).find(send(params[:key]))
+        define_association(parent) do
+          BrokenRecord.string_to_constant(params[:class])
+                      .find(send(params[:key]))
         end
       end
 
       def has_many(children, params)
         table_primary_key = mapper.primary_key
 
-        mapper.record_class.send(:define_method, children) do
-          Object.const_get(params[:class])
-                .where(params[:key] => send(table_primary_key))
+        define_association(children) do
+          BrokenRecord.string_to_constant(params[:class])
+            .where(params[:key] => send(table_primary_key))
         end
       end

This change is superficial by design, as a proper refactoring ought to be. However, it has several immediate advantages worth pointing out:

  • The define_association helper makes the code reveal its intentions much more clearly by hiding some awkward metaprogramming.

  • The BrokenRecord.string_to_constant method makes it easy to extend this code so that it handles fully qualified constant names (i.e. SomeProject::Person), without the need to add a bunch of extra noise in multiple places.

  • Both helper methods cut down on duplication, eliminating the connascence of algorithm that was present in the original code.

  • Both helper methods reduce the amount of implementation details that the belongs_to and has_many methods need to be directly aware of, making them more adaptive to future changes.

The important thing to notice here is that while making this change opens a lot of doors for us and has some immediate tangible benefits, it does not introduce any observable functional changes, either from the external perspective or from the perspective of the object’s collaborators.
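
Neither helper’s implementation appears in the diff above, so here is a minimal sketch of what the two might look like. Treat it as a plausible approximation rather than BrokenRecord’s actual code:

module BrokenRecord
  # Resolves a possibly fully qualified constant name, such as
  # "SomeProject::Person", to the constant it refers to.
  def self.string_to_constant(string)
    string.split("::").inject(Object) do |namespace, name|
      namespace.const_get(name)
    end
  end
end

# Presumably a private helper that lives alongside belongs_to and
# has_many, hiding the define_method metaprogramming in one place.
def define_association(name, &block)
  mapper.record_class.send(:define_method, name, &block)
end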

What is redesign?

While the concept of refactoring is easy to define and categorize, the process of redesigning code is not nearly as straightforward. Rather than attempting to provide an awkward definition for it, I will demonstrate what makes redesign different from refactoring by showing you a real example from my study.

When working on BrokenRecord (my toy implementation of the Active Record pattern), I initially designed it so that a single object was responsible for running queries against the database and mapping their results to user-defined models. This worked fine as a proof of concept, and the code was pretty easy to follow.

However, designing things this way led to very high coupling between the query API and the underlying database implementation, as you can see in the following code:

module BrokenRecord
  class Table
  
   #...

    def create(params)
      escapes = params.count.times.map { "?" }.join(", ") 
      fields  = params.keys.join(", ")

      BrokenRecord.database.execute(
        "insert into #{@table_name} (#{fields}) values (#{escapes})",
        params.values
      )
    end

    def find(id)
      BrokenRecord.database
                  .execute("select * from #{@table_name} where id = ?", [id])
                  .map { |r| @row_builder.build_row(self, r) }.first
    end
  end
end

Even though I had no intention of making BrokenRecord into a library that could be used for practical applications, this design was fundamentally inconsistent with what it means to be an object-relational mapper. The lack of abstraction made any sort of query optimization impossible, and also ruled out the possibility of supporting multiple database backends.

In addition to these concerns about future extensibility, this design made the code much harder to test, and made it hard to run some common queries without directly hijacking the global reference to the underlying database adapter. All these things combined meant that a redesign was clearly in order.
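
To make that testing pain concrete, here is roughly what a unit test for create would have looked like under the original design, written in the same spec style as the other tests in this article. The fake adapter and the BrokenRecord.database writer are both assumptions on my part; only the reader appears in the code above:

  it "must insert records through the global adapter" do
    fake_db = Minitest::Mock.new
    fake_db.expect(:execute, [],
      ["insert into articles (title) values (?)", ["Hello"]])

    # The only seam is the global adapter reference, so the test has
    # to hijack it and then carefully restore it afterwards.
    original_db = BrokenRecord.database
    begin
      BrokenRecord.database = fake_db
      table.create(:title => "Hello")
    ensure
      BrokenRecord.database = original_db
    end

    fake_db.verify
  end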

Taking a first glance at the implementation of BrokenRecord::Table, it was tempting to think that all that was needed here was to extract a class to encapsulate the database interactions. But because this object had come into existence as a result of a broad-based integration test rather than a series of focused unit tests, I was hesitant to perform an extraction without writing a few more tests first.

Thinking about the problem a little more, I noticed that the changes I wanted were deeper than just putting together an internal object to hide some implementation details and reduce coupling. The fact that Table was the best name I could think of for my extracted object, even though that was the name of the original class, was a sign that I was in the process of changing some responsibilities in the system, not just grouping related bits of functionality together.

Taking a TDD-friendly approach to redesign

The mistake I’ve made in the past when it comes to redesigning internal objects is that I tended to make my changes recursively, often without introducing new tests as I went. So for example, I might take a helper object that had gotten too complex and break it into two objects, testing both objects only indirectly through some higher level test. That kind of change would often reveal to me that I wanted to extract even more classes or methods, or possibly even change the protocols between the low-level collaborators in the system.

Sooner or later, I would end up with a complicated web of internal objects that were all being tested through a single use case at the high level, and so any defects I introduced became very hard to track down. Even though my tests were keeping defects from creeping into the system’s external behavior, I had negated the design and debugging benefits that come with doing TDD more rigorously.

After discussing this bad habit of mine with Eric Hodel during one of Mendicant University’s study sessions, I came to realize that there are some simple ways to sidestep this problem. In particular, I realized that I could redesign systems by introducing new components from the bottom up, cutting over to the new implementation only when it was ready to be integrated.

Wanting to try out these new ideas in BrokenRecord, I started out by renaming the BrokenRecord::Table object to BrokenRecord::RecordTable. I put virtually no thought into the new name, because what I was really trying to do was free up the BrokenRecord::Table name so that I could completely change the responsibilities associated with it. This gave me much the same freedom that simply deleting the original class would have, but without the cost of working through a bunch of orphaned references and broken tests in my system.

I drove the new BrokenRecord::Table object test-first, mostly mirroring the ideas from the original object but sticking strictly to the interactions with the database and representing records as simple Hash objects. I also added a new feature which provided information about the columns in a given table. You can get a rough idea of how I sketched out that feature by checking out the following test:

  it "must be able to retrieve column information" do
    columns = table.columns

    columns.count.must_equal(3)

    columns[:id][:type].must_equal("integer")
    columns[:title][:type].must_equal("text")
    columns[:body][:type].must_equal("text")
  end

The original BrokenRecord::Table object was just a first iteration spike, and so it expected that all model objects explicitly defined what fields were in the tables they mapped to. This helped keep the implementation simple, which was essential when the class was taking on two responsibilities at once. However, in the new BrokenRecord::Table object, this kind of low-level database interaction looked perfectly at home, and paved the way for removing the tedious BrokenRecord::RowMapper object in the newly designed system.
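
I won’t walk through the full implementation here, but a minimal sketch of the columns feature, assuming an SQLite backend (which the execute-with-placeholders API suggests) and a @name variable holding the table name, looks something like this:

  # PRAGMA table_info returns one row per column in the form
  # [cid, name, type, notnull, dflt_value, pk]; this sketch converts
  # those rows into the hash-of-hashes structure the test expects,
  # normalizing the declared type to lowercase along the way.
  def columns
    @columns ||= BrokenRecord.database
      .execute("PRAGMA table_info(#{@name})")
      .each_with_object({}) do |(_, name, type, *), result|
        result[name.to_sym] = { :type => type.downcase }
      end
  end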

Throughout the process of building better internals from the bottom up, I was able to make these kinds of revisions to several objects, and also introduced a couple more internal objects to help out with various things. Eventually, I reached the point where I was ready to create an object that could serve as a drop-in replacement for the original BrokenRecord::Table object (the one I renamed RecordTable). Feeling like I might actually keep this new object around for a while, I decided to name it TableMapper, which at least sounded slightly less horrible than RecordTable. Its methods ended up looking something like this:

module BrokenRecord
  class TableMapper

   # ...
    
    def create(params)
      id = @table.insert(params)    
    
      find(id)
    end

    def find(id)
      fields = @table.where(:id => id).first

      return nil unless fields

      @record_class.new(:table => @table,
                        :fields => fields,
                        :key    => id)
    end
  end
end

Functionality-wise, the newly created BrokenRecord::TableMapper was nearly a drop-in replacement for the original system, even though it had a much better underlying design. Because it only needed to implement a handful of methods to maintain API compatibility, integrating it went very smoothly, and required almost no changes to the original top-level tests. Once I cut things over and had all the tests passing, I was able to completely remove the BrokenRecord::RecordTable object without any issues.
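
For a sense of what the final cutover looked like, the model-facing methods only needed to route their calls through the new object. Here is a minimal sketch; the Mapping module and table_mapper accessor are hypothetical stand-ins for however the real integration code was structured:

module BrokenRecord
  # The model-facing API keeps its original signatures, but now
  # delegates to a TableMapper rather than the old monolithic object,
  # which is why the top-level tests barely noticed the swap.
  module Mapping
    def create(params)
      table_mapper.create(params)
    end

    def find(id)
      table_mapper.find(id)
    end
  end
end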

Reflections

If I had not taken this more disciplined approach and instead followed my old ways, I probably would have ended up in about the same place design-wise, but it would have come at a much higher cost. I would have had fewer tests, spent more time debugging trivial errors, and probably would have cut corners in places out of impatience or frustration. The overall codebase would have still been quite brittle, and future changes would have become harder to make rather than easier. Taking that less disciplined approach might have allowed me to implement this particular set of changes a little faster, but my past experiences have taught me that I always end up having to pay down my technical debt sooner or later.

By teaching myself to think of refactoring and redesign as distinct activities, I am much more likely to stop myself from going on long exploratory cleanup missions with little guidance from my tests. This has already made a big difference in my own work, so I’d definitely recommend giving it a try.

If you have questions, or a story of your own to share, please leave me a comment!