From ad-hoc script to object-oriented program
NOTE: This issue of Practicing Ruby was one of several content experiments that were run in Volume 6. It uses a cookbook format (e.g. problem -> solution -> discussion) instead of the traditional long-form article format we use in most Practicing Ruby articles.
Problem: An ad-hoc script has devolved into an unmaintainable mess
Imagine that you’re working on a shipping cost estimation program for a small business that uses a courier service for regional deliveries. Part of the task for building that tool would involve importing pricing information from some data source, such as this CSV file:
06770,$12.00
06512,$14.00
06510,$15.30
06701,$12.15
A real dataset would be more complex, but this minimal example exposes the information we’re interested in: what it costs to ship something from our facility to somewhere else, based on the destination’s zip code.
Now suppose that we want to build a simple data store which will be updated
daily with the latest pricing information. We then could easily write a script
using a few of Ruby’s standard libraries (PStore, BigDecimal, and CSV),
which would normalize the data in a way that could be used by the user-facing
cost estimation program. If we could assume the source CSV data was validated
before we processed it, the program could be as simple as what you see below:
require "csv"
require "pstore"
require "bigdecimal"

store = PStore.new("shipping_rates.store")

store.transaction do
  CSV.foreach(ARGV[0] || "rates.csv") do |r|
    zip    = r[0]
    # Drop the leading "$" before converting to a decimal.
    amount = BigDecimal(r[1][1..-1])

    store[zip] = amount
  end
end
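For context, here is a minimal sketch of how the user-facing estimation program might read a rate back out of this data store. The `shipping_rate_for` helper and its `store_path` parameter are hypothetical names used only for illustration; the sketch assumes the store was written by the script above.

```ruby
require "pstore"
require "bigdecimal"

# Hypothetical lookup helper; not part of the original importer.
# Returns the stored BigDecimal rate, or nil if the zip code is unknown.
def shipping_rate_for(zip, store_path = "shipping_rates.store")
  store = PStore.new(store_path)

  # Passing `true` opens a read-only transaction, so a lookup cannot
  # accidentally modify the pricing data.
  store.transaction(true) { store[zip] }
end
```

A caller could then write something like `rate = shipping_rate_for("06770")` and treat a `nil` result as "no pricing available for that destination."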
But in reality, most business environments do not make things this easy for you. You’d probably quickly discover that the source data could have any number of problems, ranging from duplicate entries to inconsistently formatted fields. Because this kind of data often originates from people entering information into Excel by hand, it can even be littered with typos!
To help mitigate these issues somewhat, you need a combination of sanity-checking validations and basic logging so that when something goes wrong you know why it happened. After adding those features, your simple script might collapse into the mess you see below:
require "csv"
require "pstore"
require "bigdecimal"

store = PStore.new("shipping_rates.store")

store.transaction do
  processed_zipcodes = []

  CSV.foreach(ARGV[0] || "rates.csv") do |r|
    raise unless r[0][/\A\d{5}\z/]
    raise unless r[1][/\A\$\d+\.\d{2}\z/]

    zip    = r[0]
    amount = BigDecimal(r[1][1..-1])

    raise "duplicate entry: #{zip}" if processed_zipcodes.include?(zip)
    processed_zipcodes << zip

    next if store[zip] == amount

    if store[zip].nil?
      STDERR.puts("Adding new entry for #{zip}: #{'%.2f' % amount}")
    elsif store[zip] != amount
      STDERR.puts("Updating entry for #{zip}: "+
                  "was #{'%.2f' % store[zip]}, now #{'%.2f' % amount}")
    end

    store[zip] = amount
  end
end
Once your code ends up like this, it becomes increasingly difficult to add new features or make any sort of change without breaking something. Because this style of program is fairly difficult to test, the maintenance problems can be made even worse by the fact that bugs may end up not being discovered until long after they’re introduced.
Procedural scripts are great when you can throw away the code once you’ve completed your task, or for solving simple problems whose requirements you are reasonably sure will never change. For everything else, more structure pays off in the long run. It’s clear that this program is in the latter category, so how do we fix it?
Solution: Redesign the script as an object-oriented program
The thing that makes ad-hoc scripts complicated to reason about as they grow is that they blend all their concerns together, both logically and conceptually. For that reason, it is worthwhile to start thinking in terms of functions and objects as soon as your program exceeds a paragraph or two of code.
Imagine that the script portion of your importer tool was reduced to the following code:
require "csv"

Importer.update("shipping_rates.store") do |store|
  CSV.foreach(ARGV[0] || "rates.csv") do |r|
    info = PriceInformation.new(zipcode: r[0], shipping_rate: r[1])

    store[info.zipcode] = info.shipping_rate
  end
end
This brings us back to about the same level of detail expressed in the naïve implementation of the importer script, albeit with a few custom classes thrown into the mix. It hides a lot of detail from the reader, but its core purpose is obvious: it iterates over a CSV file to create a mapping of zipcodes to shipping rates in a datastore.
To see where the real work is being done, we need to look at the
PriceInformation and Importer class definitions. We’ll start by taking a
look at the former, because it has fewer moving parts to consider:
require "bigdecimal"

class PriceInformation
  ZIPCODE_MATCHER = /\A\d{5}\z/
  PRICE_MATCHER   = /\A\$\d+\.\d{2}\z/

  # Both keywords are required; omitting either raises ArgumentError.
  def initialize(zipcode:, shipping_rate:)
    raise "Zipcode validation failed" unless zipcode[ZIPCODE_MATCHER]
    raise "Shipping rate validation failed" unless shipping_rate[PRICE_MATCHER]

    @zipcode       = zipcode
    # Drop the leading "$" before converting to a decimal.
    @shipping_rate = BigDecimal(shipping_rate[1..-1])
  end

  attr_reader :zipcode, :shipping_rate
end
Here we see that PriceInformation applies the same validations and
transformations as shown in the script version of this program, but
encapsulates them in its constructor. This makes sure that a PriceInformation
object will either represent valid data or not be instantiated at all,
which makes it so that the main script does not need to concern itself
with these issues. Even if these validations or transformations become
more complex over time, the calling code should not need to change.
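One practical consequence is that the validation rules can now be exercised in isolation, without a CSV file or a data store. The sketch below repeats a condensed copy of the class so that it runs on its own; the `rejected?` helper is a hypothetical name introduced only for this illustration.

```ruby
require "bigdecimal"

# Condensed copy of PriceInformation, repeated only so that this
# sketch is self-contained.
class PriceInformation
  ZIPCODE_MATCHER = /\A\d{5}\z/
  PRICE_MATCHER   = /\A\$\d+\.\d{2}\z/

  attr_reader :zipcode, :shipping_rate

  def initialize(zipcode:, shipping_rate:)
    raise "Zipcode validation failed" unless zipcode[ZIPCODE_MATCHER]
    raise "Shipping rate validation failed" unless shipping_rate[PRICE_MATCHER]

    @zipcode       = zipcode
    @shipping_rate = BigDecimal(shipping_rate[1..-1])
  end
end

# Hypothetical helper: true when the constructor refuses the row.
def rejected?(zipcode, shipping_rate)
  PriceInformation.new(zipcode: zipcode, shipping_rate: shipping_rate)
  false
rescue RuntimeError
  true
end

info = PriceInformation.new(zipcode: "06770", shipping_rate: "$12.00")
info.shipping_rate == BigDecimal("12.00") # => true

rejected?("6770", "$12.00")  # => true  (zipcode has only four digits)
rejected?("06770", "12.00")  # => true  (missing "$" prefix)
rejected?("06770", "$12.0")  # => true  (needs exactly two decimal places)
rejected?("06770", "$12.00") # => false
```

Checks like these would be awkward to write against the script version, where the same rules are buried inside the import loop.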
In a similar vein, the Importer class attempts to encapsulate the details
about some lower level concepts at a higher level of abstraction. Its
functionality is a bit more involved than the PriceInformation class,
so take a few minutes to study it before moving on:
require "pstore"

class Importer
  def self.update(filename)
    store = PStore.new(filename)

    store.transaction do
      yield new(store)
    end
  end

  def initialize(store)
    self.store    = store
    self.imported = []
  end

  def []=(key, new_value)
    raise_if_duplicate(key)

    old_value = store[key]

    return if old_value == new_value # nothing to do!

    if old_value.nil?
      ChangeLog.new_record(key, new_value)
    else
      ChangeLog.updated_record(key, old_value, new_value)
    end

    store[key] = new_value
  end

  private

  attr_accessor :store, :imported

  def raise_if_duplicate(key)
    raise "Duplicate key in import data: #{key}" if imported.include?(key)
    imported << key
  end
end
Despite the complexity of its implementation, this class presents a very minimal
user interface, consisting of only Importer.update and Importer#[]=. The
Importer.update method is responsible for instantiating a PStore object,
initiating a transaction, and then wrapping it in an Importer instance to
limit access to its internals. From there, the only method available to the user
is Importer#[]=, which wraps PStore#[]= with two important features:
- Single-assignment semantics: once a key has been set to a particular value, it cannot be reset from within the same Importer instance. This is because we want to raise an exception whenever we encounter duplicate keys in the data we’re importing.
- Update notifications: For debugging purposes, we want to know whether a record is introducing a new key, or updating the value associated with an old one. Rather than cluttering up this class with the particular log messages associated with those events, we delegate to a ChangeLog helper object, which is shown below:
class << (ChangeLog = Object.new)
  def new_record(key, value)
    STDERR.puts "Adding #{key}: #{f(value)}"
  end

  def updated_record(key, old_value, new_value)
    STDERR.puts "Updating #{key}: Was #{f(old_value)}, Now #{f(new_value)}"
  end

  private

  def f(value)
    '%.2f' % value
  end
end
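The first line of the ChangeLog definition may look unusual if you have not seen the idiom before: it creates a plain object, binds it to a constant, and opens that object's singleton class so that methods can be defined on it directly. The sketch below spells the same idea out in two steps, using a hypothetical `Greeter` object that is not part of the importer.

```ruby
# Step 1: create a plain object and bind it to a constant.
Greeter = Object.new

# Step 2: open that one object's singleton class and define methods on it.
class << Greeter
  def hello(name)
    "Hello, #{decorate(name)}"
  end

  private

  # Private helpers can be called from the object's own methods,
  # but not from outside code.
  def decorate(name)
    "#{name}!"
  end
end

Greeter.hello("Ruby")          # => "Hello, Ruby!"
Greeter.respond_to?(:decorate) # => false, because the helper is private
```

The result is a single stateless helper object with a small public interface, which seems to be all that ChangeLog needs to be here.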
With this last detail exposed, you’ve walked through the complete object-oriented solution to this problem. It is much longer than the script version, but also much more organized. Before we wrap things up, let’s talk a bit more about the costs and benefits involved in introducing more structure into your programs.
Discussion
The best thing about unstructured code is that nothing is hidden from view. To understand a script, you start at the top of the file and read downwards, mentally evaluating the state changes and iterators you encounter along the way.
Object-oriented programs are much more logically complex, because they
represent a network of collaborators rather than a linear set of instructions.
For example, whenever we make a call to Importer#[]=, messages are sent to the
ChangeLog helper object as well as to an instance of PStore, but these
details are not at all visible when you read the caller code. The more objects
that exist within a system, the more complex their interactions get, and so
it is not uncommon to end up with call graphs that are both wide and deep.
But when it comes to visibility, the strength of scripted solutions is also their weakness, and the weakness of object-oriented programs is also their strength:
- In an ad-hoc script, you cannot make simple decisions about your code without considering the entire program. Even something as straightforward as renaming a variable used for temporary storage must be carefully considered, because everything exists within a single namespace; anything more involved than that is simply inviting trouble unless you can keep the entire program in your head at once.
- In an object-oriented program, the walls erected between different objects give you freedom to make sweeping changes to internal structures, as long as their interfaces are preserved. You can even rewire entire subnetworks of functionality in your programs, as long as you know what features depend on them. When done well, the fact that you cannot keep an entire object-oriented program in your head is not much of a concern, because the layered abstractions make it so you don’t have to.
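As a small illustration of changing internals behind a stable interface, consider the duplicate tracking done by the importer. The sketch below uses a hypothetical `DuplicateGuard` class (it does not appear in the solution above) that keeps its keys in a Set rather than an Array; because callers only ever send `check!`, they would not notice which structure is used.

```ruby
require "set"

# Hypothetical stand-in for the importer's duplicate tracking.
class DuplicateGuard
  def initialize
    # An Array would work as well; only this class knows the difference.
    @seen = Set.new
  end

  # Raises if the key was already recorded, otherwise records it.
  def check!(key)
    # Set#add? returns nil when the key is already present.
    raise "Duplicate key in import data: #{key}" unless @seen.add?(key)
  end
end

guard = DuplicateGuard.new
guard.check!("06770") # first sighting: recorded without complaint
guard.check!("06512") # a different key: also fine
```

Calling `guard.check!("06770")` a second time would raise, which mirrors the behavior the importer relies on.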
The real challenge involved in writing object-oriented programs is that they’ll only be as useful as the mental model they represent. This is why it can actually be helpful to start off with less structure (even none at all!), and gradually work your way towards something more organized. After all, there is nothing worse than an abstract solution in search of a concrete problem!