State Machine Pattern: A Pig in Lipstick

To begin with let’s define the type of state machine that is referred to in this post. State machine here refers to an explicitly coded state machine. The kind that causes terminology like: state, event and transition to appear in your code. Sometimes this is just a state variable and a switch statement and sometimes it’s an entire framework that’s been included. There are a litany of libraries out there that provide this type of functionality for almost every language imaginable. For example JavaScript and Ruby

Goto is universally despised and avoided by most programmers. State machines however seem to be heralded as a ‘good solution’ in a lot cases by these same programmers. In fact triggering a state machine event is worse than a goto in many cases because it not only jumps the flow of logic to some other arbitrary point in the program but many libraries provide the ability to trigger arbitrary code to run between transitions. This makes it extremely hard to follow the logic of the program. You end up with implicit loops and control flow, none of which is easily readable at a glance. You end up with all the problems that have been associated with goto. The biggest problem is that you lose the ability to codify the flow of business logic at the correct abstraction level. The forest is lost for the trees.

Let’s take a simple order flow as an example. You make an online purchase, payment is processed, shipping is started, package is delivered.

state_machine :state, initial: :created do
  event :pay do
    transition :created => :paying
  end
  event :ship do
    transition :paying => :shipping
  end
  event :transport do
    transition :shipping => :delivering
    transition :delivering => :delivering
  end
  event :deliver do
    transition :delivering => :delivered
  end

  event :fail do
    transition [:paying, :shipping] => :failed
  end

  after_transition on: :pay, do: :process_payment
  after_transition on: :ship, do: :start_shipment
  after_transition on: :transport, do: :record_location
  after_transition on: :fail, do: :handle_failure
  after_transition on: :deliver, do: :complete_order

  def process_payment
    res = HTTP.post('/payments', @params)
    if res.status == 200
      ship!
    else
      fail!
    end
  end

  #...
end

The above code is a striped down example of what the order flow might look like using a state machine library in ruby. It’s already complicated and a realistic one would be much worse. Pay is probably more than a simple synchronous API call and may actually be several asynchronous events itself. Shipment too is probably a set of states. Then add in all the failure cases and you have an explosion of complexity. You could try and move pay and ship into their own state machines but then you are juggling three of these monstrosities. Also process_payment would never be so simple and the points where ship! and fail! are triggered will be buried in the details and potentially even buried in separate classes. This code will quickly become spaghetti as soon as any real complexity is added.

Realistically this is not the code you wanted to write. It’s a product of the asynchronous and event driven nature of the problem you are attempting to solve. If you could you would probably choose to write something like this:

def start_order
  payment = pay()
  if payment.failed?
    handle_failure()
    return
  end
  shipment = ship()
  while shipment.delivering?
    shipment.record_location()
  end
  if shipment.failed?
    handle_failure()
    return
  end
  complete_order()
end

This is easy to understand and does a much better job at communicating the order flow at an abstraction level that is actually readable by a human. The problem is that the asynchronous nature of our order flow does not allow this code. How is it possible to to remain in the context of a single method call over the multiple days and weeks that this order flow may take to complete? It’s not of course. However, we can come much closer to the ideal without using explicit state machines. The key word here is ‘explicit’. Computers by nature are state machines but we don’t go programming them with explicit state, transitions and events. No, we created structured programming languages with flow control. So we need something that has regular programming flow control but can be paused and started at the point it stopped at a later time allowing code to appear linear while execution can be asynchronous.

This can be achieved with coroutines (and various other constructs, but I’m only going to talk about coroutines here). Now ruby has coroutines but they aren’t the type of coroutine that we need. Our coroutines need the ability to be completely stopped, stored in a database and be restarted, way later, potentially in a different process than the one that created it. Ruby’s coroutines have stacks and we can’t just serialize the stack to the database and start it up in a different process. While this may be theoretically possible, it’s not practically possible. What we need is called a stackless coroutine. Instead of diving into the theory I’m just going to show what the above order flow could look like in ruby if we wrote a stackless coroutine library.

class Order
  include StacklessCoroutine

  attr_accessor :state

  coroutine(:state) do |exec|                            # define a coroutine providing
    exec                                                 #   the name of the state variable
      .pay                                               # invoke subroutines
      .if { |payment| payment.failed? }                  # control flow
        .run { handle_failure() }                        # execute code
        .return                                          # exit coroutine
      .end
      .ship
      .while { |shipment| shipment.delivered? }          # loop
        .run { |shipment| shipment.record_location() }
        .yield                                           # pause execution/yield a result
      .end
      .if { |shipment| shipment.failed? }
        .run { handle_failure() }
        .return
      .end
      .run { complete_order() }
  end

  subroutine(:pay) do |exec|                             # define a subroutine providing
    exec                                                 #   the name used to invoke it
      .run { pay() }
      .yield
  end

  subroutine(:ship) do |exec|
    exec
      .run { ship() }
      .yield
  end
end

Using stackless coroutines it’s possible to come much closer to the ideal way we want to write this code. The order flow can actually be grokked from the code now. Since our coroutines can call subroutines we can easily work around the issue of pay, and ship becoming their own complex flows using constructs we are familiar with. As with regular coroutines yield will pause execution allowing us to invoke the coroutine again at a later time from that position.

def start_order
  order = Order.new
  order.invoke()
  save(order.state)
end

# in another execution context
def payment_complete_cb(payment)
  order.state = load()
  order.invoke(payment)
  save(order.state)
end

def shipment_update_cb(shipment)
  order.state = load()
  order.invoke(shipment)
  save(order.state)
end

All that needs to be done to facilitate the execution of our coroutine is to persist the state (an integer) between invocations and to call the invoke method with the current execution context. The invoke method will run the coroutine starting from wherever the last yield statement was. This will provide us with code that almost looks normal but is able to handle the asynchronous flows that dominate the industry today. Next time you run into a situation that looks like it might require a state machine, think again a coroutine will probably end up being more readable.

If you’re curious to try some code out yourself I’ve created implementations of stackless coroutines in Ruby and C. The ruby implementation is almost exactly as shown above but I haven’t implemented return yet.