How I Learned to Stop Staging and Love Releasing (at YPlan)

When I started at YPlan, the first question I asked was: where's the staging environment? You see, I was used to a hierarchical release process: you develop on your machine, then you send your code for review, and it gets merged into master. When the features planned for a release are complete, a release candidate is cut and tested; if it fails, it goes back to development. If it passes, it's pushed to staging, tested again, and then pushed to production and released.

What is happening at each step? Your tests will certainly pass on your development machine, right? Otherwise, you wouldn't submit the code for review. The reviewers will merge the code if they are happy with it, and then the test environment will verify that all the tests pass. But this testing usually runs on simulated data or on a subset of real data. And it only tests the code, not the deployment procedures, the interaction with real-world data, or system integration. Testing those is the job of staging, which is usually a system with the same environment, data, and configuration as production.

Why use staging?

This long sequence makes a lot of sense if you have big releases: you work for weeks or months on a new feature, the changes are everywhere in the code, you want to make sure you don't break anything, and you want to make sure that all the new stuff works as expected. But why? I had never asked myself this before - I just took for granted all the steps needed to make sure a release is correct.

The why is more apparent if you consider what makes a release expensive: the difficulty of the release itself, the cost incurred if the released code is buggy, and the cost of repairing a problematic release.

  • Release cost: If your release means printing DVDs, boxing them, and shipping them to retail stores around the world, the release process is actually pretty difficult.
  • Failure cost: Maybe you write software for nuclear power plants - any bug you have may doom a part of the planet for centuries.
  • Maintenance cost: Or, if your code is in a car engine, getting the cars recalled into dealerships for upgrades is going to be expensive and damaging to your reputation.

If any of these points apply to you, you will want to make sure a bad release never happens.

Why not use staging?

Staging Photo credit: unattributed with modifications. / Public Domain.

The downside is that taking all these precautions is itself what makes the release process difficult, a Catch-22 that only reinforces the cycle of expensive and careful releases. It was illuminating for me to realize that you can end up doing the long release cycle only for its own sake. You can break free of these chains and disrupt the cycle.

Enter YPlan. We don't do staging because, frankly, there is no need for it. I was reluctant at first, but after the realization described above, I saw that none of the pain points listed earlier apply to us, so nothing stops us from releasing code often.

  • Release cost: The release process is super easy. We have a backend service and a frontend website. The release is just copying files over to the production server (well, it's slightly more complicated, but the gist of it is that it's done by pressing a button). So the cost tends to be 0.
  • Failure cost: If we have a bug, that's not the end of the world. Maybe we lose some money. Maybe we mess up the database a bit. Nothing that cannot be repaired with a day's worth of work. The cost is minimal.
  • Maintenance cost: While the system is in operation, we monitor it and discover bugs. But since our release process is very cheap, fixing a bug and releasing the fix is cheap, too.

So it turns out we get a faster release cycle without staging. We have some good helpers here, too - test code coverage, which lets us assess whether a release is good through automated testing; and excellent monitoring tools, which allow us to detect problems and pinpoint root causes fast. But the most important thing was to realize that we don't need staging just for staging's sake.
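To make the role of those two helpers concrete, here is a minimal sketch in Python of how automated tests and monitoring can stand in for a staging step. It is an illustration only, not YPlan's actual tooling: the names BuildReport, ready_to_release and should_roll_back, the 0.8 coverage threshold, and the 2x error-rate tolerance are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class BuildReport:
    """Hypothetical summary of one automated test run on master."""
    tests_passed: bool
    coverage: float  # fraction of code exercised by tests, 0.0 to 1.0


def ready_to_release(report: BuildReport, min_coverage: float = 0.8) -> bool:
    """Gate before release: the suite must be green, and coverage must be
    high enough that a green suite actually says something.
    The 0.8 default is an assumed threshold, not a recommendation."""
    return report.tests_passed and report.coverage >= min_coverage


def should_roll_back(error_rate: float, baseline: float,
                     tolerance: float = 2.0) -> bool:
    """Check after release: flag the release if the monitored error rate
    exceeds the pre-release baseline by more than the assumed tolerance."""
    return error_rate > baseline * tolerance
```

The first check plays the part that the release-candidate test pass used to play, and the second replaces "test it on staging" with "watch it in production", which is only acceptable because fixing and re-releasing is cheap.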

The outcome of all this: we release each feature or bug fix as it becomes available. We release fast and often - we average about 3 releases per day. We can release with a latency of just 20-30 minutes after the code gets merged into master. The fast and easy release process makes staging obsolete.

What should you do?

I guess the key takeaway I have from all this is: when you're using a process, first consider WHY you are using that process, and change what you do so that it actually helps you reach your goals, rather than just following procedure automatically. We have bots for that.

Robot roboting Photo credit: Tibor Antalóczy / CC BY-SA 3.0.