Welcome to the runbook.cloud blog. My name is Sam Bashton, and I’m the founder and lead developer on runbook.cloud.
I intend this blog to be a source of useful knowledge for anyone running AWS in production. Here you’ll find articles on new AWS launches, obscure AWS knowledge and the infrastructure behind runbook.cloud.
For my first post, I’d like to talk a little about why I decided to build runbook.cloud.
Prior to starting on runbook.cloud, for many years I ran an AWS consultancy and managed services company in the UK. We were acquired in 2016, and I stayed on for a further two years. For the entire time I was at the company I was part of the on-call rota, responsible for dealing with any urgent issues with customer infrastructure. No-one likes doing on-call, so I always felt it was important that whilst running the company I continued to be part of it. It demonstrated that I wasn’t asking the team to do anything I wouldn’t do myself, and it gave me the best possible insight into the quality of service we were providing to our customers.
The most nerve-racking part of being on call was always that initial moment of looking at the alert. At that point, you had only a very brief message, often something as simple as ‘5xx errors above threshold’. You then needed to look at the customer’s estate, which was typically 100+ EC2 instances, RDS, ElastiCache and a number of assorted other AWS services, and work out what the problem actually was. This typically meant looking through pages of graphs on dashboards we had built. The difficulty was compounded because problems tend to cascade, so you needed to find the source. On a large infrastructure you might have a dozen different alarms triggered, all for valid reasons, but you needed to know which was the cause and which were the symptoms.
runbook.cloud is the app I wish I’d had when I was on-call. It shows a prioritised list of issues, with clear advice on how to resolve them. Being on-call is never going to be fun, but with runbook.cloud it can be a lot less stressful.