Surviving the Ups and Downs of Startup Life: A Roller Coaster Launch Day

This article is more than 7 years old.

I hate roller coasters. I mean, I really hate them. I do everything I can to avoid them. In fact, I get nauseous just thinking of all the ups and downs — and the unexpected. Drama is not my idea of fun.

In startup land there are times you simply have no choice. You can do only so much to ensure a smooth ride. And then the unexpected hits. Without warning.

This past Tuesday was one of those days. We (that is, the Ziggeo team) were so excited for a major release of our video player and recorder template themes. We'd been working on them for months, and finally the day had come to release them.

We were thrilled to learn that our release was being featured on Product Hunt (a big deal for any company). We'd also perfectly timed our newsletter, blog post and social media campaigns, all to drive people to our site on that one crucial day. All the boxes were checked, and nothing could go wrong.

Until it did. Just after noon on launch day, we got wind that the part of Amazon Web Services (AWS) that we rely on (S3 in US-EAST-1) had gone down. That may not mean much to you, but in our world it meant our entire system would be brought down for the first time since the day we launched Ziggeo’s API. S3 was supposed to be stable to four nines, or 99.99% availability, meaning only about 52 minutes of outage per year were expected. But that day, the unexpected happened. And it happened on the exact day we had planned for our major release: the very day tons of traffic would be brought to our site.
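For readers who want to check the "four nines" arithmetic, here is a minimal sketch. It assumes a 365.25-day year, and the function name is made up for illustration. The three-hour figure is simply how long Ziggeo was affected, not an official AWS number.

```python
# Rough downtime-budget arithmetic for an availability target.
# Assumption: a year is 365.25 days long.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year a service can be down and still meet the target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# "Four nines" means 99.99% availability.
four_nines_budget = downtime_budget_minutes(0.9999)

# Ziggeo was affected for roughly three hours (figure from this article).
outage_minutes = 3 * 60

# How many years' worth of the expected outage budget one afternoon used up.
years_of_budget_used = outage_minutes / four_nines_budget

print(f"Four-nines budget: {four_nines_budget:.1f} minutes per year")
print(f"A three-hour outage uses about {years_of_budget_used:.1f} years of that budget")
```

In other words, under these assumptions a single three-hour outage eats up more than three years of what four nines would allow.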

There was absolutely nothing we could do. We tried to reconfigure our system, but given our dependency on AWS, we simply had to wait for AWS to make the fix. To make matters worse, the AWS failure also took down Zendesk, the service we use to communicate with our customers.

We needed to remain calm and keep our customers posted via Twitter. We weren't the only site affected: Slack, Quora, Business Insider, Expedia, Giphy, GitHub, Kickstarter, Mailchimp, Medium, Twilio and other major sites that depended on that region were all impacted. We just hoped our customers would understand, given that most of them probably rely on AWS as well. Still, for three hours (the time Ziggeo was affected) we monitored the situation and relayed to our customers the small bits of information that came out of AWS.

And then, once we noticed that parts of AWS were back up and running, we scrambled to reboot our system, test all parts and ensure nothing had been affected (nothing had). The following morning we sent an email to all our customers, apologizing for being down and explaining exactly what had happened.

Here are five lessons we learned along the way:

  1. Keep customers up to date. They appreciate being kept in the loop no matter what.
  2. Don't freak out. Do what you can to quickly make any fixes and mitigate any loss. But know there will be times when events are out of your control. Just try to stay calm. As much as possible.
  3. Once you're able, make sure your team scrambles to get everything back up and running as soon as possible. Test, test, and test again.
  4. Write a post mortem. Post it on your blog, tweet, and send customers an email afterwards explaining what happened. Include how you'll be working to prevent such issues in the future. Once we sent our email to customers, we heard from a number of them who thanked us for explaining what happened.
  5. Gradually re-architect your system for even more redundancy to reduce the likelihood of future downtime.

I'll never enjoy roller coaster rides. That's for sure. My trick: hold on tight, realize bumps are par for the course — and try not to toss those cookies along the way.

 
