In this article, Mark Petrie, CEO of Kumulos, examines why APIs have outages, what such outages can cost and why you, as a mobile app developer, should monitor every API endpoint your app depends on in order to reduce the cost of an API outage on your mobile app.
myAWS.availability() != myAPI.availability()
The reliability and availability (uptime) of cloud computing providers such as Amazon Web Services or Microsoft Azure have improved massively over the last few years. However, most modern websites, services and APIs that we choose to host on them are built on a complex chain of dependencies. Every now and again, a link in that chain will break. As soon as that happens, any mobile apps that depend on these APIs to function are going to incur costs: the cost of wasted marketing spend on user acquisition campaigns, the cost of lost subscription revenue opportunities and the cost of putting right the reputational damage from all those one-star “Crashes when it opens” reviews in the app stores. Therefore, while you may not have control over all of the services and APIs that your mobile app depends on, it is essential that you, as a mobile app developer, monitor their availability, so you can proactively take action to mitigate the impact of an outage on your mobile app rather than walking unaware into the bad news on Monday morning.
Why do APIs have outages?
The concept of code reusability in software development is nothing new. However, where this used to be constrained to the confines of a module, product or at best company, the open-source movement and the availability of package managers such as NPM now mean that almost all new software reuses code written and maintained by others. This is not a bad thing; it is the very reason developers can deliver rich user experiences in relatively short periods of time. However, it does mean that the quality of your website, service or API now, to some extent, depends on the quality of the code that you include. Assuming the person who wrote that code has adopted the same approach as you, then you now also depend on the quality of the code that they include… and so on.
Of course, this is where software testing comes in, to verify quality before release. But what you may not realise is that this must be repeated for every release. Why? Because all of the code that you included changes… regularly!
By making it easy to publish updates to shared code, package managers also make it easy to update the code you are including. Again, this is not a bad thing, especially when it comes to patching security vulnerabilities. However, installing unattended updates every night or week carries with it the risk that something, somewhere high up in the complicated chain of dependencies breaks… and along with it… everything downstream, including your website, service or API!
This was perfectly illustrated in the infamous “left-pad incident” of 2016 when a developer unpublished 11 lines of code from NPM and “broke half the Internet”. These 11 lines of code were contained in a module called left-pad that had been downloaded 2.5 million times in one month alone. The removal of left-pad first caused problems with other very popular open source packages such as React, Node and Babel and then resulted in problems for the numerous websites, services and APIs built upon these technologies. Most of those impacted had never even heard of left-pad, let alone known that their software was in some way dependent upon the availability of those 11 lines of code!
So this is why APIs have outages! Not to mention all of the other regular problems that can happen: hard disks crash, people forget to renew SSL certificates (this old chestnut alone was responsible for all Oculus Rift VR headsets shutting down and mobile operators O2 and Softbank losing 4G data services) and every once in a while… even trusted cloud computing providers themselves can have problems. In March 2018, AWS Eastern Region suffered a power outage, taking down Atlassian JIRA, Slack and SMS provider Twilio!
All in all, it is inevitable that at some point, one of the APIs your mobile app depends on to function will have an outage.
Measuring the cost of an API outage
The cost of any one outage will of course depend on the timing, duration and nature of the business(es) it impacts. A recent report by The Ponemon Institute estimates that the average Global 5000 company will incur costs of over $15m USD resulting from a certificate outage (not including any regulatory fines for applicable businesses).
The exact timing of the Facebook/Messenger/WhatsApp/Instagram outage in March 2019 is unclear, but it is accepted that these services were either down or suffering from degraded performance for almost 2 days. According to Facebook’s own earnings announcement, this would put the cost of such an outage to them in excess of $300m USD.
While Facebook themselves may be able to sustain such (expected) unexpected costs, for those that rely on the platform for advertising and custom, the impact is more severe. For some small businesses, the cost of that outage ran to five figures, which for them was a much more serious matter.
Estimating the cost of an API outage on your app
Estimating the potential cost of an API outage for your app is actually quite straightforward if you know your Cost-per-Install (CPI), retention and conversion percentages through your onboarding funnel.
For example, suppose your app retains 25% of installs as Monthly Active Users (MAU) and then converts 20% of these into subscribers. If your target for the month is 5000 new subscribers, you know that you will need 100,000 installs each month (approx. 3300 per day). At an average Cost-per-Install (CPI) across all your different media sources of $2 USD, you will need to spend around $200k USD each month on user acquisition to meet your target of 5000 subscribers.
However, let's say that one of the APIs your app uses changes unexpectedly over the course of one weekend, breaking the signup flow in your app. If it takes you three days to diagnose the problem, update your app and publish the update to the stores, the 10,000 potential subscribers who download the app during this time will be unable to sign up and will, in all likelihood, never open your app again.
At a Cost-per-Install of $2 USD, you will have wasted $20k USD on user acquisition alone. Filtering the impact of this outage through your onboarding funnel, you will be down 500 subscribers, which will have a knock-on impact on your subscription revenue. Not to mention the impact that negative reviews will have on increasing your CPI next month!
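The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch using the figures quoted above; the function names `installs_needed` and `outage_cost` are made up for illustration, and percentages are passed as whole numbers to keep the arithmetic exact.

```python
import math


def installs_needed(target_subscribers: int, retention_pct: int, conversion_pct: int) -> int:
    """Installs required to reach a subscriber target through a two-step funnel."""
    # The share of installs that become subscribers is retention x conversion.
    return math.ceil(target_subscribers * 10_000 / (retention_pct * conversion_pct))


def outage_cost(installs_lost: int, cpi_usd: int, retention_pct: int, conversion_pct: int) -> dict:
    """Wasted acquisition spend and subscribers lost for installs made during an outage."""
    return {
        "wasted_spend_usd": installs_lost * cpi_usd,
        "lost_subscribers": installs_lost * retention_pct * conversion_pct // 10_000,
    }


# Figures from the example: 25% retention, 20% conversion, $2 CPI,
# and roughly 10,000 installs over a three-day outage.
monthly_installs = installs_needed(5000, 25, 20)
monthly_spend = monthly_installs * 2
impact = outage_cost(10_000, 2, 25, 20)
print(monthly_installs, monthly_spend, impact)
```

Swapping in your own retention, conversion and CPI figures gives a rough first estimate of what each day of a broken signup flow could cost.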
Reduce the cost of an API outage on your mobile app
Friday afternoon, everyone is winding down for the weekend with thoughts of beers, pizzas and Netflix… and then the phone rings… the app is crashing on launch. And just like that, everyone’s weekend just got a lot less fun. This is not an unusual story.
While the ultimate cause of the crash may, as we have discussed, be nothing to do with your code or even the app itself, the fact that your client or customer, marketing team or even end user finds out about the crash before you do does not reflect well on you as a mobile app developer. Therefore, it is essential that you proactively monitor the availability of the API endpoints that your app depends on to function so you are alerted as soon as a problem occurs.
There are a number of different paid and free tools on the market that do this. However, when choosing a tool, pick one that not only checks the availability of the API endpoint, but also inspects the shape of the payload returned. The API does not need to be down in order to cause a problem for your app. Depending on how your app has been built, even a minor change in the payload shape can cause problems.
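To make the idea of inspecting the payload shape concrete, here is a minimal Python sketch of what such a check might do. It is not how any particular monitoring tool works. The helper names `shape_problems` and `check_endpoint`, and the convention of describing the expected shape as a dict of field names to types, are assumptions made for this illustration.

```python
import json
import urllib.request


def shape_problems(payload, expected, path="$"):
    """Compare a decoded JSON payload against an expected shape.

    `expected` maps field names to Python types, or to nested dicts of the same form.
    Returns a list of human-readable problems (empty if the shape matches).
    """
    if not isinstance(payload, dict):
        return [f"{path}: expected an object, got {type(payload).__name__}"]
    problems = []
    for key, want in expected.items():
        here = f"{path}.{key}"
        if key not in payload:
            problems.append(f"{here}: missing")
        elif isinstance(want, dict):
            problems.extend(shape_problems(payload[key], want, here))
        elif not isinstance(payload[key], want):
            problems.append(
                f"{here}: expected {want.__name__}, got {type(payload[key]).__name__}"
            )
    return problems


def check_endpoint(url, expected, timeout=10):
    """Fetch `url` and return a list of problems; an empty list means it looks healthy."""
    try:
        # urlopen raises for network failures and for HTTP 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        return [f"request failed: {exc}"]
    return shape_problems(body, expected)


# Hypothetical shape for a signup endpoint's response.
expected = {"id": int, "user": {"email": str}}
print(shape_problems({"id": "7", "user": {}}, expected))
```

A check like this reports a problem even when the endpoint answers with HTTP 200, which is exactly the case where an availability-only monitor would stay silent.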
Also, make sure that the tool you choose has the option to alert you, for example via Slack, as soon as a problem occurs so you can start to act. The very first thing you need to do is communicate. Let your client or customer, marketing team and end users know there is a problem and that you are investigating. Everyone understands that outages can happen, and prompt communication will reassure them that your processes are robust and that you are in control.
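As a rough illustration of the alerting step, the sketch below posts a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names `build_alert` and `send_alert`, the message wording and the example URLs are assumptions for this sketch. A real monitoring tool will have its own integration.

```python
import json
import urllib.request


def build_alert(endpoint, problems):
    """Build a Slack incoming-webhook payload describing a failed API check."""
    lines = [f"API check failed for {endpoint}"] + [f"- {p}" for p in problems]
    return {"text": "\n".join(lines)}


def send_alert(webhook_url, message, opener=urllib.request.urlopen):
    """POST the message as JSON to the webhook and return the HTTP status code.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=10) as response:
        return response.status


# Example usage (the webhook URL is a placeholder, so the call is left commented out):
# send_alert("https://hooks.slack.com/services/...",
#            build_alert("https://api.example.com/signup", ["$.id: missing"]))
```

Keeping the message short and specific (which endpoint, what changed) helps whoever is on call start communicating with clients and the marketing team straight away.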
Next, stop any user acquisition campaigns. There is no point wasting money driving traffic to the stores when users who download the app may have a sub-optimal experience. You can easily turn them back on once the problem is resolved.
Monitor and proactively respond to reviews on the app stores. Again, end users understand that outages can happen. How you respond will reassure them and minimise the impact of negative reviews on downloads and therefore your cost-per-install.
Finally, monitor Daily Active Users (DAU) and funnel conversion through the key user journeys in your app. This is actually always good practice as subtle changes, for example in time taken to complete each step, can have a big impact on conversion. When everyone is fire-fighting during an outage, it is essential you can quantify where and what impact this is having throughout your entire app to ensure there are no side-effects after it is resolved.
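One simple way to quantify where an outage is hurting is to compare each funnel step's conversion rate against a normal baseline. The Python sketch below assumes you can already export per-step conversion rates from your analytics. The step names, the rates and the 20% tolerance are hypothetical values chosen for illustration.

```python
def conversion_drops(baseline, current, tolerance=0.2):
    """Return funnel steps whose conversion fell more than `tolerance` (relative) below baseline.

    Both arguments map step names to conversion rates between 0 and 1.
    The result maps each flagged step to a (baseline_rate, current_rate) pair.
    """
    flagged = {}
    for step, base_rate in baseline.items():
        rate = current.get(step, 0.0)  # a step missing from `current` counts as zero
        if base_rate > 0 and (base_rate - rate) / base_rate > tolerance:
            flagged[step] = (base_rate, rate)
    return flagged


# Hypothetical figures: signup is broken, subscription conversion is roughly normal.
baseline = {"install_to_signup": 0.60, "signup_to_subscribe": 0.20}
during_outage = {"install_to_signup": 0.05, "signup_to_subscribe": 0.19}
print(conversion_drops(baseline, during_outage))
```

Running the same comparison again after the fix is deployed is a quick way to confirm that every step has returned to its baseline and that no side-effects remain.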
In summary, the way modern software is built means that outages can and will happen. As a mobile app developer, you need to be prepared for this and ensure that you know of the problem as soon as it happens. That way, you can be proactive and take all necessary action to reduce the cost of an API outage on your mobile app.