The take-home points were emphasised in the last 15 minutes, when James Lewis ran through the details of the AWS outage in April this year, which brought down the video streaming service Netflix:
- The cloud can be incredibly flexible and rapid for provisioning
- The cloud hides much of the complexity of traditional data centre configuration
- However, we still need to design for failure (as we always have)
The root-cause analysis by Amazon makes for interesting reading (as it goes into some detail about how their cloud services work). Ultimately, though, they make the point that:
"if your systems failed in the Amazon cloud this week, it wasn’t Amazon’s fault. You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon’s cloud computing model."
This is why software developers and architects still need to understand failure modes, resiliency, redundancy and so on, and design their software to expect failure and deal with it.
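As a rough illustration of what "expect failure and deal with it" can look like in code, here is a minimal sketch of one common pattern: retry a flaky call with exponential backoff and jitter, then degrade to a fallback instead of crashing. The function name `call_with_retries` and its parameters are hypothetical, chosen for this example; they do not come from the talk or from any AWS library.

```python
import random
import time


def call_with_retries(operation, fallback, attempts=3, base_delay=0.1,
                      sleep=time.sleep):
    """Call operation(); retry on failure, then fall back.

    Hypothetical helper for illustration only. `operation` and `fallback`
    are zero-argument callables supplied by the caller.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Treat any exception as a transient failure. Real code would
            # usually catch a narrower set (timeouts, connection errors).
            if attempt < attempts - 1:
                # Exponential backoff with full jitter, so that many clients
                # recovering at once do not retry in lockstep.
                sleep(random.uniform(0, base_delay * 2 ** attempt))
    # Every attempt failed: degrade gracefully rather than propagate.
    return fallback()


if __name__ == "__main__":
    def fetch_recommendations():
        # Stand-in for a call to a remote service that is currently down.
        raise ConnectionError("simulated outage")

    def cached_recommendations():
        # Stand-in for stale but usable data served during the outage.
        return ["popular title A", "popular title B"]

    print(call_with_retries(fetch_recommendations, cached_recommendations))
```

The design choice worth noting is the fallback: retries only help with short blips, whereas an outage lasting hours needs a degraded mode the system can stay in.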
“The Cloud” does not absolve software developers from planning for failures, and four nines (99.99%) uptime is still hard to achieve.
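To put "four nines" in perspective, the short calculation below converts an availability percentage into the downtime it allows per year. It assumes a 365-day year; the function name `downtime_budget_minutes` is made up for this example.

```python
def downtime_budget_minutes(availability_percent, period_days=365):
    """Return the downtime (in minutes) allowed over the given period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")
```

At 99.99% the budget is roughly 52.56 minutes a year, so a single outage lasting a few hours uses up several years' worth of allowance.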