Site Reliability at Scale – Discussion Roundup

There have been several useful discussion threads on the LinkedIn Site Reliability at Scale group recently:

The Shard, London

Is extreme high availability a bad thing?

  • …having a higher degree of PER ELEMENT failure, while allowing an architecture to get you to extremely high overall reliability, is one of the more transformative features of many cloud options…
  • Several commenters noted that reliability is less immediately appealing to businesses than new features. Reliability, though, is the slow-burning coal, whereas new features are often just so much kindling (and therefore “burn out” very quickly).
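The first point, that less reliable individual elements can still add up to a very reliable whole, can be illustrated with some simple arithmetic. The sketch below is not from the thread: it assumes failures are independent and that any single surviving replica can carry the service, which real systems only approximate. The function name and the 99% figure are purely illustrative.

```python
def combined_availability(per_element: float, replicas: int) -> float:
    """Availability of a group of redundant elements.

    Assumes (hypothetically) that element failures are independent and
    that one surviving element is enough to keep the service up.
    """
    if not 0.0 <= per_element <= 1.0:
        raise ValueError("per_element must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The group is down only when every replica is down at the same time.
    return 1.0 - (1.0 - per_element) ** replicas


if __name__ == "__main__":
    # Illustrative only: elements that are each up 99% of the time.
    for n in (1, 2, 3, 4):
        print(f"{n} replica(s): {combined_availability(0.99, n):.8f}")
```

Under these assumptions, each extra replica of a 99%-available element adds roughly two more "nines". Correlated failures (a shared network, a bad deploy) erode that gain in practice.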

What are you using for server and network monitoring?

  • Zabbix and Nagios are the usual suspects, with Splunk in the mix too (obviously not a direct comparison) and Zenoss making waves. There was little input from anyone running SCOM for Windows; presumably these folks are using Zabbix or Nagios(?!).

Literature on website scalability

Some useful print and online resources for building reliable websites were shared in this thread.

Join the discussion...
