Site Reliability at Scale – Discussion Roundup

There have been several useful discussion threads on the LinkedIn Site Reliability at Scale group (http://www.linkedin.com/groups?home=&gid=4200099) recently:


Is extreme high availability a bad thing?

  • …having a higher degree of PER ELEMENT failure, while allowing an architecture to get you to extremely high overall reliability, is one of the more transformative features of many cloud options…
  • Several comments noted that reliability is less immediately appealing to businesses than new features. Even so, reliability is the slow-burning coal, whereas new features are often just so much kindling (and therefore “burn out” very quickly).
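The point about per-element failure versus overall reliability can be made concrete with a little arithmetic. The sketch below is my own illustration, not something posted in the thread: it assumes each element fails independently and that the service stays up as long as at least one replica is up, so overall availability is 1 - (1 - a)^n. The function name `parallel_availability` and the 99% figure are hypothetical, chosen only for the example.

```python
def parallel_availability(element_availability: float, replicas: int) -> float:
    """Availability of a group that is up while at least one replica is up.

    Assumption (not from the discussion thread): replica failures are
    independent. Real systems share power, network and software bugs,
    so treat the result as an optimistic upper bound.
    """
    if not 0.0 <= element_availability <= 1.0:
        raise ValueError("element_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The group is down only when every replica is down at the same time.
    return 1.0 - (1.0 - element_availability) ** replicas


if __name__ == "__main__":
    # Hypothetical 99%-available elements, with one to four replicas.
    for n in range(1, 5):
        print(f"{n} replica(s): {parallel_availability(0.99, n):.8f}")
```

Under those assumptions, each extra replica of a fairly unreliable element adds roughly two more "nines" to the whole, which is the trade the quoted comment describes. Correlated failures will eat into that gain in practice.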

What are you using for server and network monitoring?

  • Zabbix and Nagios are the usual suspects, with Splunk in the mix too (obviously not a direct comparison) and Zenoss making waves. There was little input from anyone running SCOM for Windows; presumably these folks are using Zabbix or Nagios(?!).

Literature on website scalability

Some useful print and online resources for building reliable websites, including:
