- “…having a higher degree of PER ELEMENT failure, while allowing an architecture to get you to extremely high overall reliability, is one of the more transformative features of many cloud options…”
- Several comments on how reliability is less immediately appealing to businesses than new features. But reliability is the slow-burning coal, whereas new features are often just so much kindling (and therefore “burn out” very quickly).
- For monitoring, Zabbix and Nagios are the usual suspects, with Splunk in the mix too (obviously not a direct comparison) and Zenoss making waves. Little input from anyone running SCOM for Windows; presumably these folks are using Zabbix or Nagios(?!).
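The first point above, that individually unreliable elements can still add up to a very reliable whole, can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes element failures are independent (rarely quite true in practice), the 95% per-element availability is a made-up figure, and `combined_availability` is a hypothetical helper rather than anything from the discussion.

```python
def combined_availability(element_availability: float, replicas: int) -> float:
    """Chance that at least one of `replicas` elements is up.

    Assumes failures are independent, so the whole service is down
    only when every replica is down at the same time.
    """
    if not 0.0 <= element_availability <= 1.0:
        raise ValueError("element_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # P(all down) = (1 - a) ** n, so P(at least one up) = 1 - (1 - a) ** n
    return 1.0 - (1.0 - element_availability) ** replicas


# Hypothetical figure: each element is up only 95% of the time.
for n in (1, 2, 3, 4):
    print(f"{n} replica(s): {combined_availability(0.95, n):.6f}")
```

Under these assumptions, each extra replica multiplies the chance of a total outage by the per-element failure rate, which is how cheap, individually flaky instances can still sit behind a highly reliable service.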
Some useful print and online resources for building reliable websites:
- Seven Databases in Seven Weeks
- High Performance Web Sites [although this probably needs updating now to include WebSockets, Node.js, etc.]
- Gigaspaces XAP architecture overview
- Experience with some Principles for Building an Internet-Scale Reliable System (Akamai – PDF, 160 KB)