Fault tolerance, anomaly detection, and anticipation patterns by Jon Allspaw at QConLondon 2012

Jon Allspaw (@allspaw) from Etsy talked about the role that Anomaly Detection, Fault Tolerance and Anticipation play in producing highly scalable software systems (Fault tolerance, anomaly detection, and anticipation patterns, slides [PDF, 5MB]).

As head of technical operations at Etsy, whose web traffic is pretty substantial, Jon focused on resilience in software systems: what it is, and how to achieve it.

QConLondon 2012 blog posts
See all QConLondon 2012 blog posts…

Continue reading Fault tolerance, anomaly detection, and anticipation patterns by Jon Allspaw at QConLondon 2012

Five Interview Questions for Hiring DevOps Staff

Over the past seven or eight years I have developed a list of five key interview questions for recruiting staff to software development teams. These five questions have come to stand out as being highly indicative of the candidate’s aptitude for approaching software in [what is now called] a “DevOps” manner, namely, seeing software as the running, evolving system in the Production environment.

Continue reading Five Interview Questions for Hiring DevOps Staff

Site Reliabililty at Scale – Discussion Roundup

There have been several useful discussion threads on the LinkedIn Site Reliability at Scale group (http://www.linkedin.com/groups?home=&gid=4200099) recently:

Continue reading Site Reliabililty at Scale – Discussion Roundup

Assert-based Error Reporting in Delphi

[This is a very old article I wrote back in 2002 when I worked for a company which built MRI scanners and was subsequently bought by Oxford Instruments. The driver for this was “…Until Delphi acquires native functions equivalent to the C [__LINE__ and __FILE__] macros, … the need for this Assert-based framework … will remain” The need to trace errors to a specific class and line number, especially in production code, has only become stronger since then.]

Summary

This note describes a simple but flexible error-reporting and tracing framework for Delphi 4 and above based on Assert, which provides the unit name and line number at which errors were trapped and traces made.

Details

Background

Under the Delphi Language there is no simple way of replicating the C/C’++ macros FILE and _LINE_ to obtain the unit name and line number of memory address at runtime. However, in his paper “Non-Obvious Debugging Techniques” Brian Long points out that the Delphi compiler provides both unit name and line number during a call to Assert, and describes how assertions can be exploited to provide detailed execution tracing.

The framework described here extends this idea to allow flexibility in the processing of assertions. Assertion processing can be switched on and off at runtime; arbitrary filtering can be applied to any assertion; and both execution tracing and ‘standard’ assertion behaviour (i.e. raising an exception) can be effected. Assertions can therefore be left enabled in production code, at the expense of a slightly larger binary executable.

Continue reading Assert-based Error Reporting in Delphi

UK Scale Camp 2010 – Braindump

I’ve just returned from UK Scale Camp 2010 (@scalecampuk), organised by The Guardian (and the indefatigable Michael Brunton-Spall, ). Here are some notes:

Overview

I liked the “unconference” format (no formal programme; attendees vote for their favourite sessions in advance), and ended up in four of the many sessions:

  • DevOps on Windows
  • Log Analysis for Search Results
  • DB Changes without Downtime
  • Handling Errors at Scale