A few years ago, I was working on a project for a UK client who needed to replace or rewrite a legacy inventory management web application written in classic ASP. The problem: no documentation, and complicated, spaghetti source code with many apparently duplicate or redundant ASP files and ASMX web service endpoints.
Which ASP pages were actually in use? We had to find a way to limit our application migration efforts to only those pages which were used by the application. A colleague at the time introduced me to what must be one of the best-kept secrets in the Windows developer world: Microsoft Log Parser.
Using LogParser with IIS Logs
Log Parser is described as "… a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files". Think of it as AWStats for almost any data source, without the Perl headache, albeit also without the nice HTML reports out of the box.
By tailoring the log file queries, LogParser can be used not only to determine which endpoints are in use within a particular application, but also as the basis for functional or load test scripts (I will cover this scenario in a separate post).
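To determine which endpoints the application used (typically ASP files, with some ASMX web service endpoints), we configured the IIS web server in our test environment to log extended information about each incoming request in the W3C extended log file format. This format includes fields such as cs-uri-stem (the requested path) and sc-status (the HTTP status code).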
We then ran a series of user journeys through the application, covering every step that seemed possible based on what we understood of the application. This generated several hundred megabytes of IIS logs.
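IIS writes these logs as plain text. Each file contains a few `#` directive lines, including a `#Fields:` line that names the columns, followed by one space-separated line per request. The function below is a minimal Python sketch of reading such a file into one dictionary per request. It is not part of the tooling we used. The name `parse_w3c` and the sample lines are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request line of a W3C extended format log.

    Column names are taken from the most recent '#Fields:' directive, so
    the parser follows whichever fields IIS was configured to log.
    """
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as #Software, #Version, #Date and #Fields.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if not fields:
            continue  # data before any #Fields line: columns unknown, skip it
        # Values are space-separated; IIS encodes spaces inside a value as '+'.
        yield dict(zip(fields, line.split(" ")))


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "#Version: 1.0",
        "#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status",
        "2009-07-15 10:01:02 GET /stock/list.asp - 200",
        "2009-07-15 10:01:05 POST /services/Stock.asmx - 500",
    ]
    for rec in parse_w3c(sample):
        print(rec["cs-uri-stem"], rec["sc-status"])
    # prints:
    # /stock/list.asp 200
    # /services/Stock.asmx 500
```

Reading the column names from the log itself avoids hard-coding a field order that depends on how IIS was configured.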
Next, we used LogParser to extract the details (here, in tab-separated format) of the requests that resulted in an HTTP 200 response:
$ LogParser -i:IISW3C -o:TSV "SELECT * INTO ex090715.txt FROM ex090715.log WHERE sc-status=200"
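Where LogParser is not to hand, the same filter can be approximated in a few lines of Python. The sketch below is a rough analogue of that query, not a replica of LogParser's output. The function name `filter_status_200` is hypothetical, and the sketch assumes the `#Fields:` line does not change part-way through the file.

```python
import csv
import io
from typing import Iterable, List, TextIO


def filter_status_200(log_lines: Iterable[str], out: TextIO) -> int:
    """Rough analogue of: SELECT * FROM <log> WHERE sc-status=200

    Writes matching rows to `out` as tab-separated values, with a header
    row taken from the '#Fields:' directive, and returns the row count.
    Assumes the #Fields line does not change part-way through the file.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    fields: List[str] = []
    header_written = False
    count = 0
    for raw in log_lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # blank line, other directive, or no column names yet
        record = dict(zip(fields, line.split(" ")))
        if record.get("sc-status") != "200":
            continue
        if not header_written:
            # Header is only written once a matching row is found.
            writer.writerow(fields)
            header_written = True
        writer.writerow([record.get(name, "") for name in fields])
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "#Fields: date time cs-uri-stem sc-status",
        "2009-07-15 10:01:02 /stock/list.asp 200",
        "2009-07-15 10:01:03 /stock/missing.asp 404",
    ]
    buf = io.StringIO()
    print(filter_status_200(sample, buf))  # prints: 1
    print(buf.getvalue(), end="")
```

To process a real log, pass in an open log file and an output file opened with `newline=""`. Adjust the encoding to match how your IIS server writes its logs.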
We could then load the tab-separated results into Excel for manual inspection.
Once we were satisfied that the LogParser query returned valid results, we correlated the cs-uri-stem column with the files on disk. Files with an entry in the log data were clearly in active use by the application. Files with no corresponding cs-uri-stem entry were possible candidates for removal, since any page our user journeys did not exercise would also be missing from the logs.
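This correlation step amounts to a set difference between the files on disk and the paths seen in the logs. Below is a minimal Python sketch of it. It assumes the site is served from `/`, so that a file's URL is its path relative to the site root. It also assumes the tab-separated export has a header row containing `cs-uri-stem`. The names `read_stems` and `unused_files` are hypothetical. Requests carrying extra path information, such as `/svc/Stock.asmx/GetItems`, are treated as using the `.asmx` file.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def read_stems(tsv_path: str) -> Set[str]:
    """Distinct cs-uri-stem values from a tab-separated export with a header row."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        return {row["cs-uri-stem"] for row in csv.DictReader(f, delimiter="\t")}


def _with_prefixes(stems: Iterable[str]) -> Set[str]:
    """Lower-case each stem and add its parent paths, so that a request
    such as /svc/Stock.asmx/GetItems also marks /svc/stock.asmx as used."""
    out: Set[str] = set()
    for stem in stems:
        parts = stem.lower().split("/")
        for i in range(2, len(parts) + 1):
            out.add("/".join(parts[:i]))
    return out


def unused_files(site_root: str, stems: Iterable[str],
                 extensions=(".asp", ".asmx")) -> List[str]:
    """URL paths of ASP/ASMX files under site_root never seen in the logs.

    Assumes the site is served from '/', so a file's URL is its path relative
    to site_root. The result is a list of candidates to review, not a delete list.
    """
    seen = _with_prefixes(stems)
    root = Path(site_root)
    candidates = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            url = "/" + path.relative_to(root).as_posix()
            if url.lower() not in seen:  # compared case-insensitively
                candidates.append(url)
    return candidates


if __name__ == "__main__":
    # Throwaway site tree with made-up file names, for illustration only.
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "svc"))
        for name in ["Default.asp", "old_list.asp", "svc/Stock.asmx", "logo.gif"]:
            open(os.path.join(d, name), "w").close()
        print(unused_files(d, ["/default.asp", "/svc/Stock.asmx/GetItems"]))
        # prints: ['/old_list.asp']
```

Some requests may not match a file path one-to-one, for example a folder served by a default document. Treat the output as a shortlist to check by hand.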
The Result – HTTP Endpoints Identified
By using the Microsoft LogParser tool and detailed IIS logging, we were able to identify the vast majority of ASP and web service endpoints used by the application.
(We also devised a way to use LogParser to help generate load test scripts – I will cover this in a future post).
This technique could be extended to process IIS logs from all web and application servers, thereby identifying every active HTTP endpoint within the server estate: a cheap and cheerful alternative to Splunk or Logstash.
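The aggregation across servers could be sketched in Python as follows. The sketch assumes each server's logs have already been copied somewhere readable and are UTF-8 compatible. The name `count_endpoints` is hypothetical. Like the query above, it counts only HTTP 200 responses. It lower-cases paths on the assumption that IIS treats them case-insensitively.

```python
import os
import tempfile
from collections import Counter
from typing import Dict, Iterable, List


def count_endpoints(paths: Iterable[str]) -> Counter:
    """Count HTTP 200 hits per cs-uri-stem across many W3C extended logs.

    Stems are lower-cased (assumption: IIS paths are case-insensitive), so
    /Stock/List.asp and /stock/list.asp are counted as one endpoint.
    """
    hits: Counter = Counter()
    for path in paths:
        fields: List[str] = []  # each file carries its own #Fields directive
        with open(path, encoding="utf-8", errors="replace") as f:
            for raw in f:
                line = raw.rstrip("\r\n")
                if line.startswith("#Fields:"):
                    fields = line[len("#Fields:"):].split()
                    continue
                if not line or line.startswith("#") or not fields:
                    continue  # blank line, other directive, or no column names yet
                rec: Dict[str, str] = dict(zip(fields, line.split(" ")))
                if rec.get("sc-status") == "200" and "cs-uri-stem" in rec:
                    hits[rec["cs-uri-stem"].lower()] += 1
    return hits


if __name__ == "__main__":
    # Two made-up log files standing in for two servers' logs.
    with tempfile.TemporaryDirectory() as d:
        samples = {
            "web01.log": "#Fields: cs-uri-stem sc-status\n/Stock/List.asp 200\n/stock/edit.asp 500\n",
            "web02.log": "#Fields: cs-uri-stem sc-status\n/stock/list.asp 200\n/svc/Stock.asmx 200\n",
        }
        paths = []
        for name, text in samples.items():
            p = os.path.join(d, name)
            with open(p, "w", encoding="utf-8") as f:
                f.write(text)
            paths.append(p)
        for stem, n in count_endpoints(paths).most_common():
            print(n, stem)
```

Sorting by hit count shows the busiest endpoints first. The endpoints missing from the list entirely are the ones worth investigating.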