Stress Testing: our web app, design issues, and the real world

Context: our internal application also creates PDFs, mostly because our new web application needs to create PDFs for document delivery. The print routine prints the assembled PDF itself, not its component documents.

Last Tuesday, one of our customers requested what turned out to be slightly more than 6,000 pages of documentation. The job was committed at 9:00 am, at which time the system began retrieving pages from the jukebox. Three hours later, the 6,000 TIFFs had been retrieved and the job was queued for printing. Around 4:00 pm, the queue reached the large job and the application began packaging the TIFFs into a PDF. This took a bit over three more hours to accomplish. The app tossed the PDF at a printer, which promptly ran out of paper.

We finally completed the job during Wednesday’s lunch hour.

The PDF weighed in at a hefty 1.56 GB, six times the size of the previous record for this system. It’s fair to say we were impressed.
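For scale, some quick back-of-envelope arithmetic on the numbers above (this assumes the PDF’s bulk is the page images themselves, which seems safe for a TIFF-packaging job):

```python
# Rough throughput and size figures for the 6,000-page job described above.
pages = 6_000
retrieval_hours = 3                  # 9:00 am to roughly noon
pdf_bytes = 1.56 * 1024**3           # the finished 1.56 GB PDF

pages_per_minute = pages / (retrieval_hours * 60)
kb_per_page = pdf_bytes / pages / 1024

print(f"jukebox retrieval: ~{pages_per_minute:.0f} pages/min")
print(f"average page in the PDF: ~{kb_per_page:.0f} KB")
```

That works out to roughly 33 pages a minute off the jukebox, and a bit over a quarter megabyte per page in the finished file.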


When we were planning this project, we met with a mixed IT/vendor team to discuss the potential network impacts. At the time, the state’s web infrastructure was fairly immature, and it took only a few minutes for the folks across the table to become alarmed. If our lowest traffic estimates had any basis in reality, our application would certainly stress, and perhaps break, the connection between our servers and IBM’s web server farm in Boulder. They were particularly concerned that we were unable (they probably thought “unwilling”) to define a maximum file size. We had excellent estimates for typical traffic, but predicting peak traffic was (and still is) difficult. In the end, we agreed to cap the file size the web app would deliver, and make provisions for giving large jobs alternative handling.
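The agreed-upon policy boils down to a simple dispatch: deliver small files over the web, and shunt anything over the cap to alternative handling. A minimal sketch, with a purely illustrative threshold (the actual cap we agreed to isn’t stated here):

```python
# Hypothetical sketch of the delivery-cap policy. The 100 MB figure is
# illustrative only -- it is not the cap we actually negotiated.
MAX_WEB_DELIVERY_BYTES = 100 * 1024 * 1024

def route_delivery(pdf_size_bytes: int) -> str:
    """Decide how a finished PDF should reach the customer."""
    if pdf_size_bytes <= MAX_WEB_DELIVERY_BYTES:
        return "web"          # deliver over the normal web connection
    return "alternative"      # large jobs get alternative handling

# Under any plausible cap, a 1.56 GB file gets routed away from the web:
print(route_delivery(int(1.56 * 1024**3)))
```

The point of the cap was never the specific number; it was giving the network folks a worst case they could plan around.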

It’s a couple of years later and our web interface is still not publicly available, but the state’s data lines have been beefed up. Nonetheless, it’s pretty clear that we couldn’t have delivered last week’s large file via the web. The need to accommodate this problem is the main cause of the long implementation delay….


The coding teams reached an agreement about a revised delivery mechanism in October, worked together to implement it, and have finally reached the point where testing is possible.  (I may have oversimplified this process. Just a bit.) I’ve done some casual checking, and am convinced that things are this close to being ready.

So Friday I took my test team to Tony’s testing center and began working through the test script. First we worked through the document retrieval portion of the system:

  • June noticed and documented a serious problem that appears to stem from a minor coding error.
  • We identified some less pressing concerns.
  • Mostly things worked. This is good!

Today we returned, and began testing the document creation pieces. We discovered some oddities, but things were going fairly well when the development system failed around 11:00 am. (Murph just can’t leave things be.) We’ll try again tomorrow….