Late Tuesday afternoon, one of our online customers, Scott, posted a transaction using our new web application. Things were running slowly, and the app timed out. The website reported the failure to Scott and suggested he try again later. He hit the [Back] button and submitted the same transaction again. Same result, so he tried again. And again. And again, until he succeeded. The transaction finally got through nearly two hours after his first attempt.
Scott has been using the app for several weeks, and we’d had some prior contact. Because his usage pattern is unusual, he hits stress points that most users don’t. One of his questions ended up in the online FAQ, and others have prompted discussions with the programmers. Fortunately for us, he thinks like a beta tester and is comfortable helping us solve his problems. That matters, because we expect other users to adopt his usage patterns as they grow comfortable with the system. When Margie checked her voicemail on Wednesday morning, one of the messages was from Scott.
Meanwhile, I was trying to find ways to identify instances of the problem I discussed a few days ago. We keep a variety of logs and databases that will help diagnose this bug, but none were designed to identify the instances where it occurred. So I was printing things out, staring, marking them up, putting them aside. One of the reports, a quick & dirty thing I’d written so Margie could see who was using the application, turned out to be useful: Scott’s repeated transaction (31 instances!) was easy to spot. I opened some of Scott’s files and verified that I’d found my example case. Then I:
- Called Subhash, the web programmer, and suggested he take a look at Scott’s transaction string.
- Had a short chat with Margie and learned she’d heard from Scott. She hadn’t yet returned the call, so I took that one off her hands.
- Wrote a short note to my bosses describing the example case, with a preliminary map of the implications and a promise to follow up.
- Called customer Scott, and described the problem I was chasing. We chatted for several minutes about what he’d seen, and about what I knew.
- Answered the phone: Subhash was calling back. We shared facts and opinions. He has a proposal; it needs to be fleshed out and likely won’t entirely solve the problem.
- Called Scott back to clarify some of the previous discussion, and to follow up on one of Subhash’s questions.
By day’s end, I was exhausted; I took Joan to Carrabba’s….
On Thursday morning I wrote memos:
- A quick summary of the problem’s implications, with some suggestions for addressing them in the short term. The main point was that we need to treat this as several problems: there’s a design flaw, there are network issues that aggravate it, and we’re suddenly generating errors that need fixing (including refunds).
- A very high-level overview of the technical issues, written mainly for Caroline, the unit supervisor. She prefers to ignore notes she considers difficult, but she really needs a sense of the problem.
- A somewhat more technical note aimed at the middle managers, including a summary of Subhash’s solution proposal. I included a short critique of the proposal.
- Finally, I added a note describing some potential application enhancements I’d identified by looking at Scott’s work. A better design could save him about half his keystrokes, which seems worth doing. Somehow we hadn’t imagined anyone like Scott in our planning process.
Spent the afternoon looking for less obvious instances of the problem. It’s a two-step process: first I made a list of transactions that looked like duplicates (same customer, same transaction type, short time frame), then I verified them by comparing the PDF files prepared for delivery. Looking only at May transactions, I put about 30 pairs (actually, sets) on the original list. The second step cut the list to 17 instances, all of which need refunds or other account adjustments.
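For readers who want to try the first step on their own data, it can be sketched in Python. This is an illustration, not the report code described in this post: the `Transaction` fields, the function name `candidate_duplicate_sets`, and the ten-minute gap are all hypothetical assumptions. Transactions are grouped by customer and type, and a set keeps growing as long as each posting follows the previous one within the gap, which is why a run of retries like Scott’s comes out as one set rather than many pairs. The second step (comparing delivery PDFs) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record layout; the real logs' fields are not described here.
    txn_id: str
    customer: str
    txn_type: str
    posted_at: datetime


def candidate_duplicate_sets(transactions, gap=timedelta(minutes=10)):
    """Return lists of same-customer, same-type transactions in which each
    posting follows the previous one by no more than `gap` (an assumed
    threshold). Only lists with two or more members are returned."""
    ordered = sorted(transactions, key=attrgetter("customer", "txn_type", "posted_at"))
    result = []
    # groupby needs sorted input; the sort above puts equal keys together.
    for _, group in groupby(ordered, key=attrgetter("customer", "txn_type")):
        current = []
        for txn in group:
            # A long pause ends the current run of look-alike postings.
            if current and txn.posted_at - current[-1].posted_at > gap:
                if len(current) > 1:
                    result.append(current)
                current = []
            current.append(txn)
        if len(current) > 1:
            result.append(current)
    return result
```

With made-up sample data of 31 same-type postings four minutes apart for one customer, plus two postings three hours apart for another, the function returns a single set of 31; the widely spaced pair is not flagged. Every set it returns is only a candidate and still needs the second, manual verification step.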
One last memo, on Friday morning, sketched the outstanding issues arising from the simple fact that we’d overcharged a number of customers. It proposed some steps for processing the refunds (about $5,500 at this point, much of it owed to Scott’s employer) and asked that a clerk be assigned to help me work out the details and take on what will apparently be a daily burden until we solve the program problem. They volunteered to lend me Sarah, certainly a fine choice.
I spent yesterday afternoon modifying Margie’s “quick & dirty” report to better suit this problem. At the same time, of course, I implemented some improvements for Margie’s purposes.
Sarah and I will cobble together a refund plan. We’ll need to get appropriate approvals, which will drag Finance into the discussion. I’m really looking forward to that.
My boss, Margie, and I will begin building a solution to the transaction problem. Details still need to be settled, but the following teams will need to be involved:
- Finance, and Finance’s programming team.
- Our network people.
- The web coding team.
- The “back end” app vendor’s coding team.
Doesn’t look pretty.