Once Upon a Time: a reporting story

Lessons

I’ve mentioned before that we have an ad hoc reporting/data mining tool on our mainframe. We’ve been using the system for about fifteen years, and at any given time there are three or five analysts on the business side who might be called expert users. Another handful use the tool regularly for very specific tasks, and there are always a couple dozen “users” who have access to the program but rarely or never use it. Every eighteen months or so George, our DBA, discovers there’s enough demand to justify teaching another group of users and sets up a class for the five or dozen folks who’ve been designated by their managers.

For several years, now, George has led off the class with a cautionary tale: Once upon a time, George, Randy, and Joel–all of whom were (still are!) believed to be both capable and knowledgeable–came up with significantly different estimates of the impact a major change in departmental procedures. Although our queries were technically similar, the results were so dissimilar that all three estimates were suspect. Circumstances forced us to take this disagreement public, which our superiors found annoying.

George treats this as a lesson in the need to establish standards. There may be other lessons.


Study

Our DBA’s typical involvement in a project begins with “George, we need to know how many records meet these criteria.” George asks questions until he understands the issue, runs some queries, and comes back with some numbers. Those get plugged into equations, and the staffing and budget estimates get populated with “real” data.

Of course, George is rarely the only person running these queries. I may have been involved in the project meetings, and perhaps been asked to provide estimates and analysis. Randy & Chase, who work in other bureaus, bring similar backgrounds and similar skillsets to our projects; often they’ve been in those meetings and may have run queries about the changes as well. (On other projects you might find Lucy paired with Sharon, but their story’s similar.) Our numbers get compared, we puzzle out the differences, and management goes with the numbers which seem best. Whatever “best” means.

Once upon a time, a revised drunk driving law completely changed the legal logic of one of our core activities. While I wasn’t on the project team, I sat with most of the key players and was supporting their efforts in many ways–offering advice, attending meetings, identifying example cases, running tests, assisting the training effort. (It’s fair to say this project defined Second AT.) Eventually it became clear we’d need thousands of example records, so I created a comprehensive list of every relevant record and generated presorted printouts of sample records. In the process I acquired/created a more-or-less complete local database of record summaries pertinent to the change in legislation.

‘Twasn’t an easy problem. The legislation identified four distinct behavior patterns which needed regulation. Three of the patterns were affected by the peculiarities of individual licensing histories; two of these extended existing practice, the third simplified a process which had gotten hopelessly messy. The fourth pattern, while simpler in design than the modified processes, was a completely new concept for the agency and we had little useful summary data for benchmarking purposes. A record might contain more than one of the four patterns–and (under the law) we might or might not care about that overlap. Another complication, for predictive (budgeting) purposes, is the design of our driver databases: While they often contain several years’ history for an individual driver, they are intended as a current snapshot of his or her record. Any history which isn’t relevant to that purpose is regularly purged. If the research question runs back more than two years, this purge might have considerable impact on the study result. So part of the art of querying our databases is constructing plausible reconstructions of that missing history. In this project, that missing data was the precise issue of the change in the law.

We constructed similar queries, as it turned out. Our results were surprisingly different:

  • George estimated the initial impact at 80,000 drivers.
  • Randy estimated the initial impact at 120,000 drivers.
  • I estimated the initial impact at 150,000 drivers.

Why the differences? Filters, mostly.

  • George badly underestimated the impact of the (missing) historical data. His estimates, extrapolated from a seven-year sample, grossly miscalculated some of the context.
  • Randy had the best grasp of the historical data–he’d been tracking some of the relevant information over time, for other purposes–but he wasn’t in a position to estimate the new (fourth) pattern’s impact. For the calculations he was able to make, he was able to build an acceptable estimate of a forty year window, based on ten years’ data. Quite frankly, this is a really impressive accomplishment, though I think only George and I appreciated it.
  • The bulk of my difference with Randy resulted from my ability to generate a passable estimate of the fourth pattern. This is because my bureau agreed to pay to run an expensive query on the mainframe’s database. In places where our estimates could be directly compared, Randy’s numbers were consistently a bit higher than mine.

George’s story ends here. Mine doesn’t.


Politics

All offices have internal politics. Our operation is embedded in “real” politics, which sometimes generates chaos. While the legislature’s (usually) not deliberately disrupting our budgets, they’re also not (usually) very concerned about those disruptions.

One reason I was making estimates was that our bureau would be most directly affected by the changes, and our director wanted “independent” numbers for her budgeting purposes. Since the project manager–Randy’s boss, as it happens–had been heavily involved in the legislative negotiations, my bureau’s management considered her to be “married” to the legislative staff’s budget estimates, which everyone considered too low. (We all have our own pressures.) Our manager also had a raft of related questions which she could more conveniently run past me than Randy, and (as noted above) I was explicitly tasked to identify sample records for the test team.

As I was completing my analysis of my data, the PM authorized a mailing based on an 80,000 estimate. I hit “reply to all” and opined that the estimate was low, and supported the opinion with “better” data. I also pointed out that there were three estimates available and suggested that someone ought to figure out why we differed before committing to any of them. The fallout wasn’t pretty:

  • Almost immediately I heard from the PM’s Chief of Staff, who’d been my boss for several years. Ben was sounding me out, not trying to persuade. We all have our own pressures.
  • Janet wasn’t much concerned about the disagreement, but she made it real clear she wasn’t pleased with my “reply to all” response. She’s not one to take arguments public.
  • My bureau’s manager thanked me for defending her interests. That she was Janet’s boss at this time perhaps explains why I didn’t get drawn and quartered over this dispute.
  • Randy had a similar chat with his boss/the PM.
  • The PM–an old friend, actually, though we’ve a long history of disagreeing in public–called and politely thanked me for my input. She adjusted her numbers. Slightly.

Aftermath

After the dust settled, George, Randy, and I got together and wrote up an after-action report. Everyone’s reputation survived the experience.

Who got it right? None of us, really, though Randy was in the right range on his data and I did well on the new programming. The programmers figured out that we’d missed a critical parameter, which skewed the numbers in an unexpected direction.

My database had an ongoing life; we were running queries on it for another year. It may have been expensive to create, but we got excellent milage out of it.