Profiles: you can’t expect a machine to do this well

Bruce Schneier writes about using data mining to profile terrorists in today’s Newsday.

While it’s no longer my job, I have some relevant experience.


Like all states, we regulate drivers. The objective: Preventing crashes. The question: Which drivers need (or will benefit from) intervention? We have records for several million drivers. At any given time, a significant portion of those have one ticket. A much smaller number have several tickets. What’s more, all of the following are true:

  • All drivers have careless moments.
  • Drivers with “bad records” are far more likely to crash than drivers with one conviction.
  • In the overall scheme of things, crashes are rare events.
  • Most drivers with several tickets are quite young.
  • Many thousands of drivers have a history of drinking (or other substance abuse) and driving.
  • A few thousand non-drinking drivers have very bad driving records.
  • A few thousand drivers have a history of crashes.
  • Some violations are more dangerous than others.
  • I haven’t even touched on medical problems. It’s a can of worms….

Now: Build a system on that foundation. Which drivers need attention? How do you identify them? What sort of intervention is appropriate? How do you staff the agency? Where does it fit in the organization? How much of this can you do with a computer program?

Your answers to these questions have cost implications, and they will reflect where you see the biggest risks. In every state, the mix of answers changes every few years. But the best anyone hopes to achieve is mitigation; this is not a problem with a true solution.
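To see why mitigation is the ceiling, it helps to run the arithmetic on a rare event. What follows is a rough Python sketch, not anything an agency actually runs. Every number in it is made up for illustration (the population, the yearly crash rate, the share of drivers flagged for a "bad record", and how much riskier they are), and the function name `flag_outcomes` is my own invention.

```python
def flag_outcomes(population, crash_rate, flagged_share, relative_risk):
    """Split a driver population into flagged/unflagged and crash/no-crash.

    All inputs are hypothetical:
      population     -- number of licensed drivers
      crash_rate     -- share of ALL drivers who crash in a year
      flagged_share  -- share of drivers the program flags as "bad records"
      relative_risk  -- how many times likelier a flagged driver is to crash
    """
    # Overall rate = s*r*q + (1 - s)*q, where q is the unflagged crash rate.
    unflagged_rate = crash_rate / (flagged_share * relative_risk + 1 - flagged_share)
    flagged_rate = relative_risk * unflagged_rate
    if flagged_rate > 1:
        raise ValueError("inputs imply a crash rate above 100% for flagged drivers")

    flagged = population * flagged_share
    unflagged = population - flagged
    return {
        "flagged_crash": flagged * flagged_rate,            # correctly flagged
        "flagged_no_crash": flagged * (1 - flagged_rate),   # flagged, never crashed
        "unflagged_crash": unflagged * unflagged_rate,      # missed by the program
        "unflagged_no_crash": unflagged * (1 - unflagged_rate),
    }


if __name__ == "__main__":
    # Made-up figures: 1,000,000 drivers, 5% crash in a year,
    # 2% are flagged, and flagged drivers are 4x as likely to crash.
    result = flag_outcomes(1_000_000, 0.05, 0.02, 4.0)
    for label, count in result.items():
        print(f"{label:>20}: {count:,.0f}")
```

With these invented inputs, most flagged drivers never crash that year, and most of the crashes involve drivers the program never flagged. Change the numbers and the proportions move, but as long as crashes stay rare the same two kinds of error remain.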

Mistakes are certain, and some of the mistakes make the front page. This can be embarrassing to the agency. More important, it tends to be some family’s tragedy.


Identifying terrorists by their behavior is a similar problem, and unlikely to work better than our driver intervention efforts. We will not win them all.


Thanks to Joel Spolsky for the Schneier link, and a host of others.