Gracenote: CDDB

I now have 5,801 “songs” in my iTunes database–a number which increases, by the way.  Of my hundreds of CDs, the system didn’t know two rather obscure privately-issued disks and seemed not to know Loggins & Messina’s greatest hits album (had to be a glitch).  It also had a couple albums which were totally mis-labelled, in both cases for fairly obvious reasons.  This is fine performance at a high level, and I really appreciate it.  I suppose you could add all those tracks without giving much thought to CDDB, but it’s pretty hard to ignore that the data comes from somewhere.  Some notes follow.

I was aware of CDDB already, as I’d run a number of audio apps through their paces when I bought my last PC and, well, you couldn’t avoid noticing that the apps all hit on databases when you played a CD.  It was a couple years ago when I did this, though; frankly, I wasn’t particularly impressed by the holes in the data, and the erratic quality of the data which was available at the time.  On the other hand, I’ve mostly used the PC’s audio apps for listening to ballgames, so identifying ripped CDs wasn’t exactly an issue.

Things have definitely improved. At this point, Gracenote’s database (and the service it supports) is clearly “good enough” for a lot of purposes:  For almost any CD, CDDB knows the name of the recording, the name of each track on the recording, and the name of the artist.  That they were able to do this in a relatively short time with volunteer labor is a significant accomplishment, but that sort of story is becoming fairly common on the Web so we’ll not explore it further today.

Question:  Is “good enough” really good enough?  The answer probably depends on what you were hoping for.  These are things I find irritating, to one degree or another:

  • Inconsistent data recording.
  • Inadequate design for classical music collectors.
  • Inadequate implementation in iTunes (and certainly elsewhere).

Inconsistent Data Recording

Composer and Year information is missing (or consistently mis-recorded) at least as often as it’s available, and Genre information can best be described as casually recorded.  (Just for instance, someone chose to mark Time-Life’s Classic Rock series as “Alternative/Punk.”  Yeah. Sure.)  “Compilation” is clearly a mystery to the usual volunteer, as is “Disk 1 of 3.”

The problem really isn’t the volunteers; it’s that their interests and needs differ from mine.  Partly, too, it’s the inconsistent (or unreadable) information provided on packaging, where design often trumps information.  Nonetheless, it’s frustrating to find I need to fix about half the CDs I import, even though someone’s already gone to the trouble to record the basic information.  Some instances deserve mention:

  • Is the band which recorded “Groovin'” called Rascals, The Rascals, Young Rascals, or The Young Rascals?  I know how this happened, but I find them listed in the database under all four versions–without apparent regard to what they called themselves at any particular time.  At least I haven’t found them as “Rascals, The.”
  • Was “A Beautiful Morning” written by “Felix Cavaliere and Eddie Brigati”, “Felix Cavaliere & Edward Brigati”, “Cavaliere-Brigati”, “F.Cavaliere/E.Brigati”, “Cavaliere, Felix; Brigati, Eddie”, or–well, you get the idea.  iTunes handles searches for this pretty well, but….
  • Since all of my Rascals recordings are on compilations or anthologies, should the “Year” be the year “Good Lovin'” was recorded (or originally released, which may differ), the year the compilation/anthology was released on vinyl, or the year the compilation was released on CD?  We have some serious disagreements.
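For anyone cleaning up a library by hand, the name and composer variants above can be collapsed into comparison keys before matching. What follows is only a minimal sketch in Python, not anything Gracenote or iTunes provides: the function names, the `ALIASES` table, and the splitting rules are my own assumptions, and the surname heuristic will mangle hyphenated surnames and single-name artists.

```python
import re

# Hypothetical, hand-maintained table mapping known name changes to one key.
ALIASES = {"young rascals": "rascals"}


def canonical_artist(name: str) -> str:
    """Collapse common variants of an artist name into one comparison key."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    # Undo sort-order inversion: "rascals, the" -> "the rascals"
    inverted = re.fullmatch(r"(.+),\s*(the|a|an)", key)
    if inverted:
        key = f"{inverted.group(2)} {inverted.group(1)}"
    # Drop a leading article so "the rascals" and "rascals" compare equal
    key = re.sub(r"^(the|a|an)\s+", "", key)
    return ALIASES.get(key, key)


def composer_surnames(credit: str) -> frozenset:
    """Reduce a free-form composer credit to a set of lowercase surnames."""
    credit = credit.strip().lower()
    surnames = set()
    if ";" in credit:
        # "Last, First; Last, First" style
        for part in credit.split(";"):
            surname = part.split(",")[0].strip()
            if surname:
                surnames.add(surname)
    else:
        # "A and B", "A & B", "A/B", "A-B" styles
        for part in re.split(r"\s+and\s+|\s*[&/-]\s*", credit):
            tokens = [t for t in re.split(r"[.\s]+", part) if t]
            if tokens:
                surnames.add(tokens[-1])  # assume the surname comes last
    return frozenset(surnames)
```

With these rules, all four band names in the first bullet (plus “Rascals, The”) reduce to the key `rascals`, and the five composer credits in the second bullet reduce to the same two-surname set.  The keys are for matching only; what should be displayed is still a matter of opinion.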

Gracenote knows these are issues, by the way, and has opinions about how this information should be recorded.  What they don’t have is a way to force volunteers to follow the guidelines, and only a weak mechanism for checking the data.  They’re obviously depending on people like me to fill in the gaps; in time, that will likely work out.  At least that’s Gracenote’s design assumption.

More irritating than the omissions, though, are the outright errors.  I’m not talking about typographical mistakes–I’m talking about mislabelled tracks, misleading categorization, fields used very differently than designed, and other forms of bad data.  If you really don’t care what it says on your MP3 player, keep your blankety-blank hands out of the data.  We don’t need that kind of help.  Thanks!

Inadequate Design for Classical Music Collectors

The CDDB design assumes that all music comes on albums (a description, largely abandoned these days, which fit 78 rpm collections far better than LP disks), and that the albums are collections of largely independent songs.  This design works well for most popular releases, and works all right for single recordings, but is a very poor model for your typical classical composition.

  • This design doesn’t work well if the collection tells a story (think “Tommy”).
  • Neither does it work well if the collection explores a theme (“Bookends” fits this in one sense; Bach’s “A Musical Offering” fits in another).
  • It’s a really poor design for a work with an overarching structure which contains several movements (Vivaldi’s “Four Seasons” is an excellent example).
  • The design makes no effort whatever to accommodate your quite typical classical album which contains three or four pieces of three or four movements apiece.  In no useful way does the “collection of songs” metaphor fit this example.

And then there’s the issue of performers.  The guidelines say:  “It’s sometimes tricky to figure out the principle artist on classical albums that often list the composer, conductor, ensemble, and/or soloist(s) on the front cover. So here’s a tip: look on the spine of the CD, as this is where the publisher had to make a decision as to who to list as the primary artist. Second tip: The artist should be a performer on the recording, so you should rarely choose the composer as the artist.”  Um, folks, that really doesn’t cut it.  If I wanted, for instance, to list all the tracks in my collection where David Munrow plays recorder, I perhaps wouldn’t want the tracks where he’s “just” the conductor of the consort.  At this point, we don’t have any mechanism for that sort of discrimination; indeed, we barely have any means to record the fact that Munrow’s a participant unless he’s the star participant.  Perhaps someday.
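To make the complaint concrete, here is a rough sketch of the kind of model I have in mind: an album holds works, a work holds movements, and each movement carries credits that name a person and a role.  This is an illustration in Python only, not CDDB’s schema or anyone’s actual design; every class name, field, and piece of data below is hypothetical, and the performers are deliberately placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Credit:
    person: str
    role: str  # e.g. "conductor", "recorder", "ensemble"


@dataclass
class Movement:
    title: str
    credits: List[Credit] = field(default_factory=list)


@dataclass
class Work:
    title: str
    composer: str
    movements: List[Movement] = field(default_factory=list)


@dataclass
class Album:
    title: str
    works: List[Work] = field(default_factory=list)


def movements_with(album: Album, person: str, role: str) -> List[str]:
    """List the movements on which `person` is credited in exactly `role`."""
    wanted = Credit(person, role)
    return [
        f"{work.title}: {mv.title}"
        for work in album.works
        for mv in work.movements
        if wanted in mv.credits
    ]


# Placeholder data: one person who plays on one movement and conducts the rest.
player = Credit("Performer A", "recorder")
conductor = Credit("Performer A", "conductor")
album = Album("Example Album", [
    Work("Work One", "Composer X", [
        Movement("I. Allegro", [player]),
        Movement("II. Largo", [conductor]),
    ]),
    Work("Work Two", "Composer Y", [
        Movement("I. Adagio", [conductor]),
    ]),
])

print(movements_with(album, "Performer A", "recorder"))
# prints ['Work One: I. Allegro']
```

The point of keeping the role on the credit, rather than having a single “artist” field per album, is that the recorder query and the conductor query give different answers for the same person.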

Inadequate Implementation in iTunes (and certainly elsewhere)

Apple’s chosen to build an implementation which reveals just some of the CDDB fields, and doesn’t take full advantage of CDDB’s ability to differentiate between albums and individual tracks.  It’s particularly annoying that they don’t reveal Gracenote’s entire categorization scheme; instead we get the dumbed-down top level which tosses all “Classical” music into one bin, and has no really appropriate container for my bluegrass or my bluegrass gospel.

A couple final notes

  • In my experience, owners of jazz recordings appear to care most about getting this right.  Classic rock recordings are generally logged poorly; entries for baroque and classical music vary widely (and are likely to ignore the guidelines).  I no longer collect currently-popular music, for the most part, so I won’t venture to comment on that; I’d be interested in other folks’ observations, though.
  • It’s really quite surprising to me how often the entries on multiple-disk albums were entered by more than one person.

This is a valuable service, and useful as it stands; I like it and can recommend it.  There’s room for improvement….