<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>a dabbler's journal &#187; POPfile</title>
	<atom:link href="http://dabblersjournal.com/category/computing/popfile/feed/" rel="self" type="application/rss+xml" />
	<link>http://dabblersjournal.com</link>
	<description>prone to enthusiasms....</description>
	<lastBuildDate>Mon, 19 Jul 2010 04:30:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>PopFile occasional report</title>
		<link>http://dabblersjournal.com/2004/11/22/popfile-occasional-report/</link>
		<comments>http://dabblersjournal.com/2004/11/22/popfile-occasional-report/#comments</comments>
		<pubDate>Mon, 22 Nov 2004 16:12:17 +0000</pubDate>
		<dc:creator>dabbler</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Joel]]></category>
		<category><![CDATA[POPfile]]></category>
		<category><![CDATA[mail filtering]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://dabblersjournal.com/2004/11/22/popfile-occasional-report/</guid>
		<description><![CDATA[<p>On the whole, this is excellent performance, with some minor (and predictable) blind spots due to peculiarities that are as much mine as the program's.&#160; Except for the lack of a good loader for Apple systems, I can heartily recommend the program; the installation issues appear to be unique to the Mac platform, and shouldn't trouble Windows or Linux users.&#160; Prospective users shouldn't expect perfection, and some effort is required to train PopFile about your mail system.&#160; But it's automatic, reliable, and quite impressive.</p>]]></description>
			<content:encoded><![CDATA[<p>Readers will recall that I've been using John Graham-Cumming's <a href="http://www.getpopfile.org/">PopFile</a> as a spam filter/mail sorter for over a year.&nbsp; Time, methinks, for another update; I last mentioned the program in July.&nbsp; I covered the <a href="http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/#more-61">important background information</a> about a year ago, and shan't repeat it today; you may also want to check the PopFile cross-references below to see what I've said before.</p>

<p>I'm still using version 0.20.1, which puts me a couple editions behind.&nbsp; Since I'm satisfied with my version's performance and don't want to fight my way through the Mac upgrade process I'll likely stay here for a while; John and his team will need to add something compelling for me to change.&nbsp; (A proper Macintosh install would be helpful....)&nbsp;</p>

<p>This version's a little slow on my machine but not to the point it bothers me; your mileage may well vary.</p>

<hr />

<p>On to specific results, again organized as I've done in past entries:</p>

<p>The test period ended November 11, 2004, at 7,952 messages.</p>

<ul>
<li>93 (1.2%) were sent to the wrong bucket.
<ul>
<li>(Therefore) 98.8% were sent to the right bucket.
</li>
<li>This is my first report which didn't include significant training, so 99% looks like the "norm" for my system.&nbsp; One way to read this stat is that I decide to reclassify about one message per day.&nbsp; <em>Better than writing rules....</em>

</li>
</ul>
</li>
<li>3,219 (40.5%) were <strong>spam</strong>.&nbsp; (This is a decrease from the previous 46.6%, which would seem to merit comment.&nbsp; Not sure what that comment should be, though.)
</li>
<li>There are areas where the app has, well, issues:
<ul>
<li>431 messages were <strong>auction-related</strong>, with 10 false positives and 3 false negatives.&nbsp; (As you might surmise, I'm again active on eBay.)&nbsp; There's enough noise in auction e-mail that some errors are inevitable.&nbsp; <em>PopFile is very good, though, at spotting eBay and PayPal phishing messages.</em>

</li>
<li>The sorter has significant problems getting my <strong>mailing lists</strong> right (407 messages/14 false +/8 false -), mostly because they cover a wide range of territory.
<ul>
<li>On the other hand, last time there were <em>48</em> false positives; it's learning....
</li>
</ul>
</li>
<li>Vendor mail (118/8 false +/9 false -) is another bucket with some problems.&nbsp; Again that's likely because I catch a number of types of messages there.
</li>

<li>I've pretty much abandoned the effort to get the Change Detection mail into the right boxes, and am effectively treating the whole set as one mailbox.&nbsp; <em>It's more trouble than it's worth, I've apparently decided; the app's just refusing to notice how those emails differ.&nbsp; Since these are mainly baseball-related sites, the issue's not currently important.&nbsp; Next spring I may try something.</em>
</li>
</ul>
</li>
</ul>

<hr />

<p>On the whole, this is excellent performance, with some minor (and predictable) blind spots due to peculiarities that are as much mine as the program's.&nbsp; Except for the lack of a good loader for Apple systems, I can heartily recommend the program; the installation issues appear to be unique to the Mac platform, and shouldn't trouble Windows or Linux users.&nbsp; Prospective users shouldn't expect perfection, and some effort is required to train PopFile about your mail system.&nbsp; But it's automatic, reliable, and quite impressive.</p>]]></content:encoded>
			<wfw:commentRss>http://dabblersjournal.com/2004/11/22/popfile-occasional-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Living with POPFile</title>
		<link>http://dabblersjournal.com/2004/07/03/living-with-popfile/</link>
		<comments>http://dabblersjournal.com/2004/07/03/living-with-popfile/#comments</comments>
		<pubDate>Sat, 03 Jul 2004 18:24:49 +0000</pubDate>
		<dc:creator>dabbler</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[POPfile]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Stories]]></category>

		<guid isPermaLink="false">http://dabblersjournal.com/2004/07/03/living-with-popfile/</guid>
		<description><![CDATA[Time, I think, for a POPFile update.&#160; It's been a bit over three months, and over seven thousand messages, since I last discussed the program.&#160; Quickly reviewed:&#160; I started using the program in the wake of last August's spam (virus) epidemic.&#160; Right from the start I've used PF as a mail sorting program, not just [...]]]></description>
			<content:encoded><![CDATA[<p>Time, I think, for a <a href="http://popfile.sourceforge.net/">POPFile</a> update.&nbsp; It's been a bit over three months, and over seven thousand messages, since I <a href="http://dabblersjournal.com/2004/03/26/popfile-on-powerbook/">last discussed the program</a>.&nbsp; Quickly reviewed:&nbsp; I started using the program in the wake of last August's spam (virus) epidemic.&nbsp; Right from the start I've used PF as a mail sorting program, not just a spam filter; basically, I replaced a few hundred rules with a couple dozen PF buckets.&nbsp; POPFile's very good, but not perfect, at the task; complications include categories which are quite similar, and categories which are catch-alls.&nbsp; Creative spam and virus authors are likewise problematical.&nbsp; Despite these confusions, I'm very satisfied--much more than I anticipated--with the program.&nbsp; <em>Now, if they'd just simplify the installation routine for Mac users.</em></p>

<p>Here's a summary of the last three months' usage, in the format I've used on prior reports:</p>
<hr />
<p>The test period ended July 3, 2004, at 7,292 messages.</p>
<ul>
<li>168 (2.3%) were sent to the wrong bucket.
<ul>
<li>(Therefore) 97.7% were sent to the <em>right</em> bucket.
</li>
<li><em>This percentage took a significant hit at the start of the baseball season, when a bunch of email sources came back to life.</em>
</li>
</ul>

</li>
<li>3,397 (46.6%) were <strong>spam</strong>.&nbsp; (This is a significant increase, I'd say, from the previous 41.0%.)
<ul>
<li>A handful of these are from legitimate e-mail lists whose owners make it difficult to unsubscribe, but the impact is minimal.
</li>
</ul>
</li>
<li>Only 11 messages were <strong>auction</strong>-related;&nbsp;3 of these were false negatives and 1 was a false positive.&nbsp;&nbsp;I seem to have stopped hanging around eBay, at least for now.

</li>
<li>The <strong>Vendor</strong> (100 messages/16 false +/11 false -) and <strong>Mailing List</strong> (402/48 f+/8 f-) categories, both of which are catch-alls, seem to show real improvement, though this is still a significant source of error.&nbsp; The problem continues to be that "well-designed" spam looks superficially like these categories.
</li>
<li>
The problem I reported with e-mails from <a href="http://www.changedetection.com/">Change Detection</a>&nbsp;still exists and remains annoying, but has improved; basically, PF sees several classes of messages as too similar to differentiate.&nbsp; It's pretty clear to me that the algorithm isn't looking at the problem the way I think it should.

</li>
</ul>
<p>Every now and then a spammer finds a hole in this defense, but after a couple days PF has things sorted out again.&nbsp; That's how things should work.</p>
<hr />
<p><em>For the record, I'm currently using POPFile version 0.20.1, which uses the BerkeleyDB for storage.&nbsp;&nbsp;The developers moved to a SQL engine in March with version 0.21.0 (currently 0.21.1), but didn't convince me a change was necessary; I'm unlikely to change until there's a major upgrade.&nbsp;&nbsp;&nbsp;Version 0.20 is slower than version 0.19 was, but not in ways which bother me.&nbsp; Your mileage may vary, of course.</em></p>
<hr />
<p>Thus my current report.&nbsp; I remain very satisfied with the tool.</p>

]]></content:encoded>
			<wfw:commentRss>http://dabblersjournal.com/2004/07/03/living-with-popfile/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POPFile on PowerBook</title>
		<link>http://dabblersjournal.com/2004/03/26/popfile-on-powerbook/</link>
		<comments>http://dabblersjournal.com/2004/03/26/popfile-on-powerbook/#comments</comments>
		<pubDate>Fri, 26 Mar 2004 16:21:01 +0000</pubDate>
		<dc:creator>dabbler</dc:creator>
				<category><![CDATA[Dabbler]]></category>
		<category><![CDATA[POPfile]]></category>
		<category><![CDATA[popfile mail classifier sorter spam_stopper]]></category>

		<guid isPermaLink="false">http://dabblersjournal.com/2004/03/26/popfile-on-powerbook/</guid>
		<description><![CDATA[<p>I resisted installing <a href="http://popfile.sourceforge.net/">POPfile</a> for several weeks--partly because I wanted to be more familiar with the Mac environment before installing something so far out of the ordinary, and partly because I wanted to give mail.app's junk filter a test.&#160; By January's end, it was pretty clear that the Junk Mail filter doesn't work as well as I'd like, and I missed POPfile's more general mail sorting capabilities.&#160; As I've <a href="http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/">mentioned before</a>, I sort incoming mail into a couple dozen categories.&#160; Teaching POPfile to recognize those categories lets me get by with two dozen rules, rather than a couple hundred.&#160; Much better.</p>]]></description>
			<content:encoded><![CDATA[<p>You'll <a href="http://dabblersjournal.com/2003/12/21/a-dabblers-powerbook-finding-my-way/">perhaps recall</a> that when I moved my e-mail to the PowerBook, I provisionally moved it into Apple's mail.app.&nbsp; That provisional decision has become permanent; for my purposes Mac Mail (with POPfile) is a fine application.</p>

<p>I resisted installing <a href="http://popfile.sourceforge.net/">POPfile</a> for several weeks--partly because I wanted to be more familiar with the Mac environment before installing something so far out of the ordinary, and partly because I wanted to give mail.app's junk filter a test.&nbsp; By January's end, it was pretty clear that the Junk Mail filter doesn't work as well as I'd like, and I missed POPfile's more general mail sorting capabilities.&nbsp; As I've <a href="http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/">mentioned before</a>, I sort incoming mail into a couple dozen categories.&nbsp; Teaching POPfile to recognize those categories lets me get by with two dozen rules, rather than a couple hundred.&nbsp; Much better.</p>

<p>So I installed POPfile on the Mac on January 31.&nbsp; While the process was harder than I'd have liked (I'd <a href="http://www.artz-net.de/popfile/">walk you through it</a>, but John Graham-Cumming is aware of the problem and plans to simplify the Mac install), in the end I had a working installation.&nbsp; Three thousand messages later, I've got POPfile trained again and continue to be delighted with the system.</p>

<p>I'll again use the format I was using for <a href="http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/">last</a> <a href="http://dabblersjournal.com/2003/11/21/popfile-revisited-another-thousand-messages-received-a-new-version-installed/">fall's</a> reports....</p>
<hr />
<p>The test span ended March 25, 2004, at 3,028 messages.</p>
<ul>
<li>211 (7.0%) were sent to the wrong bucket.
<ul>
<li>(Therefore) 93.0% were sent to the <em>right</em> bucket.

</li>
<li><em>Keep in mind that this is a new install, and the first several hundred messages are sacrifices to training....</em>
</li>
</ul>
</li>
<li>1,241 (41.0%) were <strong>spam</strong>.&nbsp; (Basically no change from November.)
<ul>
<li>I've dropped the <strong>virus</strong> &amp; <strong>bounced</strong> categories; they're now counted as spam....

</li>
</ul>
</li>
<li>110 messages were <strong>auction</strong>-related; 9 of these were false negatives and 9 were false positives.&nbsp; That's about like before; this will improve with training.
</li>
<li>The <strong>Vendor</strong> (90 messages/15 false +/21 false -) and <strong>Mailing List</strong> (251/45 f+/19 f-) categories, both of which are catch-alls, need serious training.&nbsp; This reflects my earlier experience.&nbsp; The problem in both cases is that "well-designed" spam looks superficially like these categories.

</li>
<li>There's a rather odd behavior which wasn't a problem in the previous installation:&nbsp; I use <a href="http://www.changedetection.com/">Change Detection</a> to track a number of web pages which really ought to have RSS feeds.&nbsp; For some reason, POPfile's having difficulty telling notifications about <strong>Blogs</strong> (4/20 f+/2 f-) from notifications about <strong>General Baseball</strong> (77/22 f+/29 f-)--which suggests it's more aware of the similarities (which are numerous) than the differences (which I consider really blatant).&nbsp; The really odd thing is that I've got several other <em>Change Detection</em> categories, which it's handling well.&nbsp; We'll have to see how this plays out over the next few weeks, when the baseball sites get really active.

<ul>
<li>(This could, of course, turn out to be operator error.&nbsp; But I think I'm smarter than that.)
</li>
</ul>
</li>
</ul>
<p>Thus my early report.&nbsp; Things are about where they were at 3,000 messages last time I installed PF, so I'm satisfied.&nbsp; I'll keep you informed.</p>
]]></content:encoded>
			<wfw:commentRss>http://dabblersjournal.com/2004/03/26/popfile-on-powerbook/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Dabbler&#8217;s Powerbook: finding my way</title>
		<link>http://dabblersjournal.com/2003/12/21/a-dabblers-powerbook-finding-my-way/</link>
		<comments>http://dabblersjournal.com/2003/12/21/a-dabblers-powerbook-finding-my-way/#comments</comments>
		<pubDate>Sun, 21 Dec 2003 16:40:21 +0000</pubDate>
		<dc:creator>dabbler</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Macintosh]]></category>
		<category><![CDATA[POPfile]]></category>
		<category><![CDATA[apple]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[mail]]></category>
		<category><![CDATA[os x]]></category>
		<category><![CDATA[powerbook]]></category>
		<category><![CDATA[transition]]></category>

		<guid isPermaLink="false">http://dabblersjournal.com/2003/12/21/a-dabblers-powerbook-finding-my-way/</guid>
		<description><![CDATA[<p>When I wasn't shopping this weekend, I was trying to move past the "what a neat toy" phase with my new laptop.  OS X is enough like XP to be familiar, and enough different to be both annoying and fascinating.  That's been covered elsewhere; I'll likely leave it alone....</p>]]></description>
			<content:encoded><![CDATA[<p>When I wasn't shopping this weekend, I was trying to move past the "what a neat toy" phase with my new laptop.  OS X is enough like XP to be familiar, and enough different to be both annoying and fascinating.  <em>That's been covered elsewhere; I'll likely leave it alone....</em></p>

<p>This is a powerful machine.  For much of yesterday afternoon and evening I was:</p>

<ul>
	<li>making massive file transfers across my WiFi network,</li>
	<li>copying and playing Christmas CDs, and</li>
	<li>checking for useful advice, using Safari.</li>
</ul>

<p>My desktop system, which isn't a slouch, would have been (actually, <em>was</em>) stressed with all that activity.  The Mac just puttered along.  Pretty impressive.</p>

<hr />

<p>Today's efforts were largely devoted to moving my e-mail focus from <strong><a href="http://eudora.com/email/features/ss.html">Eudora</a></strong> on the PC to whatever I could get working on the laptop.  That turned out to be Apple's <strong><a href="http://www.apple.com/macosx/features/mail/">Mail</a></strong>, though I'm not committed to that decision.  The transition was not a pretty effort; while it seems like this ought to be easy, it turns out to be exasperating.  You can't just tell the new program to import the old program's files, so I tried a number of variations on "copy the files to the laptop and see if the client can import that."  Nothing worked well.  <a href="http://eudora.com/techsupport/kb/1644hq.html">Eudora's instructions</a> cover the basics but the results were pretty flakey.  I'm also not convinced that Eudora's Mac interface meets my needs, though I'll likely give it another chance.</p>

<p>I <strong>can</strong> wholeheartedly endorse a format conversion tool called <a href="http://www.weirdkid.com/products/emailchemy/">Emailchemy</a>, by the way.  A <em>very fine</em> piece of shareware.</p>

<hr />

<p>Sometime soon I'll need to solve <a href="http://www.artz-net.de/popfile/">PopFile on OS X</a>.  <em>That</em> promises to be <em><strong>interesting.</strong></em></p>]]></content:encoded>
			<wfw:commentRss>http://dabblersjournal.com/2003/12/21/a-dabblers-powerbook-finding-my-way/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PopFile Revisited: another thousand messages received; a new version installed</title>
		<link>http://dabblersjournal.com/2003/11/21/popfile-revisited-another-thousand-messages-received-a-new-version-installed/</link>
		<comments>http://dabblersjournal.com/2003/11/21/popfile-revisited-another-thousand-messages-received-a-new-version-installed/#comments</comments>
		<pubDate>Fri, 21 Nov 2003 22:18:02 +0000</pubDate>
		<dc:creator>dabbler</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[POPfile]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[evaluation]]></category>
		<category><![CDATA[report]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[sorter]]></category>
		<category><![CDATA[spam filter]]></category>

		<guid isPermaLink="false">http://dabblersjournal.com/2003/11/21/popfile-revisited-another-thousand-messages-received-a-new-version-installed/</guid>
		<description><![CDATA[<p>All in all, that's a rather impressive performance. The increasing spam count is also rather impressive; after all, I added this tool to my kit because the junk seemed to be getting out of hand.</p>
]]></description>
			<content:encoded><![CDATA[<p>Today we reached another thousand. After printing and resetting the report, I loaded POPfile's new version. I'll certainly keep you informed....</p>

<p>Continuing in the same format I used in my <a href="http://dabblersjournal.com/articles/2003/nov/popfile.html">earlier note</a> about PF:</p>

<h3>Fourth Thousand</h3>

<p>This test span ended November 21 at 1,000 messages.</p>
<ul>
	<li>26 (2.6%) were sent to the wrong bucket.
<ul>
	<li><em>97.4% were sent to the <strong>right</strong> bucket....</em></li>
</ul>
</li>
	<li>415 (41.5%) were spam. <em>Again: <strong>Wow!</strong></em></li>
	<li>4 (0.4%) were probably virus-laden.</li>
	<li>4 (0.4%) were bounced email.</li>
	<li>Auction seems to be fully solved; 39 messages, with one false positive and one false negative.</li>
	<li>The Vendor category may have finally improved: 14 messages; only three errors.</li>
	<li><strong>Lists</strong> looks better: Ten false positives and no false negatives associated with 51 messages.</li>
	<li>A new category, created (with my new e-mail address) to service this weblog, had 8 errors--to go with seven messages. New categories are always problems....</li>
</ul>

<p>All in all, that's a rather impressive performance. The increasing spam count is also rather impressive; after all, I added this tool to my kit because the junk seemed to be getting out of hand.</p>

<hr />

<p>The new POPfile version has a thoroughly-revamped back end, and some modifications to the code in the engine. We'll see how it goes.</p>

<hr />

<p>Jon Udell's also talking about <a href="http://weblog.infoworld.com/udell/2003/11/20.html#a851">using Bayesian categorizers</a>, at both a higher level of abstraction and greater detail. <em>Worth a look</em>.</p>]]></content:encoded>
			<wfw:commentRss>http://dabblersjournal.com/2003/11/21/popfile-revisited-another-thousand-messages-received-a-new-version-installed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>POPfile: sorting the mail</title>
		<link>http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/</link>
		<comments>http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/#comments</comments>
		<pubDate>Sun, 16 Nov 2003 14:16:22 +0000</pubDate>
		<dc:creator>dabbler</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[POPfile]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[filter]]></category>
		<category><![CDATA[spam]]></category>

		<guid isPermaLink="false">http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/</guid>
		<description><![CDATA[<p>I receive between 50 and 100 e-mails each day, and read about 60% of those (the unread ones are either duplicates or spam). I used to read about 85% of my mail; the change in percentage is largely because of the increasing spam load. (Eudora has a reporting function; these numbers have some relation to reality.) Perhaps 65% of the real mail has baseball content of some sort or other; the rest is on a wide range of topics.</p>]]></description>
			<content:encoded><![CDATA[		<p>When Sobig's author unleashed his spam (and bounced email) plague on us last August it became clear I needed to automate my mail sorting process; I was spending far too many hours writing rules.&nbsp; After checking out the sites for a couple filtering products I'd heard of, I decided to see if <a href="http://POPfile.sourceforge.net/">POPfile</a> met my needs.&nbsp; I loaded it on my machine, spent a couple hours making setup decisions, and did the necessary configuration of both POPfile and Eudora.</p>

<p>An essential fact:&nbsp; While POPfile usually functions as a spam filter, its design supports sophisticated sorting of email into a large number of categories.&nbsp; I'm using it as a mail sorter; the spam filter is important, but the software's smart about all of my mail, and in a real sense the spam folder's just another target for the sorter.</p>

<h3>Basic Information</h3>

<p>I receive between 50 and 100 e-mails each day, and read about 60% of those (the unread ones are either duplicates or spam). I used to read about 85% of my mail; the change in percentage is largely because of the increasing spam load. (Eudora has a reporting function; these numbers have some relation to reality.) Perhaps 65% of the real mail has baseball content of some sort or other; the rest is on a wide range of topics.</p>
<p>These get sorted into a couple dozen categories; I tinker with these a bit, but they are essentially the same categories I used for sorting e-mail in 1995.&nbsp; A large percentage of my mail originates from the <a href="http://www.sabr.org/">Society for American Baseball Research</a> list called SABR-L, which has its own folder; the remaining folders group mail in ways which largely reflect my mental prioritizations.&nbsp; One folder, called "Lists," is the target for mailing lists on miscellaneous topics.&nbsp; <em>I sometimes ignore SABR-L for months; I check my eBay mail daily.</em></p>

<p>After reading the POPfile documentation, I decided to see how well it sorted the total daily package.&nbsp; I set up "buckets" to match the folders, replaced several hundred Eudora rules with twenty-five, and set about teaching POPfile how to sort things. This story begins on August 18.</p>

<p>Here's my report....</p>

<h3>First Thousand</h3>

<p>Since you train POPfile by correcting its errors, the first few dozen messages are basically all errors and the first few hundred are unreliable.&nbsp; I took an accounting after message 1,049, which arrived on September 30.</p>

<ul>
<li>104 (10.0%) were sent to the wrong bucket.
<ul>
<li><em>90.0% were sent to the <strong>right</strong> bucket....</em>

</li>
</ul>
</li>
<li>207 (19.7%) were spam.
</li>
<li>25 (2.4%) were probably virus-laden.
</li>
<li>114 (10.9%) were bounced email.
</li>
<li>PF had particular problems with the Auction bucket; it made 15 wrong guesses (11 false positives &amp; 4 false negatives) in a category with only 11 total messages.
</li>
<li>PF also had significant problems with the Vendor bucket, with eight sorting errors among only nine total messages.
</li>
<li>The List category, which seems to me the most difficult to train, received 40 messages; PF generated 12 false positives and 4 false negatives.
</li>

</ul>

<h3>Second Thousand</h3>

<p>POPfile weathered its adolescence in the first half of October, and reached message 999 on October 18.</p>

<ul>
<li>41 (4.1%) were sent to the wrong bucket.
<ul>
<li><em>95.9% were sent to the <strong>right</strong> bucket....</em>
</li>
</ul>
</li>
<li>249 (25.7%) were spam.

</li>
<li>5 (0.6%) were probably virus-laden.
</li>
<li>0 (0.0%) were bounced email.
</li>
<li>PF stopped having problems with Auction; 30 messages, with no false positives and three false negatives.
</li>
<li>PF's Vendor bucket issues seemed to abate, with only five sorting errors among twenty-one total messages. <em>Better, but still unacceptable.</em>
</li>
<li>The List category continued about as before: 55 messages, with 12 false positives and 2 false negatives.
</li>
</ul>

<h3>Third Thousand</h3>

<p>This test span ended November 4 at 1,008 messages.</p>

<ul>
<li>41 (4.1%) were sent to the wrong bucket.
<ul>
<li><em>95.9% were sent to the <strong>right</strong> bucket....</em>
</li>
</ul>
</li>
<li>408 (40.7%) were spam. <em><strong>Wow!</strong></em>
</li>
<li>2 (0.2%) were probably virus-laden.
</li>

<li>2 (0.2%) were bounced email.
</li>
<li>Auction was basically clean; 20 messages, with one false positive and one false negative.
</li>
<li>PF's Vendor bucket sort deteriorated, with thirteen sorting errors among twenty-nine total messages. <em>Yucky.</em>
</li>
<li>The List category remains problematic: 51 messages, with 15 false positives and 3 false negatives. <em>I suspect this will only improve if I split the category into logical sub-groups.</em>
</li>
</ul>

<h3>Since November 4</h3>

<p>I've received 712 messages; 97.6% are being sorted correctly. Not bad, if you ask me. I'll not give you a further breakdown 'til I reach 1,000.</p>

<p>POPfile's principal author, John Graham-Cumming, announced a new version a couple weeks ago, which I've not yet installed. I'll do that in a day or two.</p>
]]></content:encoded>
			<wfw:commentRss>http://dabblersjournal.com/2003/11/16/popfile-sorting-the-mail/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
