Search, Don’t Sort?

Searching is, I am often told, the new sorting – witness GMail, which doesn’t really have the idea of folders for mail at all, and the popularity of Desktop Search. (My experience with Beagle was not really a pleasant one, but I’ll save that for another post.)

I certainly find myself spending mental energy on “in which folder do I file this message?” far too often, and the only thing folders seem to do for me is cause me to look in the wrong one when I want something. Therefore, I am considering taking all my archived mail and dumping it into a single folder per account – sent mail and all.

As this is, after all, a hard-to-reverse process, I need to consider whether my IMAP server and Thunderbird will both be able to cope. Does anyone have any experience of using this approach on a large message base?

For sizing purposes, I appear to have 3595 non-spam messages in total in my personal account, and 8648 in my Mozilla account. The IMAP server concerned is tuschin.blackcatnetworks.co.uk, which runs Courier IMAP.

22 thoughts on “Search, Don’t Sort?”

  1. I have about 5000 mails/year and I do exactly what you said: put them all into one folder. It works fine as long as I use POP3 rather than IMAP.

  2. I have experience with this, and I’ve found that it depends on the speed of your IMAP server and whether your IMAP server will complete any search you submit before it times out.

    The IMAP server I’m stuck with is slow and configured to not let searches run to completion (probably from being overburdened with lots of other requests from other users). It often times out before it returns search hits to my client. As a result, I can’t use saved searches.

    I have to use real folders and filters to sort my mail into manageable groups. This means that the more natural arrangement (seeing sent and received email in one thread) doesn’t work. Some of this might be a TB limitation (as far as I know, filters apply only to incoming mail, and there is no way to thread across folders).

  3. I go a bit overboard with organising everything into folders, but that’s because I find searching too slow. I do the same for most of my actual filesystem too.

  4. In my experience supporting Mozilla products using POP, from old Mozilla 1.3 to Thunderbird 1.5, when the Inbox approaches 2GB the indexing starts breaking down. Typical symptoms are that the preview pane stops corresponding to the message selected in the thread pane, and the Inbox starts re-indexing too frequently (and re-indexing takes a while for 2GB of data). I’ve seen this problem on many different computers.

    I wish I could dump everything into the Inbox, tag it (in TB 2), and use search folders, but the indexing problem prevents me.

    I’ve never actually researched the problem much; I guess I always assumed it was a limitation of mbox. Maybe there is already something in Bugzilla.

  5. Too many mails can get problematic with IMAP. With POP, the limit for me seems to be somewhere around 100,000 mails per folder. Beyond that, things get sluggish, especially searching. This is true in both TB and SeaMonkey.

  6. As long as you dump your spam into another folder, you should be OK with that volume. Still, as people have said, it depends a lot on how fast the server is. Unfortunately, IMAP servers rarely keep any kind of index of the email, so every search has to actually read each message to see if it matches. Combine that with the fact that the server has to report all of the matches in one reply, and the result is that once you accumulate a certain number of messages in one folder, searches will start to time out.

    JB: I saw on the Rumbling Edge that the trunk Thunderbird now has an option to file replies in the same folder as the message you replied to, so the next version of Thunderbird won’t have that problem.

  7. I use IMAP with my university account, have several thousand emails in my inbox, and only put listserv stuff in separate folders. I’ve never experienced any problem with searching, and I’ve definitely had about 10k messages in there before (I never really like to delete anything until absolutely necessary).

  8. I’ve been keeping most of my mail in my inbox for a few years now.

    I procmail out lists into their own folders to avoid swamping the inbox, but Thunderbird and Courier IMAP seem to cope with what is now 19,456 messages. However, Thunderbird seems to insist on asking the IMAP server to do searches, even searches it could answer from its own copy of cached headers. And they always time out on a folder that size, alas.

  9. Thanks for all the comments.

    Donald – yes, it does seem odd that Thunderbird asks the server to do searches rather than doing them locally. I’ve long failed to understand why Thunderbird doesn’t have a mode where it automatically downloads every message it touches into the offline store without a need for syncing, and then does searching etc. from that.

    As a GTDer, I wouldn’t keep all my mail in my Inbox, but I was envisaging an “archive” folder, which would also be my default folder for sent mail.

  10. Note that bug 347665 was checked in on the branch yesterday. It will speed up opening large IMAP mailboxes.

  11. It occurred to me that you could test-drive using just one archive folder if you tagged (in TB 2.0) all your present mail according to its folder. I.e., a message in the hypothetical folder “advocacy” would get the tag advocacy.

    If the scheme didn’t work for you, you could revert to using folders with little more work than it took to get to the one-folder state. There are issues with this idea, of course, not least that TB 2.0’s tags are local only (AFAIK?) and that a pile of unsorted mail builds up during the “testing phase”.

  13. It depends on how fast searches on body text are in your IMAP server.

    I recommend just using GMail :-).

  14. I’m having a lot of trouble with Thunderbird’s IMAP connection timing out, after which Thunderbird decides to dump its *entire* cache (offline copies, message headers, folder list, and all) and start redownloading my email (which takes a while). I have several thousand messages on this account, on the same order as your mailboxes, with most split between two folders (1100 in the inbox, 4000 in a subfolder).

    What I suggest, if you want to regroup your email, is to keep separate folders for each mailing list (no mental effort — filters sort it for you), and don’t use your inbox as the archive folder.

  15. fantasai: I’ve had this problem with a few folders recently, although not my entire messagebase. I wonder if it’s a regression in 2.0 beta 1. What version are you using?

  16. > My experience with Beagle was not really a pleasant one, but I’ll save that for another post.

    I’m the maintainer of Beagle, and I’m interested in hearing about any problems you had with it. Feel free to email me about it, and we can discuss.

    Thanks,
    Joe

  17. This works fine for me with UW IMAP, as long as I create a separate archive folder every 20,000 or so messages (e.g., Archive-2004, Archive-2005, and so on in my case). At 24,000 or so messages, UW IMAP starts timing out on searches, and at 27,000 or so messages, UW IMAP stops letting me access the folder at all. Under 20,000 messages everything works great and searches are reasonably fast.

  18. I use most of the system. I don’t use the parts that don’t fit my lifestyle, e.g. the 43-folder “tickler file”, because I very rarely have paper reminders of events to come.

    The big wins are thinking about things in terms of do/delegate/defer, having an inbox that’s regularly empty, having a filing system… there are lots.

  19. Just saw this as I’ve been away …

    I’ve got a large volume of mail – more than 200,000 messages, totalling about 5GB. I’ve set up Zimbra as my IMAP server. This lets me have:
    – a tree of folders that I can use over IMAP
    – tagging that works either through the web interface or through Thunderbird
    – fast text indexing that I can access through the web interface

    It would be nicer if Thunderbird allowed me to access the text indexing remotely, but this is still a fantastic combination for managing my mail…