Gmail Search; Readers Respond and a Proposal

Well. That was an interesting response.

Links: Gmail; Why Can’t Gmail Search?; Hacker News Comments; Stemming; Google Gears

My post on the failings of Gmail search proved quite popular and garnered a number of interesting responses. Aside from the assorted name-calling (first time I was ever called an asshat, so circle the day on my calendar!) and the one guy who accused me of being a shill for Yahoo, most of the responses here at my blog and on the posting at Hacker News fell into three main areas.

“You are stupid for wanting search to work like that!”

Fair enough, if that’s what you think. But your saying so doesn’t change the fact that Gmail’s search is failing to help me manage my email. And I can see clearly that as the years of email pile up it will only get more important to be able to search very specifically.

Some of these responses were along the lines of “Use Search Autocomplete from Labs for searching for people”. OK, but that violates the experience Google has habituated me to expect: one single box, type some text in, and it’ll do a good job finding it. I don’t want to know there is a special autocomplete. There’s a search box at the top; I want to use it.

“All email search sucks, but at least Gmail is fast!”

This is both funny and true. One of the real virtues of Gmail’s existing search is that it is really damn fast. The Yahoo Mail search I referenced in the original post was *much* slower than Gmail’s search. But one poster referenced Gerald Weinberg’s story from Code Complete regarding speed; another poster had the same reaction I did:

It's not "completely broken," but no hits for a query of "zag" in an email that contains "zagg" comes uncomfortably close to "doesn't work."

So for me, Gmail’s speed is nice, but it is useless speed if I can’t find what I need. A number of people agreed with me.

This comment made me laugh though:

I love Gmail search because I've seen the difference - have you ever tried searching email in Mobile Me? It's like they ignore your query and search a random string.

Snort.

Finally, the guy who commented (paraphrasing) “… I couldn’t find a specific email so I assumed I deleted it; now I’m not so sure” describes a very real problem. If you can’t find it by search and you know search is deficient, you end up wondering if you deleted it. If you never find it, the effect is the same as if Google were randomly deleting emails.

“Google can’t afford to index all that email! It’s hard.”

Well, yes. It is hard, and having implemented various substring indexing systems in my professional career I know all too well how expensive in both time and space it can be. I grant the point. But if we don’t think about the hard things, nothing interesting ever gets done.

Plus, searching through ten years of email without substring search seems like it will become impossible.

Gmail encourages you to not delete; archive is the mantra. But ten years from now I’ll have a ton of email. I will likely need amazing search capabilities to make use of it; otherwise I might as well just delete my old email and be done with it. Maybe you shouldn’t keep email, but the prospect of having all those old emails interests me.

So what could Google do?

A really intriguing suggestion was for Google to use Google Gears to substring index your mail locally on your computer. This has a certain appeal for sure, but is not really a general solution when you want to be able to use public access computers at the library to check your email, or use your iPhone, etc. Still, a really clever idea.

Substring indexing itself was generally regarded as too hard for this reason: it would take a ton of effort and storage to fully index all the text in all the emails across all the Gmail accounts. This argument seems persuasive, until you look at it too hard:

The number of words/tokens to substring index is not really proportional to (bytes per email) × (emails per account) × (number of accounts). It is much smaller than that, for several reasons:

  • Most of the space emails take up is not really text; it is attachments. Pictures, etc. Those bytes don’t matter for this sort of indexing.
  • The overlap in tokens (words) across all the emails Gmail holds must be huge; I bet the total number of unique tokens is vastly smaller than the raw volume of text. Manageable, even. So you could have each unique word substring indexed once (tries, trigrams, take your pick of a hundred ways to do this).
  • When a new email arrives, indexing it would be a matter of recording which tokens it contains (Bloom filters?).
  • Some of the tokens would be new and have to be substring indexed on the fly; over time (pretty quickly, I expect), this would become infrequent.

Then, to perform a substring search, you identify the tokens the substring matches (from the total corpus of known tokens), then filter for the emails containing those tokens which belong to the user. From there you have ground the problem down pretty far.
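To make the steps above concrete, here is a minimal Python sketch. It is a toy under loud assumptions, not how Gmail works: the class name `SubstringMailIndex`, the trigram choice, the in-memory dictionaries, and the crude tokenizer are all mine, picked only to illustrate the idea. Note that it only matches substrings inside a single token, so a query spanning a space would find nothing.

```python
import re
from collections import defaultdict

# Assumption: a token is a run of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def trigrams(s):
    """Return the set of 3-character windows in s (empty if len(s) < 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class SubstringMailIndex:
    """Hypothetical toy index: one shared vocabulary, per-user postings."""

    def __init__(self):
        self.vocabulary = set()                    # every distinct token, all users
        self.trigram_to_tokens = defaultdict(set)  # substring index over the vocabulary
        self.postings = defaultdict(set)           # (user, token) -> email ids

    def add_email(self, user, email_id, text):
        for token in set(TOKEN_RE.findall(text.lower())):
            if token not in self.vocabulary:
                # A never-seen token is substring indexed once, on the fly.
                self.vocabulary.add(token)
                for gram in trigrams(token):
                    self.trigram_to_tokens[gram].add(token)
            # The common case: just record that this email contains the token.
            self.postings[(user, token)].add(email_id)

    def matching_tokens(self, query):
        """Find every known token that contains query as a substring."""
        query = query.lower()
        grams = trigrams(query)
        if grams:
            candidates = set.intersection(
                *(self.trigram_to_tokens.get(g, set()) for g in grams)
            )
        else:
            # Queries shorter than 3 characters fall back to a vocabulary scan.
            candidates = self.vocabulary
        # Trigram hits are only candidates; confirm with a real substring test.
        return {t for t in candidates if query in t}

    def search(self, user, query):
        """Return the ids of this user's emails with a token containing query."""
        hits = set()
        for token in self.matching_tokens(query):
            hits |= self.postings.get((user, token), set())
        return hits


index = SubstringMailIndex()
index.add_email("alice", 1, "My new Zagg earbuds arrived")
index.add_email("alice", 2, "Lunch on Friday?")
index.add_email("bob", 3, "A zigzag pattern")
print(index.search("alice", "zag"))
```

The point of the design is that the expensive part (the trigram index) is built over the shared vocabulary, not over each mailbox, while the per-user part is a plain token-to-email lookup. Whether that survives real scale is exactly the open question.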

Yes, I simplified to beat the band. Yes, scale is a real issue. Sure, you could make assumptions that a user will stick to a single language. Whatever. But it is one solution that might work.

Another thought is to do aggressive tag generation, so that the token “Zagg” is automatically indexed under “headphone”, “earbud”, “headset”, et cetera, and a search for “headset” might be able to suggest Zagg. This is kinda like what Google’s web search does, if it were integrated into email search.
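A rough Python sketch of the tagging idea follows. Everything in it is assumed for illustration: the `RELATED_TAGS` table is hand-written here, whereas a real system would have to derive such associations from some much larger source, and `TaggedIndex` is a made-up name.

```python
from collections import defaultdict

# Hypothetical, hand-written associations; purely illustrative.
RELATED_TAGS = {
    "zagg": {"headphone", "earbud", "headset"},
}


class TaggedIndex:
    """Toy index that files each email under its words and their related tags."""

    def __init__(self, related_tags):
        self.related_tags = related_tags
        self.tag_to_emails = defaultdict(set)

    def add_email(self, email_id, text):
        for word in text.lower().split():
            token = word.strip(".,;:!?\"'()")
            if not token:
                continue
            # Index the literal token...
            self.tag_to_emails[token].add(email_id)
            # ...and every tag associated with it at indexing time.
            for tag in self.related_tags.get(token, ()):
                self.tag_to_emails[tag].add(email_id)

    def search(self, term):
        return set(self.tag_to_emails.get(term.lower(), set()))


tagged = TaggedIndex(RELATED_TAGS)
tagged.add_email(1, "Your Zagg order has shipped.")
tagged.add_email(2, "Meeting notes")
print(tagged.search("headset"))
```

Expanding at indexing time keeps the search itself a single lookup; the alternative would be to expand the query instead, which avoids re-indexing old mail when the associations change.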

Why can’t Google come up with a better idea? Um, they do index the web, for Pete’s sake!

Can you come up with another approach? An even better one? Cool! Let me know.

Heck, let Google know. Because at the moment it is still Search Fail for Gmail.

7 Comments on “Gmail Search; Readers Respond and a Proposal”

  1. Ben Nicolas Says:

    Any sort of sub-string solution would be great.

    I would be happy if they could just solve my basic problem of whole word searching being accurate. I’m perplexed as to how I can see the same word in 2 e-mails that actually have the same label, search both within that label and through ‘All Mail’ and only find 1 of the 2 e-mails…

  2. sedwards Says:

    When I first read this, I couldn’t believe it. But I was able to verify myself in about two seconds. Holy crap.

    But then I thought about it for a little bit, and it made sense. Substring searches in GMail don’t work because substring searches in Google don’t work, period. Google search is primarily semantic. It has to be, or else the results wouldn’t make any sense. The best it can do is suggest alternative spellings (“Did you mean X?”) for a different search.

    Obviously when it came time to add a search feature to GMail, they just plugged in the existing web search engine. Done. Way to leverage existing code, guys!

    Unfortunately, this issue points out why the approach is insufficient for the average mail user.

    Suggestion: Might the Google API support a browser plug-in to do this?

    Personally, I don’t yet trust The Cloud for stuff I want to save. Only things with a short shelf life get routed to GMail, so I generally don’t search there. My email searching gets done by OS X Spotlight. Not very portable, but totally sufficient for me.

    By the way, somebody gave me a Best Buy gift card, so I’m going to try a pair of these:

    http://www.airdrives.com/

    • designbygravity Says:

      Hah! I am pretty sure you are exactly right about *why* it doesn’t work; emails lack the interconnectiveness (sp?) of web pages so the algorithms used for web indexing don’t really apply.

      The plugin option is interesting, as is the aforementioned local Gears option. Still, compared to other things Google does, it just doesn’t seem *that* hard.

      • sedwards Says:

        Well, as we both know, the underlying architecture could make the addition of an outwardly simple requirement an intractable mess. Or it could just be boring enough that nobody wants to spend their precious 20% on it.

  3. miguel Says:

    yes gmail search stinks.
    dont know how they store the data, but why do you need to index it? why not, if it was all readable text, just GREP the damn text???

  4. Stan Says:

    The same search has infected the Android platform. You can’t search for a substring! Sometimes I need to find a phone number by a few digits that I remember and it just doesn’t work. This worked nicely on Palm not to mention a bunch of much older non-smart phones.

  5. KJ Says:

    I would be happy if it would just find EXACT matches when I search in gmail. Ex: I know I sent an email to my bosses, I know my name “KJ” was part of the subject line, and I know I used the word “manuals” in the body of the email. I used those exact criteria and Gmail said there were no matches.

    When I finally did find the email through a tedious manual search, I entered the EXACT subject line as well as my address as sender and my bosses as recipients. It still couldn’t find the email on its own. I run into this all the time. It is really useless. But we have a domain in Google and this is a paid account. Google is my default search engine and home page, we have tons of Google Sites, we use Google Docs, plus I have a personal Gmail account. I have an android phone, yada yada. I love Google, just not the search in Gmail.

