A General Solution To Site Spoofing

[For those who may not know: site spoofing is when someone sets up a site and tries to get you to visit it, believing it’s another site. So they might set up http://www.paypai.com (note the lowercase i in place of the l) and send you an email inviting you to visit “http://www.paypal.com” and enter your login details.]

Site spoofing is a significant issue. It makes phishing attempts more effective, and phishing is apparently already a hundred-million-dollar industry, even by conservative estimates. It’s our users who are being defrauded. And, as spam has shown, where there’s money to be made, the problem doesn’t just go away. It’s mozilla.org’s responsibility to live up to its reputation for security innovation by doing its best to help people avoid being taken in. Domain name registrars who process thousands of applications per day at low margins can’t be expected to hold the fort.

However, finding a solution is hard. It involves communicating clearly and unambiguously to a novice user the tiny difference between a lowercase l and a lowercase i (and every other pair of confusable letters), without overwhelming them with so much confirmation and checking information that they ignore all of it. I’m not sure that there’s a textual solution, given the almost infinite variety of domain names and fonts, and the human tendency to assume that something almost the same is in fact the same.

So here’s my idea. You hash the domain name, convert the hash into an RGB colour value, and colour some UI element that colour when you are on the site. This doesn’t provide any first-visit protection, but it does provide a one-glance check as to whether you are on the same www.paypal.com that you were last week. And sites where you have high-value logins tend to be ones you’ve visited before.
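A minimal sketch of the hash-to-colour step, in Python. The post doesn’t name a hash function, so SHA-256 is an assumption here, as is the function name `site_colour`. The first three bytes of the digest become the red, green and blue channels.

```python
import hashlib


def site_colour(domain: str) -> str:
    """Map a domain name to a colour string such as '#3fa2c1'.

    Hypothetical helper: SHA-256 is an assumed choice of hash, and the
    first three digest bytes are used directly as the R, G and B channels.
    """
    # Normalise so that letter case and a trailing dot don't change the colour.
    name = domain.strip().lower().rstrip(".")
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    red, green, blue = digest[0], digest[1], digest[2]
    return f"#{red:02x}{green:02x}{blue:02x}"
```

For example, `site_colour("www.paypal.com")` and `site_colour("www.paypai.com")` should give unrelated colours. `site_colour("WWW.PayPal.com.")` matches the first because the name is lower-cased and a trailing dot is dropped before hashing.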

I continue to maintain that anti-phishing and anti-spoofing efforts are pointless if the true site doesn’t at least use SSL. So I suggest that, in Firefox, the element to change is the background colour of the status bar field that shows the domain name:

  • It’s semantically associated – the domain name is what’s being hashed
  • It’s part of a piece of UI (the status bar) which is always present
  • People look there on secure sites to see the name and the lock anyway

So if users know from repeat visits what colour www.paypal.com is, then www.paypai.com, with its different colour, is easily seen as bogus.

The human eye is very sensitive to colour differences; this should mitigate the fairly unlikely event of a spoof having a similar colour to the original. This scheme also has the significant advantage that it requires no user configuration whatsoever, and only minimal instructions (“Colour not the same? Be suspicious!”).

There are extra refinements, like varying intensity so that colour-blind users aren’t left out, and limiting the colour range so that the text is always readable, but that’s the basic idea. For bonus points, we should use a hash function which tries to ensure that the closer two inputs are, the more different the hashes. For extra bonus points, all browser vendors should agree on the same hash function and colour range, so www.paypal.com is light green in all browsers.
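One way to limit the colour range, sketched under these assumptions:

- SHA-256 is the hash again.
- The status-bar text is black.
- A hypothetical helper, `readable_site_colour`, squeezes every channel into 128–255 so the background is never dark.

SHA-256 doesn’t literally make closer inputs produce more different outputs. What it offers is the avalanche effect: a one-character change such as paypal to paypai yields an essentially unrelated digest, which goes part of the way. The sketch doesn’t attempt the colour-blind intensity variation.

```python
import hashlib


def readable_site_colour(domain: str) -> tuple[int, int, int]:
    """Return an (r, g, b) background colour light enough for black text.

    Hypothetical helper: each channel is a hash byte halved and shifted
    up by 128, so every channel lands in 128..255 (the lighter half).
    """
    name = domain.strip().lower().rstrip(".")
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    r, g, b = (128 + byte // 2 for byte in digest[:3])
    return r, g, b
```

Halving each byte throws away one bit per channel. That leaves 2^21 (about two million) possible colours, all of them light.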

23 thoughts on “A General Solution To Site Spoofing”

  1. The thing is that the relationship between the hash and the RGB value may lead to a vulnerability. If you get a hash value that is “close enough” to the range of hashes used to determine the limited number of colors (due to readability issues), you may get lucky and manage to get the same color as the legit site. If people become reliant on the color instead of checking the URL closely, then you may be doing more harm than good.

    “If you get a hash value that is ‘close enough’ to the range of hashes used to determine the limited number of colors (due to readability issues), you may get lucky and manage to get the same color as the legit site.”

    I believe the human eye can distinguish between about 16 million colours (that’s why displays never went above 24-bit colour). Even if we eliminate the dark half because they obscure the text, and eliminate immediately adjacent ones in all directions, that’s still about 1 million distinct colours.
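    Here is the arithmetic behind the “about 1 million” figure. It assumes that “eliminate immediately adjacent ones in all directions” means keeping every other value on each of the three channels:

```latex
\begin{aligned}
2^{24} &= 16{,}777{,}216 && \text{all 24-bit colours} \\
2^{24}/2 &= 2^{23} \approx 8.4 \times 10^{6} && \text{dark half removed} \\
2^{23}/2^{3} &= 2^{20} = 1{,}048{,}576 && \text{every other value kept on each of R, G, B}
\end{aligned}
```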

    Site spoofers can’t just choose any domain name – it would have to have both a similar name and a hash that maps to a similar colour. Perhaps, for a given site (paypal.com), there might be one domain name which is close enough in both name and colour. But when PayPal shut that person down, that would be it. At the moment, these things are springing up everywhere. It certainly massively reduces the scope of the problem.

  3. The idea is interesting, and perhaps with some refinement could lead to a solution.
    Maq’s comment about spoof colours closely resembling the valid domain’s colour is a significant problem, however.

    Also of note, this proposal wouldn’t prevent attacks on the hosts file – phishers are creating static entries that point youronlinebank.com at their own sites.

  4. If phishers can alter your hosts file, couldn’t they also install keyloggers, IE BHOs, Firefox extensions, etc.? The game’s already over at that point. (I expect anti-spyware software will soon start reporting every entry in your hosts file, just as they insist that my IE homepage of about:blank must be a hijack attempt.)

  5. GREAT idea, Gerv! I like this a lot.

    To stop people from hacking the HOSTS file, why not factor in the IP address as well as hashing the domain name? That way, if paypal.com (the real PayPal) is green and someone changes the paypal.com entry in the hosts file to point elsewhere, paypal.com would hash to a different color.


  6. gerv:

    If you place #FF0000 and #F00000 side by side, then yeah, the human eye can distinguish them. But when they’re not side by side, all I would remember is “bright red”. I would expect myself to distinguish something between maybe 16 to 256 colors (i.e., 4 ~ 8 bit)

    (I think a Firefox extension was recently made to replace the tab icons with a colored block of some color derived from the domain when the site provides no icon… Can’t seem to find it at the moment, though)

  7. I agree with Mook, there aren’t that many colours that you’ll distinguish from memory.

    1) How about warning when the link text doesn’t match the domain? E.g. link text of “paypal.com” that doesn’t actually link to paypal.com. This includes IP addresses. Bring up the warning then.

    2) Phishing is only a problem if you’ve already got an account, so you’ve probably been to the site before. How about remembering any site you’ve logged into (via https), and bringing up a warning for similar spellings, e.g. with 1 instead of I? It could be narrowed further by checking whether you enter the same username and password as on the other site (verified by keeping a hash of these details). So that the stored data isn’t itself a risk, the domain could be stored as a soundex (or something similar) and the username/password as a hash. Unfortunately you couldn’t just limit this to https, so it would have to cover the domain of every site you visited.
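    Here is a rough sketch of the credential part of this idea, in Python. Everything in it is hypothetical:

    - the `CredentialWatch` class;
    - the PBKDF2-SHA256 fingerprint with a per-profile random salt;
    - the 100,000 iteration count.

    Domains are stored in plain text here rather than as a soundex.

```python
import hashlib
import os
from typing import Optional


class CredentialWatch:
    """Hypothetical sketch: remember a salted hash of each username/password
    pair and warn when the same pair is entered on a different domain."""

    def __init__(self) -> None:
        self._salt = os.urandom(16)                 # per-profile random salt
        self._first_domain: dict[bytes, str] = {}  # fingerprint -> first domain

    def _fingerprint(self, username: str, password: str) -> bytes:
        # PBKDF2 makes brute-forcing the stored fingerprints expensive.
        secret = f"{username}\0{password}".encode("utf-8")
        return hashlib.pbkdf2_hmac("sha256", secret, self._salt, 100_000)

    def on_login(self, domain: str, username: str, password: str) -> Optional[str]:
        """Record a login. Return the domain where these credentials were
        first seen if it differs from `domain`, otherwise None."""
        fp = self._fingerprint(username, password)
        first = self._first_domain.setdefault(fp, domain.lower())
        return first if first != domain.lower() else None
```

    The salt is a per-profile secret, so the stored fingerprints can’t be matched against a precomputed table. The same credentials typed on two sites in the same profile still produce the same fingerprint, which is what the warning relies on.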

  8. Factoring in the IP makes the whole process worse than useless, because the colour would change whenever the site’s IP address changed, making people suspicious for no reason. For this to be truly effective, paypal.com must always be the same colour.

  9. Been thinking more about this and doubt that colour will be the best way. If we believe that Firefox users are at high risk (which I’m not sure of), why not have a general alerting mechanism like:

    * I visit https://paypal.com, and click some button entitled “Look4Spoof”, which adds this domain name to a list.
    * Whenever I visit a https site, a pattern match algorithm is run against sites in my “Look4Spoof” list.
    * If the URL I’m at is similar to one in my list, but doesn’t match, alert user to the discrepancy.
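    The “pattern match” step could be as simple as an edit-distance check. Here is a sketch that assumes Levenshtein distance and a hypothetical threshold of 2 edits. `look4spoof` and `edit_distance` are illustrative names, and watched hosts are assumed to be stored lower-case.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def look4spoof(host: str, watched: set[str], max_distance: int = 2) -> list[str]:
    """Watched hosts that `host` nearly, but not exactly, matches."""
    host = host.lower()
    if host in watched:
        return []  # exact match: this is one of the user's own sites
    return sorted(w for w in watched if edit_distance(host, w) <= max_distance)
```

    For example, visiting paypai.com with paypal.com on the list returns `["paypal.com"]`, which is the cue to alert the user.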

    It takes the load off the user to notice possibly subtle colour differences, and prevents horrible colour schemes :)

  10. Matthew: phishers can get around your 1) simply by not using the domain as the link text; many phishing emails don’t. It’s also a very hard algorithm to write.

    Your 2) requires keeping some sort of database of possible letter substitutions. I’m really not sure that would scale very well.

    trooper: there are two problems with your idea.

    The first is that it requires configuration on the part of the user. This isn’t a showstopper in itself, but no-configuration solutions should be preferred.

    The second is that the whole point of phishing is that people mistakenly trust the source of the link they used. They are going to get very bored of clicking Look4Spoof every time they visit their bank, when they “know” it’s safe, so they just won’t bother.

    “I would expect myself to distinguish something between maybe 16 to 256 colors (i.e., 4 ~ 8 bit)”

    Even 256 colours might be enough with careful choice of hash algorithm. And it means that only 1 in 256 possible alternative domains would be usable, which is a big improvement.

    Still, perhaps we want to introduce a second variable. The trick is to do it without making the UI ugly. Three quick ideas:

    • Could we change the foreground text colour as well without the result looking like some bad 60s poster?
    • Could we make the background two-colour? Both these ideas square the number of options, because then both colours would need to match. So 256^2 = 65,536, which is quite a lot.
    • Could we gradate the background between two colours? People’s eyes might be more sensitive to a gradation change.
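    Take the 256-colour figure at face value, and assume the hash spreads domains evenly and the two colours are independent. The chance that a randomly chosen lookalike domain also matches is then:

```latex
P(\text{one 256-way colour matches}) = \frac{1}{256}, \qquad
P(\text{two independent 256-way colours both match}) = \frac{1}{256^2} = \frac{1}{65{,}536}
```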

    I don’t like any of these ideas really because I think they damage the simplicity. But perhaps they’ll be the springboard for someone else’s plan. It’s the old tradeoff between overloading the user with information and trying to solve the problem completely…

  11. I think this is a great idea.
    How about putting a pattern on the part of the URL bar not obscured by text?

    This would make differences far easier to recognise than a flat colour. E.g. cross-hatching or diagonals at varying angles; curvy lines or circles could sometimes be used too. This would not be too hard: it could even generate SVG from the hash ;-)

  12. I was thinking along the same lines as rjw – some sort of shape based identifier. This would certainly be more accessible for colour blind users. Perhaps you could keep it simple, like a 3×3 grid with certain squares filled in; a sort of hash-generated favicon. A lot of potential combinations, but (hopefully) easy to recognise. If you married it up with the hash-generated colour…
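    Here is a sketch of the 3×3 grid idea. It assumes SHA-256 and uses ASCII output for illustration; `grid_icon` is a hypothetical name. Nine bits of the digest decide which cells are filled, giving 2^9 = 512 patterns.

```python
import hashlib


def grid_icon(domain: str) -> list[str]:
    """Hypothetical 3x3 'hash-generated favicon': nine hash bits decide
    which cells are filled ('#') or empty ('.')."""
    name = domain.strip().lower().rstrip(".")
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    bits = int.from_bytes(digest[:2], "big")  # 16 bits; only the low 9 are used
    rows = []
    for r in range(3):
        row = ""
        for c in range(3):
            cell = r * 3 + c  # cell index 0..8
            row += "#" if (bits >> cell) & 1 else "."
        rows.append(row)
    return rows
```

    Printing `"\n".join(grid_icon("www.paypal.com"))` shows the pattern. Pairing it with a hash-generated colour multiplies the number of distinguishable combinations.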

    As far as the hash function goes, I wonder if it would be possible to tune it such that characters like “l” and “i” and “1” get pushed far apart in the generated hash value?

    Using the user’s history would also be good. You could create a distance function (like soundex, as suggested above, or metaphone) and compare a site’s host name to previously visited sites. If they’re too close, you could show a warning; no user configuration required. Again, the distance function could be tuned such that characters like “l” and “i” and “1” are considered close.
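    Here is a sketch of a comparison tuned for look-alike characters, in the spirit of the “skeleton” comparison in Unicode Technical Standard #39. The confusables table is a tiny hypothetical stand-in:

    - i and 1 map to l;
    - 0 maps to o;
    - rn maps to m;
    - vv maps to w.

    `skeleton` and `looks_like_visited` are illustrative names.

```python
# Hypothetical, deliberately tiny confusables table: i and 1 look like l,
# 0 looks like o. A real table would be far larger.
CONFUSABLE = str.maketrans({"i": "l", "1": "l", "0": "o"})


def skeleton(host: str) -> str:
    """Collapse look-alike characters so visually similar hosts compare equal."""
    s = host.lower().translate(CONFUSABLE)
    # Multi-character look-alikes are handled after the single-character map.
    return s.replace("rn", "m").replace("vv", "w")


def looks_like_visited(host: str, history: set[str]) -> list[str]:
    """Visited hosts that differ from `host` but share its skeleton."""
    host = host.lower()
    target = skeleton(host)
    return sorted(h for h in history if h.lower() != host and skeleton(h) == target)
```

    Exact skeleton equality is stricter than a numeric distance. The two could be combined, for example by computing an edit distance between skeletons.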

  13. trooper’s suggestion resembles mine, which is to have a special colour for URLs which match one or more of the user’s bookmarks. That way, authenticating the address is, for the user, merely a matter of bookmarking pages that they like.

    However, I don’t know what technical obstacles there are in the way, besides raising the question of when two URLs represent the same site. For instance, you might want to trust (perhaps I shouldn’t say “trust”) only a few of the Web pages at geocities.com or the university server. And some sites, large commercial ones especially, have a Web presence sprawling over many domain names.

    The latter problem could look after itself: technically it would be a false-negative trust indication, but any site you haven’t bookmarked will be a false negative anyway. For the former, I’d suggest that a match up to the last ‘/’ in the page address proper should be good.
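    Here is a sketch of the “match up to the last ‘/’” rule using Python’s `urllib.parse.urlsplit`. The helper names are hypothetical. The query string and fragment are ignored as not part of the page address proper. In this sketch a bookmark also vouches for pages in subdirectories below it.

```python
from urllib.parse import urlsplit


def trust_prefix(url: str) -> str:
    """Scheme, host and path up to and including the last '/'.

    The query string and fragment are dropped: only the page address proper counts.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    directory = path[: path.rfind("/") + 1]
    return f"{parts.scheme}://{parts.netloc.lower()}{directory}"


def matches_bookmark(url: str, bookmarks: list[str]) -> bool:
    """True if `url` lies in the directory (or a subdirectory) of any bookmark."""
    prefix = trust_prefix(url)
    return any(prefix.startswith(trust_prefix(b)) for b in bookmarks)
```

    A bookmark for https://www.geocities.com/alice/index.html therefore covers https://www.geocities.com/alice/photos.html but not https://www.geocities.com/mallory/index.html. Because each prefix ends with the slash after the host, https://www.paypal.com.evil.example/ never matches a bookmark for https://www.paypal.com/.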

    If they use JavaScript for everything, then of course you have to hunt them down and kill them – but you knew that anyway :-)

    Indication of address “approval” deserves consideration. It should be very noticeable without being distracting, and obviously shouldn’t be spoofable. Perhaps allow configuration options. Make the bookmark icon blink green for a recognised address, red for an unfamiliar one – but wait, there are monochrome screens, and colour-blind users. Play a different sound as the page loads.

    And perhaps different bookmark folders, or marks, should have different reactions. For instance, you could choose whether your bank sites’ URLs sound (and look) like a cash register, or an ATM, or a single dropped or spinning coin, while your porn memberships are… well, easily distinguished from the banks.

    Now, you should never trust a URL from an untrusted source, anyway, even if it looks okay. New Unicode technology allows near pixel-identical sp00fing of d0ma1n n@me5. (i.e., better than that. I came here from a discussion in opera.general.) And when the browser loads a spoofed page, you’ve already taken a step down the wrong road. (So, implement the check for links in the page. Animate the mouse pointer, or something.) But as long as your browser isn’t one that gets compromised as soon as you load a spoofed page, it isn’t such a worry.

  14. My concern with this would be how well it scales. OK, a green paypal hash is clearly different from the yellow paypai hash. But what if my bankco site is a hue of purple, eBay is cyan and my credit card site hashes to a sandy colour?

    Having to communicate clearly and unambiguously to a novice user that purple would be OK for my bank but not credit card would in itself be an interesting task.

    Such combinations then raise the chance that a novice user gets a false sense of security when seeing a bogus cyan-coloured match for the bank site, because cyan was ‘safe’ for eBay.

  15. Gerv: “Still, perhaps we want to introduce a second variable. The trick is to do it without making the UI ugly. Three quick ideas:”

    How about a thick border? e.g. http://img22.exs.cx/img22/9268/spoof2ay.png

    For readability, I’d always use black text and restrict the background to colours of saturation>=180 and luminosity>=180 and the border to colours of saturation>=180 and luminosity>=120 (on 0 to 255 scales).
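    Here is a sketch of Greg’s thresholds using Python’s `colorsys` module. It rests on these assumptions:

    - “luminosity” is read as HLS lightness;
    - both values are taken on 0–255 scales as in the comment, then rescaled to the 0–1 range `colorsys` expects;
    - SHA-256 supplies the hue, saturation and lightness bytes;
    - `spoof_colours` is a hypothetical name.

```python
import colorsys
import hashlib


def _scale(byte: int, low: int) -> float:
    """Map a hash byte (0..255) onto low..255, then onto colorsys's 0..1 scale."""
    return (low + byte * (255 - low) / 255) / 255


def _hls_rgb(hue: int, light: float, sat: float) -> tuple[int, int, int]:
    """Convert a hue byte plus 0..1 lightness and saturation to 8-bit RGB."""
    r, g, b = colorsys.hls_to_rgb(hue / 255, light, sat)
    return round(r * 255), round(g * 255), round(b * 255)


def spoof_colours(domain: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Return (background, border) colours for use behind black text.

    Background: saturation >= 180 and lightness >= 180 (0..255 scales).
    Border:     saturation >= 180 and lightness >= 120.
    """
    name = domain.strip().lower().rstrip(".")
    d = hashlib.sha256(name.encode("utf-8")).digest()
    background = _hls_rgb(d[0], _scale(d[1], 180), _scale(d[2], 180))
    border = _hls_rgb(d[3], _scale(d[4], 120), _scale(d[5], 180))
    return background, border
```

    Using separate hash bytes for the border adds a second independent variable without changing the text colour.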

  16. Gavin: That’s a good point. I think the impact of the issue will be mitigated by the fact that most users will have at most two or three sites of high value. The vast majority of phishing scams are aimed at banks and PayPal. So they only have a couple of “safe” colours to remember.

    Also, remember an attacker has to find a fairly-close URL. Is an attacker going to find a fairly-close URL for a bank which happens also to have a fairly-close colour for PayPal, and then hope to catch confused people who are both customers of that bank and of Paypal? It’s a much less likely scenario.

    We can also mitigate it by the way we present it. If we always call it “the site colour”, and say “if a site’s site colour is different”, and don’t talk about actual colour values ever, then users will hopefully get the idea.

    As I’ve said, I don’t think my idea (or any idea) will solve the problem completely. But I think as a tradeoff of accuracy against complexity, it’s a good one.

    Greg: a thick border might work, although we need to be careful because there’s not much room in the status bar. Good idea, though.

  17. How about this idea? Phishing scams work because people use certain sites. Obviously, if the site has never been visited, the user will know it’s a scam. If it is a site the user knows and uses, then the user has probably visited it, so it should be in the history or the URL dropdown list… So my suggestion is to show it in a certain color if the site has been visited, just like visited links on an HTML page.

  18. Gena01: I don’t think any history-based scheme would work, for two reasons. Firstly, the history has a defined horizon – often 30 days or 9 days. We can’t keep history forever – the tracking file would grow without bound. So the scheme won’t protect you if you haven’t visited your bank in that long. And it’s people who haven’t visited their bank in a while (and so may not be completely familiar with its site layout) who are more likely to get scammed.

    Secondly, some people clear their history regularly to keep their privacy.

  19. Gerv: What about the password list? And yes, I know that Firefox doesn’t save passwords for all sites (yet?).

  20. IE has a rather interesting feature: a “Trusted Sites” category. It shows the zone when you visit any website, and if the site is on the trusted list it shows a green checkmark and “Trusted Zone” in the status bar.

    P.S. I actually saw some installers abuse that and add their domain to the Trusted List.

  21. Or another crazy idea. What about checking if the site you are visiting is on your Bookmarks List and showing it differently if it is?

  22. I do think an incremental approach is all that can be offered, and everything should be considered. But colour matching does seem to introduce a fairly unique and therefore unfamiliar tool, when you consider that it is needed for users who may still not have grasped the basic tricks that phishers employ.

    Anyway – you have got me thinking and I think some elements of what has been discussed could be incorporated in a way that uses more familiar alerting mechanisms.

    As you say, this matters mainly for secure sites. These could be kept in a known secure history indefinitely.

    Every time a secure connection was made to a new site, information could be displayed in a bar like the one used by pop-up blocking, saying something like: “This is not a site that you have been to before using this browser. If you think you have visited it previously, there may be a problem; click here.”

    This would show an info dialog with the URL clearly displayed in a monospace font that would make clear any phishing differences and give basic info on what to look for.
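    Here is a sketch of the “known secure history” with a first-visit notice, in Python. The `SecureHistory` class, the optional JSON file and the message text are assumptions. Hosts are kept until explicitly removed, so there is no 30-day horizon.

```python
import json
from pathlib import Path
from typing import Optional


class SecureHistory:
    """Hypothetical 'known secure history': hosts visited over HTTPS are
    remembered indefinitely, optionally persisted to a JSON file."""

    def __init__(self, path: Optional[Path] = None) -> None:
        self.path = path
        self.hosts: set[str] = set()
        if path is not None and path.exists():
            self.hosts = set(json.loads(path.read_text(encoding="utf-8")))

    def check(self, host: str) -> Optional[str]:
        """Return a notification-bar message on the first secure visit to
        `host`, or None if it has been visited before."""
        host = host.lower()
        if host in self.hosts:
            return None
        self.hosts.add(host)
        if self.path is not None:
            self.path.write_text(json.dumps(sorted(self.hosts)), encoding="utf-8")
        return (f"{host} is not a site you have visited before using this browser. "
                "If you think you have visited it previously, there may be a problem.")
```

    The message names the host explicitly, so the follow-up dialog can show it in a monospace font as suggested.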