Location Services and Privacy

Mozilla is building a location service. This is a server to which you can send details of your (radio frequency) environment and it will respond with its best guess at where you are. The advantage of this is that it’s much quicker and more power efficient than spinning up GPS hardware (which takes a minimum of 30 seconds from cold). It can be fairly accurate; and even if it’s not, a rough location is much better than no location. For example, it lets you get a rough set of driving directions, and then reroute when your exact location is found. Due to AGPS, it also speeds up your GPS lock acquisition, getting you to an exact position faster. Lastly, GPS doesn’t work indoors, whereas this does. Google and other providers (such as Skyhook Wireless) already have such a service, and it’s built into e.g. the Android platform.

Such a service depends on having good data about what is being transmitted where, with what identifiers – mobile networks and WiFi, and maybe even Bluetooth. Our data set is created, at the moment, by people running MozStumbler, which records the RF environment, linked to actual positions obtained from GPS. Download, install and use the app today :-)

Now, as Mozilla, our initial impulse as an open organization would be to release all the raw collected data to the public so people can build awesome things we haven’t even thought of yet. However, it turns out that this data comes with some interesting privacy challenges. And I don’t only mean the privacy of the person stumbling or the person retrieving their location, I mean the privacy of the owners of the transmitting stations (e.g. WiFi access points or mobile phones acting as hotspots).

For example, let’s say someone moves a long way and doesn’t want to be found in their new location (for example, because they are escaping some sort of violence). If they take their wireless access point with them (and many non-technically-minded people might not think that was at all dangerous) then as soon as a stumbler drove by, a public database of raw data would reveal their new location to those who should not know it. Raw data may also contain the location of mobile phone hotspots (and therefore their owners). Other scenarios can be found, for the interested, in the bug report and security forum discussion.

The only way we know of so far to solve this is to tie bits of data together, such that you can only get a location when you submit, as part of your request, the IDs of two or more transmitting sources which the database already knows are close to each other – which means that you must be at that location. This is what Google’s location service does. The disadvantage of this is that if you are in an area with very little RF transmission around – e.g. just one access point, or just a mobile phone signal – the service can’t help you. The team experimented with hashing schemes to try and encode this restriction into a published data set, but we were unable to come up with anything workable. It means that basically, the data needs to be hidden behind an API which enforces this – which means we can’t publish the raw data.
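To make the "two or more sources" rule concrete, here is a minimal Python sketch of the kind of check such an API might enforce. It is an illustration only, not Mozilla's or Google's actual implementation: the in-memory `KNOWN_TRANSMITTERS` table, the `locate` function, the transmitter IDs, the coordinates and the 500 m threshold are all assumptions made up for this example.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

# Hypothetical database: transmitter ID -> (latitude, longitude) of its
# estimated position. A real service would hold this server-side only.
KNOWN_TRANSMITTERS = {
    "ap:00:11:22:33:44:55": (51.5007, -0.1246),
    "ap:66:77:88:99:aa:bb": (51.5010, -0.1240),
    "cell:234-15-1234-5678": (48.8584, 2.2945),
}

# Assumed threshold: two transmitters count as "close to each other"
# if their recorded positions are within this many metres.
MAX_NEIGHBOUR_DISTANCE_M = 500.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))


def locate(observed_ids):
    """Return a rough (lat, lon) only if the request names at least two
    distinct known transmitters that the database records as neighbours.
    Otherwise return None, so a single ID can never be looked up alone."""
    known = sorted({i for i in observed_ids if i in KNOWN_TRANSMITTERS})
    for a, b in combinations(known, 2):
        pa, pb = KNOWN_TRANSMITTERS[a], KNOWN_TRANSMITTERS[b]
        if haversine_m(pa, pb) <= MAX_NEIGHBOUR_DISTANCE_M:
            # Midpoint of the two recorded positions as a crude estimate.
            return ((pa[0] + pb[0]) / 2, (pa[1] + pb[1]) / 2)
    return None


# A lone access point reveals nothing, even though it is in the database.
print(locate(["ap:00:11:22:33:44:55"]))  # None
# Two access points recorded near each other yield a rough position.
print(locate(["ap:00:11:22:33:44:55", "ap:66:77:88:99:aa:bb"]))
```

The sketch also shows the disadvantage described above: a request that contains only one known transmitter, or two that are not recorded as neighbours, gets no answer at all.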

There are other groups who are producing entirely open data of this type; it would be interesting to hear their views on such privacy questions. It would be good if we could make our data available to people who were willing to respect privacy constraints and encode those restrictions in their servers, and hopefully one day we’ll be able to do that. But at the moment, we can’t see any way to make the data set completely public :-( Interesting mathematical suggestions welcome…

9 thoughts on “Location Services and Privacy”

  1. It would be helpful for MozStumbler volunteers if the apk was made available in a repository — easier to get, easier to install, easier to update and such. Google Play might not be a good choice for a number of reasons, and Amazon is the same, but F-Droid is well-known within the demographics who would use such a thing.

  2. Didn’t Google get sued big time for gathering Wi-Fi data via their maps cars?

    How is this Mozilla project any different?

    • They got into trouble for recording actual data packets flowing backwards and forwards – which contained people’s personal data. I.e. they were sniffing data on the wireless networks concerned. This requires associating with the access point. Mozilla’s stumbler doesn’t do that – it uses the OS’s ability to enumerate the access points or mobile networks it can “see”, and records that information.

  3. Is it possible to differentiate between infrastructure and personal networks?

    By infrastructure I mean transmitters set up by businesses for general public use. That would include mobile base stations, but possibly even commercial WiFi access points and FM/DAB radio stations.

  4. Honestly, all this sounds like paranoia to me and a good excuse not to share the data. If someone wants to hide when changing flat, I would first advise he changes his wifi router and SSID.
    By not sharing the data you are just helping the business of commercial/closed databases, and you are closing off all sorts of interesting research, such as coverage maps, operators, WiFi density, interference/electrosmog studies, manufacturer spread, etc.
    In this case, I’m sorry to say that I won’t contribute data to this project anymore, and I hope you won’t bother the other, truly open projects that have open databases.
    What makes me think this is a bad excuse is that they could already publish the raw data with some fields (ESSID and BSSID) scrambled while leaving locations and signal strength open, and they could fully open the mobile operator cell ID database. But they don’t, so there must be another reason.

    • “If someone wants to hide when changing flat, I would first advise he changes his wifi router and SSID.”

      It’s all very well to say that, but many people are not technical and wouldn’t realise that their router broadcasts a unique ID which can be tracked and (would be) reported back to a central database by anyone passing.

      We realise there are lots of useful things people could do with this data, and I hope the team will look at a privacy-preserving data sharing agreement they could use to let others look at it and do some of these cool things.

      The lack of publication of mobile cellID data is, I am sure, simply due to lack of time. Why not file a bug about it?

      • That sounds like a great way of confusing people and breaking all their devices which have saved the password… Also, changing the SSID doesn’t change the MAC address.