Mozilla is building a location service. This is a server to which you can send details of your (radio frequency) environment and it will respond with its best guess at where you are. The advantage of this is that it’s much quicker and more power efficient than spinning up GPS hardware (which takes a minimum of 30 seconds from cold). It can be fairly accurate; and even if it’s not, a rough location is much better than no location. For example, it lets you get a rough set of driving directions, and then reroute when your exact location is found. Due to AGPS, it also speeds up your GPS lock acquisition, getting you to an exact position faster. Lastly, GPS doesn’t work indoors, whereas this does. Google and other providers (such as Skyhook Wireless) already have such a service, and it’s built into e.g. the Android platform.
Such a service depends on having good data about what is being transmitted where, with what identifiers – mobile networks and WiFi, and maybe even Bluetooth. Our data set is created, at the moment, by people running MozStumbler, which records the RF environment, linked to actual positions obtained from GPS. Download, install and use the app today :-)
Now, as Mozilla, our initial impulse as an open organization would be to release all the raw collected data to the public so people can build awesome things we haven’t even thought of yet. However, it turns out that this data comes with some interesting privacy challenges. And I don’t only mean the privacy of the person stumbling or the person retrieving their location, I mean the privacy of the owners of the transmitting stations (e.g. WiFi access points or mobile phones acting as hotspots).
For example, let’s say someone moves a long way and doesn’t want to be found in their new location (for example, because they are escaping some sort of violence). If they take their wireless access point with them (and many non-technically-minded people might not think that was at all dangerous) then as soon as a stumbler drove by, a public database of raw data would reveal their new location to those who should not know it. Raw data may also contain the location of mobile phone hotspots (and therefore their owners). Other scenarios can be found, for the interested, in the bug report and security forum discussion.
The only way we know of so far to solve this is to tie bits of data together, such that you can only get a location when you submit, as part of your request, the IDs of two or more transmitting sources which the database already knows are close to each other – which means that you must be at that location. This is what Google’s location service does. The disadvantage of this is that if you are in an area with very little RF transmission around – e.g. just one access point, or just a mobile phone signal – the service can’t help you. The team experimented with hashing schemes to try and encode this restriction into a published data set, but we were unable to come up with anything workable. It means that basically, the data needs to be hidden behind an API which enforces this – which means we can’t publish the raw data.
There are other groups who are producing entirely open data of this type; it would be interesting to hear their views on such privacy questions. It would be good if we could make our data available to people who were willing to respect privacy constraints and encode those restrictions in their servers, and hopefully one day we’ll be able to do that. But at the moment, we can’t see any way to make the data set completely public :-( Interesting mathematical suggestions welcome…