We need to talk about identifiers

Our PI, @mcphoo, raised the issue of tracking Bluetooth MAC addresses last week. The debate over whether these IDs – the hardware identifiers that are burnt into the networking hardware in our smartphones, laptops and other devices – are personal identifiers is ongoing. On one side are those who claim these are just hardware IDs: they identify devices, not people. On the other are those who claim that the link between a device and its owner is strong enough that by tracking the device you are actually tracking a person. I fall firmly into the latter camp. Interestingly, the Information Commissioner does not. Quelle surprise.

To properly explain my own position, it’s necessary to unpack what we mean by “identify”. Broadly, identification is about differentiating one thing from another thing. An identity is a collection of properties about something that can be identified. An identifier is a piece of information that sets one individual apart from others. An identifier could be completely unique like a passport number (at least the long one on the bottom), or a fairly uncommon piece of data like a name. Non-unique identifiers don’t identify globally, but in a particular context (or combined with other pieces of data) they are identifying.

This immediately gives us two classes of identifier – analogous to the URLs and URNs of the web – those that allow us to find an individual and those that simply allow us to recognise them. As an intuitive example, a home address allows us to find an individual, physically. A phone number or email address facilitates communication and so, in a sense, lets us find its owner. What about a photograph of someone’s face, or a copy of their fingerprint, though? Armed with these pieces of information we could recognise a person if they presented themselves to us, but we’d be hard pressed to go and find that person except in quite limited contexts.

In reality, no piece of information is inherently identifying; each depends to some extent on a broader context. Phone numbers identify because they’re built on a global telecoms infrastructure; photographs identify because we can compare a photograph to what we see when we look at someone; even the latitude and longitude of a person’s current location is only identifying in the context of an agreed standard for naming points on the surface of the earth. The extent to which something is identifying is, therefore, largely determined by the uniqueness of the data and by the availability of the directories, databases and other information sources that are needed to actually perform the identification.

With that in mind, a device MAC address is identifying in much the same way as a person’s fingerprint. Absent a database of fingerprints, a fingerprint only allows recognition, and that’s (currently) true of MAC addresses, too. Given your Bluetooth MAC address I can’t go and find you, or even email you, but if you walk into my home I can tell whether you’re the person the address ‘belongs’ to. This brings us to the second question: the extent to which a MAC address is related to a particular person. Does it ‘belong’ to them in any meaningful sense? Not by design, and not when the MAC address is created. Unlike a fingerprint, which a person is born with and dies with, a MAC address is created for a device. Until the device is purchased and starts routinely sitting in a pocket, that address relates only to the device. But once it does start sitting in a pocket, it typically sits there every time we leave the house. Our smartphones accompany us to work, to the supermarket, on the street and on holiday. About the only place you’d have a hard time finding out the MAC address associated with someone is in a swimming pool.

Recognition of Bluetooth devices, and hence of their owners, is trivial. It’s not a secure identifier in the same way as a fingerprint – it would be stupid to unlock a bank vault just because a particular MAC address was in range – but from a pragmatic point of view it is a viable and low-noise way to correlate an observation of a person in one location with a later observation of the same person in another location. What’s more, unlike fingerprint or face recognition, MAC address detection is both physically and computationally practical to do on a large scale, with high accuracy, and with little (if any) co-operation from the people you want to track.

The fact that MAC addresses are a good proxy for identifying humans is precisely why they’re useful for seeing how long those humans spend in a queue, or for detecting the routes shoppers take through a store.
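To make the queue example concrete, here is a minimal sketch, assuming a hypothetical log of sightings from two Bluetooth scanners placed at a queue’s entry and exit. The `Sighting` record, the sensor names `queue_entry`/`queue_exit` and the `queue_times` function are illustrative inventions, not taken from any real product. The only thing linking a device’s two sightings is its address, which is used purely for recognition.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Sighting:
    """One detection of a device by a scanner (hypothetical log format)."""
    timestamp: float  # seconds since a shared reference time
    sensor: str       # hypothetical sensor names, e.g. "queue_entry"
    mac: str          # the address the scanner observed


def queue_times(sightings, entry="queue_entry", exit_="queue_exit"):
    """Pair each device's first entry sighting with its first later exit sighting.

    Addresses are only ever compared for equality: nothing here looks up
    who owns a device, yet the same person is followed from A to B.
    """
    first_entry = {}  # mac -> time first seen at the entry scanner
    durations = {}    # mac -> seconds between entry and exit
    for s in sorted(sightings, key=lambda s: s.timestamp):
        if s.sensor == entry and s.mac not in first_entry:
            first_entry[s.mac] = s.timestamp
        elif s.sensor == exit_ and s.mac in first_entry and s.mac not in durations:
            durations[s.mac] = s.timestamp - first_entry[s.mac]
    return durations


log = [
    Sighting(0.0, "queue_entry", "aa:bb:cc:00:00:01"),
    Sighting(30.0, "queue_entry", "aa:bb:cc:00:00:02"),
    Sighting(240.0, "queue_exit", "aa:bb:cc:00:00:01"),
    Sighting(390.0, "queue_exit", "aa:bb:cc:00:00:02"),
    Sighting(400.0, "queue_exit", "aa:bb:cc:00:00:03"),  # never seen entering
]
times = queue_times(log)
print(times)                   # {'aa:bb:cc:00:00:01': 240.0, 'aa:bb:cc:00:00:02': 360.0}
print(median(times.values()))  # 300.0
```

Because the function needs nothing more than a consistent value per device, the same pairing logic extended to more scanners would trace a shopper’s route through a store.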

The real question in scenarios that measure human activity is not whether particular data points are personally identifiable – most of them are, given the right context – but whether the collection and processing is justified, whether it is fair, and whether the data subject has a chance to opt out. Empowered citizens deserve to understand when they’re being monitored, and to understand how to exercise their right to choose whether to take part. Privacy is not just about data; it’s about the purpose for which it’s being collected, the person who’s collecting it and the subject’s own unique concerns, context and circumstances. Denying us the choice of whether to be tracked in a queue, or around a store, or as we go about our lives, on the grounds that the data you’re collecting isn’t technically about a person, misses the bigger picture.