The research team, led by Nils Homer, then a graduate student at the University of California at Los Angeles, showed that in many cases, if you know a person’s genome, you can figure out beyond a reasonable doubt whether that person has participated in a particular genome-wide test group. After Homer’s paper appeared, the National Institutes of Health reversed a policy, instituted earlier that year, that had required aggregate data from all NIH-funded genome-wide association studies to be posted publicly.
Perhaps even more surprisingly, researchers showed in 2011 that it is possible to glean personal information about purchases from Amazon.com’s product recommendation system, which makes aggregate-level statements of the form, “Customers who bought this item also bought A, B and C.” By observing how the recommendations changed over time and cross-referencing them with customers’ public reviews of purchased items, the researchers were able in several cases to infer that a particular customer had bought a particular item on a particular day — even before the customer had posted a review of the item.
In all these cases, the privacy measures that had been taken seemed adequate, until they were breached. But even as the list of privacy failures ballooned, a different approach to data release was in the making, one that came with an a priori privacy guarantee. To achieve this goal, researchers had gone back to basics: Just what does it mean, they asked, to protect privacy?
Two-World Privacy
If researchers study a health database and discover a link between smoking and some form of cancer, differential privacy will not protect a public smoker from being labeled with elevated cancer risk. But if a person’s smoking is a secret hidden in the database, differential privacy will protect that secret.
“‘Differential’ refers to the difference between two worlds: one in which you allow your sensitive data to be included in the database and one in which you don’t,” McSherry said. The two worlds cannot be made to work out exactly the same, but they can be made close enough that they are effectively indistinguishable. That, he said, is the goal of differential privacy.
Differential privacy focuses on information-releasing algorithms, which take in questions about a database and spit out answers — not exact answers, but answers that have been randomly altered in a prescribed way. When the same question is asked of a pair of databases (A and B) that differ only with regard to a single individual (Person X), the algorithm should spit out essentially the same answers.
More precisely, given any answer that the algorithm could conceivably spit out, the probability of getting that answer should be almost exactly the same for both databases; that is, the ratio of these two probabilities should be bounded by some number R close to 1. The closer R is to 1, the more difficult it will be for an attacker to figure out whether he is getting information about database A or database B and the better protected Person X will be. After all, if the attacker can’t even figure out whether the information he is getting includes Person X’s data, he certainly can’t figure out what Person X’s data is.
(Differential privacy researchers usually prefer to speak in terms of the logarithm of R, which they denote ε. This parameter puts a number on how much privacy leaks out when the algorithm is carried out: The closer ε is to 0, the better the algorithm is at protecting privacy.)
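The two-database condition can be written compactly. The notation in this sketch is assumed rather than taken from the text: M is the randomized answering algorithm, A and B are databases that differ only in Person X's record, r is any answer M could produce, and the logarithm is taken to be the natural logarithm, which is the usual convention. Because A and B play interchangeable roles, the ratio is bounded on both sides.

```latex
\[
  e^{-\varepsilon} \;\le\; \frac{\Pr[M(A) = r]}{\Pr[M(B) = r]} \;\le\; e^{\varepsilon},
  \qquad R = e^{\varepsilon}, \quad \varepsilon = \ln R .
\]
```

For instance, a ratio bound of R = 1.1 corresponds to a privacy parameter of about 0.095.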
To get a sense of how differentially private algorithms can be constructed, let’s look at one of the simplest such algorithms. It focuses on a scenario in which a questioner is limited to “counting queries”; for example: “How many people in the database have property P?”
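The text at this point does not say which algorithm it has in mind. As a minimal sketch, assuming the Laplace mechanism (a widely used way to answer counting queries with differential privacy), the idea looks roughly like this. The names `laplace_noise`, `private_count`, `records`, and `has_property` are hypothetical and chosen only for illustration; `epsilon` stands for the privacy parameter, the logarithm of R.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential draws, each with
    mean `scale`, follows a Laplace(0, scale) distribution.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, has_property, epsilon, rng=None):
    """Answer "how many records have property P?" with random noise added.

    Assumption: the Laplace mechanism. Adding or removing one person
    changes a count by at most 1, so noise with scale 1/epsilon is used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if has_property(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy database: 100 people, 25 of whom smoke.
records = [{"smoker": i % 4 == 0} for i in range(100)]
noisy_answer = private_count(records, lambda r: r["smoker"], epsilon=0.1)
print(f"Noisy count of smokers: {noisy_answer:.1f}")
```

A smaller `epsilon` means more noise and stronger protection for any one person, while a larger `epsilon` gives answers closer to the true count. A real deployment would also need to track how many queries are answered, which this sketch leaves out.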



16 Comments
We all have a great deal to gain by allowing our medical data to be shared - perhaps we should also put a price on openness.
A real-life example of adding random errors to data is from the early GPS network days. The satellites mis-reported their position slightly, so that users on the ground ended up with slightly incorrect locations. Military users, however, got a code to decrypt the position error sent by the satellites, and were able to determine their location more exactly. This allowed the general public to use GPS, but in wartime prevented the enemy from having as good an accuracy as your own military.
The random errors were changed frequently to prevent opponents from decrypting the corrections in enough time to be of use.
This is all well and good. Most odd is the amount of attention paid to being private.
Has anyone asked just what is so sacred about medical conditions?
It is as though some overwhelming fear of exposure has taken over.
Privacy, overdone, can lead to suspicion, often a heavier burden than exposure.
Do any of us want to be exposed to vagaries of chance as a consequence of 'privacy'?
Do any recent events ring a bell?
Privacy must have limits with regard to how it affects the rest of us.
@kienhua68: I do agree with you. Apart from the fear of a higher insurance policy if propensity to disease is revealed (which anyway can be legislated out by requirements on the insurance companies) what exactly are people trying to keep private - this is a genuine question.
The whole concept of privacy will soon become obsolete. As more and more data sources come on-line, most of which with no regard for privacy concerns, the private will cease to be... People are already on the verge of live streaming their whole lives, and not only their lives but also the fractions of life which they share with every other person. With the current development of big data tools, you will be able to know everything about anybody, using only the indiscretions of everybody else around them. The problem with privacy is asymmetry, it's me knowing something about you, while you know nothing about me. When everybody knows everything about everybody else, privacy will not only have disappeared but will have become irrelevant.
I recently completed a Uni group project addressing these systems.
Medical records can include unflattering or worse images of yourself.
This data along with other data can be extremely revealing, potentially including biometrics.
I want the anonymous data to be fully accessible to doctors and researchers. I also see great value in other data relating to consumption (not the TB kind) being made available to all through independent government agencies, as businesses always run better with the lights on.
A trustable system is capable of bringing us not just "real democracy", but "real time democracy", impervious to undemocratic influences.
The more identifying information that can be kept from online networks, the better. Only now are businesses waking up to the lack of security within their online databases, as some small businesses have their data encrypted by hackers who then extort them for access. I have never trusted an online computer with my data. Keeping important stuff offline is a good insurance policy.
People have private phone numbers, emails, social networks and so on. These practices can save a lot of hassle. If some people want to let it all hang out, that's fine, but if you find you have an audience, I think you will appreciate some curtains.
If the existence of Stuxnet isn't enough to concern you regarding the security of online computers, check out the research going on which is opening up the possibility of spying on offline machines through detection of electric fields generated by keyboards, mice, displays and their cables. Of course, wireless is a pushover to those in the know.
Some see the problems we are trying to resolve as trivial. I see an unhackable identity system as being the most important piece of infrastructure in our future society. The other main problem we must confront is ensuring that forced access for law enforcement is always limited by an independent judiciary that adheres to an expanded charter of human rights (current one falls well short).
For those of you thinking that privacy is overblown or irrelevant, please post your name, address, date of birth and social security number. I need a new couch and see no reason not to stick you with the bill. I can also target scams to your particular health issues or risks.
@Radguy - Nice to see that some people have the ability to think things through. Thanks for pointing out the obvious to the foolish. I doubt that it will do any good but at the very least you tried.
As soon as conditions are bad enough you will see those Obamacare chips become mandatory, then any scanner tuned to the right frequency can pick up any and all your info. Enjoy your privacy till then. What man can invent, man can circumvent.
It's funny when people predicting the end of privacy -- or actually calling for it -- post their comments under a pseudonym. See, everyone needs their privacy at some level, even if it's just closing the toilet door.
Less trivially, real discrimination exists and probably always will, around gender, sexual orientation, ethnicity and religion. Knowing this, many people have a particular interest in keeping some of their medical history confidential. Who really needs to know that you had an STD, an abortion, a misdemeanor, or a depressive episode in your teens or twenties?
Frequently, people who see no need for privacy have done well in picking their parents. White middle class middle aged men may not have had the same life experiences as other sections of the community and tend to have a different perspective on privacy.
OK, a sensible comment; but STDs, abortion, misdemeanors and depressive episodes do not inform on those areas you have quoted as involving discrimination. Sure nobody "needs" to know these things (which do also happen to white middle-class people), but nobody is suggesting making such data freely available - we're just questioning whether hindering and slowing down medical research on important medical issues is worth it to efficiently hide such personal things?
Thanks Allan. Yes, there are interesting compromises inherent in ethical medical research.
In my grab-bag of points (intended to collectively show how complex this all is), I was responding to a few people's remarks, such as kienhua68: "Has anyone asked just what is so sacred about medical conditions?" and allophor: "When everybody knows everything about everybody else, privacy will not only have disappeared but will have become irrelevant."
That's frank nonsense, a naive fatalistic confusion of what is and what ought to be. International data protection (or "information privacy") laws anticipate the leaking and collecting of Personally Identifiable Information (PII) via many channels, and provide for sanctions against unauthorised use of PII *no matter where it comes from*. allophor says of Big Data "you will be able to know everything about anybody, using only the indiscretions of everybody else around them". What s/he may not appreciate is that under OECD privacy law in 80+ countries, PII that falls into your lap via "indiscretions" or indeed any third party channel is still protected. So it is just not the case that Big Data destroys privacy. Confidentiality and anonymity are not the same thing as privacy. Even if a second party manages to find out something about me, they are not free to do anything they like with that PII. See also http://lockstep.com.au/blog/2012/10/29/not-too-late-for-privacy.
Returning to medical research, I agree there's a balancing act. Human rights considerations are central to all ethical review processes.
People get emotional about privacy on both sides of the medical research debate, but here's a pragmatic consideration from the middle ground: patients are known to self censor their medical histories, especially when they don't trust the healthcare people and systems they're exposed to. Patients are naturally shy about their HIV status, their recreational drug use, their mental illness past or present. If we want reasonably complete data about people in medical research, then it is in everyone's interests to provide the very best privacy protection, so that participants have the requisite trust and confidence in what's going to happen to their data. In this light it is just abhorrent to hear people preach that privacy is over, and to cast aspersions on those who would choose to keep medical secrets.
Your last point surprises me - I wasn't aware that people have any ability to edit their medical history, certainly here in the UK. I agree incomplete data is little use to any researcher. Perhaps "reasonable" or "good enough" privacy protection could be achieved more quickly and at less cost than your "very best privacy protection".
Sorry, when I said 'self censor their medical histories', I was referring in the main to people holding details back when a medico takes an oral history. Patients simply don't tell their doctors everything, but they reveal more to doctors they trust. Which is why many countries have women's health clinics, and in Australia we have dedicated Aboriginal and Torres Strait Islander health services.
Actual editing of databases is not unknown. In Australia our proposed longitudinal EHR system now in development will include the ability for patients to elect which parts of their record are visible to certain medicos. This is a controversial feature to be sure. On balance the Dept of Health here was persuaded by patients rights groups over the protestations of doctors' groups.
[Sidebar: This reminds me of a special case which shows how difficult it is to make generalisations about health privacy. When a patient is admitted to a hospital where their ex-spouse works as a medico, there are usually special protocols (often informal) to keep the two parties apart. In small towns, this is a real and present problem. I have done privacy consulting in these sorts of places. It is very difficult to codify the rules, and therefore tricky to develop reliable EHR access control algorithms.
One case I saw first hand involved a female nurse who had previously had a clandestine affair with a male doctor in a country hospital. Not many people knew about it. She happened to be admitted sometime after the affair ended. A Nursing Unit Manager (NUM) who knew the patient, knew she felt herself to be at risk as an in-patient, so the NUM took it upon herself to hide the charts on at least one occasion to keep the doctor from finding out. No matter what position observers may take on the ethical minefield of that case, it's clear I think that programming EHR Access Control rules to cope with the nuances is probably an intractable problem.]
Allan, you say that "incomplete data is little use to any researcher" but I wonder if that's a bit extreme? It's a reality that patients' self reporting is always a little suspect; all medical research protocols need to be designed in light of the way patients filter what they say. Incomplete data is of enormous benefit if the studies are well designed to cope with the gaps.
I'm pretty sure that volunteer rates in medical research would plummet if we promised only to apply "good enough privacy protection".
Ok, whilst the validity of oral histories and patients' self-reporting is of concern to medical practitioners, the issue to researchers is the record of medical history in the database which should not be subject to such vagaries. It needs to be reasonably complete and accurate otherwise meaningful statistical processing becomes impossible. As to "good enough privacy protection", in the UK we have a Biobank of half a million volunteers for research purposes. I confess when I joined I simply assumed that privacy measures would be good enough - it is anonymised - and privacy does not feature at all prominently in the web discussions.
Allan wrote: "I simply assumed that [Biobank] privacy measures would be good enough - it is anonymised - and privacy does not feature at all prominently in the web discussions".
The work of Latanya Sweeney's that kicks off this Sci Am article shows that EHR designers' claims of anonymisation need to be reviewed, and why privacy really should be discussed some more.