From Simons Science News (find original story here)
In 1997, when Massachusetts began making health records of state employees available to medical researchers, the government removed patients’ names, addresses, and Social Security numbers. William Weld, then the governor, assured the public that identifying individual patients in the records would be impossible.
Within days, an envelope from a graduate student at the Massachusetts Institute of Technology arrived at Weld’s office. It contained the governor’s health records.
Although the state had removed all obvious identifiers, it had left each patient’s date of birth, sex and ZIP code. By cross-referencing this information with voter-registration records, Latanya Sweeney was able to pinpoint Weld’s records.
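The cross-referencing step described above can be sketched in a few lines of Python. This is a minimal illustration and not Sweeney's actual procedure: the records, the field names and the `link_records` helper are all hypothetical, and the sketch assumes both datasets store date of birth, sex and ZIP code in identical formats.

```python
from collections import defaultdict

# Hypothetical, made-up records; none of these values come from the article.
medical_records = [
    {"dob": "1950-01-15", "sex": "M", "zip": "02101", "diagnosis": "condition A"},
    {"dob": "1962-03-02", "sex": "F", "zip": "02139", "diagnosis": "condition B"},
    {"dob": "1962-03-02", "sex": "F", "zip": "02140", "diagnosis": "condition C"},
]
voter_roll = [
    {"name": "Voter One", "dob": "1950-01-15", "sex": "M", "zip": "02101"},
    {"name": "Voter Two", "dob": "1962-03-02", "sex": "F", "zip": "02139"},
    {"name": "Voter Three", "dob": "1975-07-30", "sex": "F", "zip": "02139"},
]

# Fields left in the "anonymous" release that also appear in the public list.
QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def link_records(medical, voters, keys=QUASI_IDENTIFIERS):
    """Map voter names to medical records when the match is unambiguous.

    A match counts only if exactly one voter and exactly one medical record
    share the same combination of quasi-identifiers.
    """
    medical_by_key = defaultdict(list)
    for record in medical:
        medical_by_key[tuple(record[k] for k in keys)].append(record)

    voters_by_key = defaultdict(list)
    for voter in voters:
        voters_by_key[tuple(voter[k] for k in keys)].append(voter)

    matches = {}
    for key, same_key_voters in voters_by_key.items():
        same_key_records = medical_by_key.get(key, [])
        if len(same_key_voters) == 1 and len(same_key_records) == 1:
            matches[same_key_voters[0]["name"]] = same_key_records[0]
    return matches


reidentified = link_records(medical_records, voter_roll)
for name, record in reidentified.items():
    print(name, "->", record["diagnosis"])
```

No name ever appears in the medical file, yet any voter whose combination of the three fields is unique ends up linked to a diagnosis.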
Sweeney’s work, along with other notable privacy breaches over the past 15 years, has raised questions about the security of supposedly anonymous information.
“We’ve learned that human intuition about what is private is not especially good,” said Frank McSherry of Microsoft Research Silicon Valley in Mountain View, Calif. “Computers are getting more and more sophisticated at pulling individual data out of things that a naive person might think are harmless.”
As awareness of these privacy concerns has grown, many organizations have clamped down on their sensitive data, uncertain about what, if anything, they can release without jeopardizing the privacy of individuals. But this attention to privacy has come at a price, cutting researchers off from vast repositories of potentially invaluable data.
Medical records, like those released by Massachusetts, could help reveal which genes increase the risk of developing diseases like Alzheimer’s, how to reduce medical errors in hospitals or what treatments are most effective against breast cancer. Government-held information from Census Bureau surveys and tax returns could help economists devise policies that best promote income equality or economic growth. And data from social media websites like Facebook and Twitter could offer sociologists an unprecedented look at how ordinary people go about their lives.
The question is: How do we get at these data without revealing private information? A body of work a decade in the making is now starting to offer a genuine solution.
“Differential privacy,” as the approach is called, allows for the release of data while meeting a high standard for privacy protection. A differentially private data release algorithm allows researchers to ask practically any question about a database of sensitive information and provides answers that have been “blurred” so that they reveal virtually nothing about any individual’s data — not even whether the individual was in the database in the first place.
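One common way to produce such "blurred" answers, for the simple case of a counting question, is to add random noise drawn from a Laplace distribution. The sketch below assumes that technique; the article does not say which mechanism any particular release would use, and the function name `private_count`, the sample records and the privacy parameter `epsilon` are illustrative assumptions.

```python
import random


def private_count(records, predicate, epsilon, rng=random):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so Laplace
    noise with scale 1/epsilon is used. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    rate = epsilon  # rate of each exponential draw = 1 / scale = epsilon
    # The difference of two independent exponential draws is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Hypothetical toy database: ages only, no real data.
patients = [{"age": a} for a in (34, 71, 68, 45, 80, 59, 66)]
noisy = private_count(patients, lambda p: p["age"] >= 65, epsilon=0.5)
print(f"Noisy number of patients aged 65 or over: {noisy:.1f}")
```

Each call returns a different answer scattered around the true count, so a single reply says little about whether any one person is in the data. A real system would also have to limit how much privacy is spent across repeated questions and deal with floating-point subtleties, both of which this sketch ignores.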
“The idea is that if you allow your data to be used, you incur no additional risk,” said Cynthia Dwork of Microsoft Research Silicon Valley. Dwork introduced the concept of differential privacy in 2005, along with McSherry, Kobbi Nissim of Israel’s Ben-Gurion University and Adam Smith of Pennsylvania State University.
Differential privacy preserves “plausible deniability,” as Avrim Blum of Carnegie Mellon University likes to put it. “If I want to pretend that my private information is different from what it really is, I can,” he said. “The output of a differentially private mechanism is going to be almost exactly the same whether it includes the real me or the pretend me, so I can plausibly deny anything I want.”
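Blum's "almost exactly the same" has a precise form in the research literature. The statement below is the definition of differential privacy as it is usually written; the symbols M (the release mechanism), D and D' (two databases), S (a set of outputs) and ε (the privacy parameter) are notation introduced here and do not appear in the article.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
```

The inequality must hold for every pair of databases D and D' that differ in one individual's record and for every set S of possible outputs. When ε is small, e^ε is close to 1, so no output is much more likely with the "real me" in the database than with the "pretend me".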
16 Comments
We all have a great deal to gain by allowing our medical data to be shared - perhaps we should also put a price on openness.
A real-life example of adding random errors to data is from the early GPS network days. The satellites mis-reported their position slightly, so that users on the ground ended up with slightly incorrect locations. Military users, however, got a code to decrypt the position error sent by the satellites, and were able to determine their location more exactly. This allowed the general public to use GPS, but in wartime prevented the enemy from having as good an accuracy as your own military.
The random errors were changed frequently to prevent opponents from decrypting the corrections in enough time to be of use.
This is all well and good. Most odd is the amount of attention paid to being private.
Has anyone asked just what is so sacred about medical conditions?
It is as though some overwhelming fear of exposure has taken over.
Privacy, overdone, can lead to suspicion, often a heavier burden than exposure.
Do any of us want to be exposed to vagaries of chance as a consequence of 'privacy'?
Do any recent events ring a bell?
Privacy must have limits with regard to how it affects the rest of us.
@kienhua68: I do agree with you. Apart from the fear of a higher insurance policy if propensity to disease is revealed (which anyway can be legislated out by requirements on the insurance companies) what exactly are people trying to keep private - this is a genuine question.
The whole concept of privacy will soon become obsolete. As more and more data sources come on-line, most of them with no regard for privacy concerns, the private will cease to be... People are already on the verge of live streaming their whole lives, and not only their lives but also the fractions of life which they share with every other person. With the current development of big data tools, you will be able to know everything about anybody, using only the indiscretions of everybody else around them. The problem with privacy is asymmetry: it's me knowing something about you, while you know nothing about me. When everybody knows everything about everybody else, privacy will not only have disappeared but will have become irrelevant.
I recently completed a Uni group project addressing these systems.
Medical records can include unflattering or worse images of yourself.
This data along with other data can be extremely revealing, potentially including biometrics.
I want the anonymous data to be fully accessible to doctors and researchers. I also see great value in other data relating to consumption (not the TB kind) being made available to all through independent government agencies, as businesses always run better with the lights on.
A trustable system is capable of bringing us not just "real democracy", but "real time democracy", impervious to undemocratic influences.
The more identifying information that can be kept from online networks, the better. Only now are businesses waking up to the lack of security within their online databases, as some small businesses have their data encrypted by hackers who then extort them for access. I have never trusted an online computer with my data. Keeping important stuff offline is a good insurance policy.
People have private phone numbers, emails, social networks and so on. These practices can save a lot of hassle. If some people want to let it all hang out, that's fine, but if you find you have an audience, I think you will appreciate some curtains.
If the existence of Stuxnet isn't enough to concern you regarding the security of online computers, check out the research going on which is opening up the possibility of spying on offline machines through detection of electric fields generated by keyboards, mice, displays and their cables. Of course, wireless is a pushover to those in the know.
Some see the problems we are trying to resolve as trivial. I see an unhackable identity system as being the most important piece of infrastructure in our future society. The other main problem we must confront is ensuring that forced access for law enforcement is always limited by an independent judiciary that adheres to an expanded charter of human rights (current one falls well short).
For those of you thinking that privacy is overblown or irrelevant, please post your name, address, date of birth and social security number. I need a new couch and see no reason not to stick you with the bill. I can also target scams to your particular health issues or risks.
@Radguy - Nice to see that some people have the ability to think things through. Thanks for pointing out the obvious to the foolish. I doubt that it will do any good but at the very least you tried.
As soon as conditions are bad enough you will see those Obamacare chips become mandatory, then any scanner tuned to the right frequency can pick up any and all your info. Enjoy your privacy till then. What man can invent, man can circumvent.
It's funny when people predicting the end of privacy -- or actually calling for it -- post their comments under a pseudonym. See, everyone needs their privacy at some level, even if it's just closing the toilet door.
Less trivially, real discrimination exists and probably always will, around gender, sexual orientation, ethnicity and religion. Knowing this, many people have a particular interest in keeping some of their medical history confidential. Who really needs to know that you had an STD, an abortion, a misdemeanor, or a depressive episode in your teens or twenties?
Frequently, people who see no need for privacy have done well in picking their parents. White middle class middle aged men may not have had the same life experiences as other sections of the community and tend to have a different perspective on privacy.
OK, a sensible comment; but STDs, abortion, misdemeanors and depressive episodes do not inform on those areas you have quoted as involving discrimination. Sure nobody "needs" to know these things (which do also happen to white middle-class people), but nobody is suggesting making such data freely available - we're just questioning whether hindering and slowing down medical research on important medical issues is worth it to efficiently hide such personal things?
Thanks Allan. Yes, there are interesting compromises inherent in ethical medical research.
In my grab-bag of points (intended to collectively show how complex this all is), I was responding to a few people's remarks, such as kienhua68: "Has anyone asked just what is so sacred about medical conditions?" and allophor: "When everybody knows everything about everybody else, privacy will not only have disappeared but will have become irrelevant."
That's frank nonsense, a naive fatalistic confusion of what is and what ought to be. International data protection (or "information privacy") laws anticipate the leaking and collecting of Personally Identifiable Information (PII) via many channels, and provide for sanctions against unauthorised use of PII *no matter where it comes from*. allophor says of Big Data "you will be able to know everything about anybody, using only the indiscretions of everybody else around them". What s/he may not appreciate is that under OECD privacy law in 80+ countries, PII that falls into your lap via "indiscretions" or indeed any third party channel is still protected. So it is just not the case that Big Data destroys privacy. Confidentiality and anonymity are not the same thing as privacy. Even if a second party manages to find out something about me, they are not free to do anything they like with that PII. See also http://lockstep.com.au/blog/2012/10/29/not-too-late-for-privacy.
Returning to medical research, I agree there's a balancing act. Human rights considerations are central to all ethical review processes.
People get emotional about privacy on both sides of the medical research debate, but here's a pragmatic consideration from the middle ground: patients are known to self censor their medical histories, especially when they don't trust the healthcare people and systems they're exposed to. Patients are naturally shy about their HIV status, their recreational drug use, their mental illness past or present. If we want reasonably complete data about people in medical research, then it is in everyone's interests to provide the very best privacy protection, so that participants have the requisite trust and confidence in what's going to happen to their data. In this light it is just abhorrent to hear people preach that privacy is over, and to cast aspersions on those who would choose to keep medical secrets.
Your last point surprises me - I wasn't aware that people have any ability to edit their medical history, certainly here in the UK. I agree incomplete data is little use to any researcher. Perhaps "reasonable" or "good enough" privacy protection could be achieved more quickly and at less cost than your "very best privacy protection".
Sorry, when I said 'self censor their medical histories', I was referring in the main to people holding details back when a medico takes an oral history. Patients simply don't tell their doctors everything, but they reveal more to doctors they trust. Which is why many countries have women's health clinics, and in Australia we have dedicated Aboriginal and Torres Strait Islander health services.
Actual editing of databases is not unknown. In Australia our proposed longitudinal EHR system now in development will include the ability for patients to elect which parts of their record are visible to certain medicos. This is a controversial feature to be sure. On balance the Dept of Health here was persuaded by patients' rights groups over the protestations of doctors' groups.
[Sidebar: This reminds me of a special case which shows how difficult it is to make generalisations about health privacy. When a patient is admitted to a hospital where their ex-spouse works as a medico, there are usually special protocols (often informal) to keep the two parties apart. In small towns, this is a real and present problem. I have done privacy consulting in these sorts of places. It is very difficult to codify the rules, and therefore tricky to develop reliable EHR access control algorithms.
One case I saw first hand involved a female nurse who had previously had a clandestine affair with a male doctor in a country hospital. Not many people knew about it. She happened to be admitted sometime after the affair ended. A Nursing Unit Manager (NUM) who knew the patient, knew she felt herself to be at risk as an in-patient, so the NUM took it upon herself to hide the charts on at least one occasion to keep the doctor from finding out. No matter what position observers may take on the ethical minefield of that case, it's clear I think that programming EHR Access Control rules to cope with the nuances is probably an intractable problem.]
Allan, you say that "incomplete data is little use to any researcher" but I wonder if that's a bit extreme? It's a reality that patients' self reporting is always a little suspect; all medical research protocols need to be designed in light of the way patients filter what they say. Incomplete data is of enormous benefit if the studies are well designed to cope with the gaps.
I'm pretty sure that volunteer rates in medical research would plummet if we promised only to apply "good enough privacy protection".
Ok, whilst the validity of oral histories and patients' self-reporting is of concern to medical practitioners, the issue to researchers is the record of medical history in the database which should not be subject to such vagaries. It needs to be reasonably complete and accurate otherwise meaningful statistical processing becomes impossible. As to "good enough privacy protection", in the UK we have a Biobank of half a million volunteers for research purposes. I confess when I joined I simply assumed that privacy measures would be good enough - it is anonymised - and privacy does not feature at all prominently in the web discussions.
Allan wrote: "I simply assumed that [Biobank] privacy measures would be good enough - it is anonymised - and privacy does not feature at all prominently in the web discussions".
Latanya Sweeney's work, which kicks off this Sci Am article, shows why EHR designers' claims of anonymisation need to be reviewed and why privacy really should be discussed some more.