Privacy by the Numbers: A New Approach to Safeguarding Data

A mathematical technique called “differential privacy” gives researchers access to vast repositories of personal data while meeting a high standard for privacy protection

In other words, the naive approach of adding Laplace noise to each question independently is limited in terms of the number of questions to which it can provide useful answers. To deal with this, computer scientists have developed an arsenal of more powerful primitives — algorithmic building blocks which, by taking into account the particular structure of a database and problem type, can answer more questions with more accuracy than the naive approach can.
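To make the limitation concrete, here is a minimal Python sketch of the naive approach. It assumes each question is a counting query and that the total budget Ɛ is split evenly across k questions. Because one person's data could affect every one of the k answers, each answer gets only Ɛ/k of the budget and so needs Laplace noise of scale k/Ɛ. The function names are illustrative, not from any library. Laplace samples are drawn as the difference of two exponential samples, which avoids a NumPy dependency.

```python
import random
import statistics

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def naive_answers(true_counts, total_epsilon, rng):
    """Answer k counting queries independently, splitting the budget evenly.

    Each answer gets total_epsilon / k, so its Laplace noise needs
    scale k / total_epsilon: more questions means noisier answers.
    """
    k = len(true_counts)
    scale = k / total_epsilon
    noisy = [c + laplace(scale, rng) for c in true_counts]
    return noisy, scale

rng = random.Random(0)
for k in (1, 10, 100, 1000):
    # Every query here has true answer 500; only the noise differs.
    answers, scale = naive_answers([500] * k, total_epsilon=1.0, rng=rng)
    mean_error = statistics.mean(abs(a - 500) for a in answers)
    print(f"k={k:4d}  noise scale={scale:6.0f}  mean |error|={mean_error:8.1f}")
```

The printed mean error grows roughly in proportion to k, since the average absolute value of Laplace noise with scale b is b.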

For example, in 2005, Smith noticed that the baby names problem has a special structure: removing one person’s personal information from the database changes the answer for only one of the 10,000 names in the database. Because of this attribute, we can get away with adding Laplace noise of only 1/Ɛ to each name’s answer, instead of 10,000/Ɛ, and the outcome will stay within our Ɛ privacy budget. This algorithm is a primitive that can be applied to any “histogram” query, that is, one asking how many people fall into each of several mutually exclusive categories, such as first names.
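Smith's observation can be sketched in a few lines of Python. This is an illustrative implementation, not Smith's code: it assumes records arrive as one category label per person and that the list of categories is fixed in advance rather than read off the data (otherwise the mere presence of a category would leak information). The helper names and the example data are hypothetical.

```python
import random
from collections import Counter

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_histogram(records, categories, epsilon, rng):
    """Noisy count for every category in a fixed, public list.

    Adding or removing one person changes exactly one bin by 1, so the
    whole histogram has sensitivity 1 and each bin needs Laplace noise of
    scale only 1/epsilon, no matter how many categories there are.
    """
    counts = Counter(records)
    # Noise goes on every listed category, including ones with a zero count.
    return {c: counts[c] + laplace(1.0 / epsilon, rng) for c in categories}

# Made-up data: one first name per person.
names = ["Emma"] * 120 + ["Liam"] * 95 + ["Olivia"] * 3
noisy = private_histogram(names, ["Emma", "Liam", "Olivia", "Noah"],
                          epsilon=0.5, rng=random.Random(42))
for name, value in noisy.items():
    print(name, round(value, 1))
```

Compared with the naive approach, the noise per bin no longer depends on the number of categories, which is exactly the saving the paragraph describes.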

When Smith told Dwork about this insight in the early days of differential privacy research, “something inside me went, ‘Wow!’” Dwork said. “I realized that we could exploit the structure of a query or computation to get much greater accuracy than I had realized.”

Since that time, computer scientists have developed a large library of such primitives. And because the additive rule explains what happens to the privacy parameter when algorithms are combined, computer scientists can assemble these building blocks into complex structures while keeping tabs on just how much privacy the resulting algorithms use up.
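As an example of assembling building blocks, the sketch below (hypothetical names, not from a specific library) builds a noisy average out of two primitives, a noisy sum and a noisy count, splitting a budget Ɛ between them. It assumes every value can be clipped to a known range [0, upper], which bounds how much any one person can move the sum. Dividing the two noisy results afterward uses no additional privacy, so by the additive rule the whole computation costs Ɛ/2 + Ɛ/2 = Ɛ.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(values, epsilon, rng):
    """Primitive: one person changes a count by at most 1."""
    return len(values) + laplace(1.0 / epsilon, rng)

def noisy_sum(values, upper, epsilon, rng):
    """Primitive: after clipping to [0, upper], one person moves the sum by at most upper."""
    clipped = [min(max(v, 0.0), upper) for v in values]
    return sum(clipped) + laplace(upper / epsilon, rng)

def noisy_mean(values, upper, epsilon, rng):
    """Compose the two primitives; returns (estimate, epsilon spent)."""
    eps_sum = eps_count = epsilon / 2
    total = noisy_sum(values, upper, eps_sum, rng)
    count = noisy_count(values, eps_count, rng)
    # Dividing already-private outputs is post-processing: no extra privacy cost.
    # max(..., 1.0) guards against a tiny or negative noisy count.
    estimate = total / max(count, 1.0)
    # Additive rule: the costs of the two primitives simply add up.
    return estimate, eps_sum + eps_count

ages = [23, 35, 41, 29, 67, 52, 38, 45] * 500  # made-up data
estimate, spent = noisy_mean(ages, upper=100.0, epsilon=1.0, rng=random.Random(7))
print(f"noisy mean age = {estimate:.2f}, privacy spent = {spent}")
```

The even split is only one choice; a designer could give more of the budget to whichever primitive contributes more error, and the additive rule still tells you the total.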

“One of the achievements in this area has been to come up with algorithms that can handle a very large number of queries with a relatively small amount of noise,” said Moritz Hardt of IBM Research Almaden in San Jose, Calif.

To make differential privacy more accessible to nonexperts, several groups are working to create a differential privacy programming language that would abstract away all the underlying mathematics of the algorithmic primitives to a layer that the user doesn’t have to think about.

“If you’re the curator of a dataset, you don’t have to worry about what people are doing with your dataset as long as they are running queries written in this language,” said McSherry, who has created one such preliminary language, called PINQ. “The program serves as a proof that the query is OK.”
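The idea behind such a language can be illustrated with a small wrapper class. PINQ itself is built on C#'s LINQ query framework; the Python class below is a hypothetical analogue, not PINQ's actual interface. It assumes the curator fixes a total budget up front. Analysts never touch the raw records, only noisy aggregates, and each call is charged against the budget, so any program written against the wrapper stays within its Ɛ by construction.

```python
import random

class BudgetExhausted(Exception):
    """Raised when a query would exceed the dataset's privacy budget."""

class PrivateDataset:
    """Hypothetical PINQ-style wrapper (not PINQ's real API).

    Analysts get no access to raw records, only to noisy aggregates,
    and every aggregate is charged against a fixed epsilon budget.
    """

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    @property
    def remaining(self):
        return self._remaining

    def _charge(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding cannot block an exact fit.
        if epsilon > self._remaining + 1e-12:
            raise BudgetExhausted(f"asked for {epsilon}, {self._remaining} left")
        self._remaining -= epsilon

    def _laplace(self, scale):
        return self._rng.expovariate(1.0 / scale) - self._rng.expovariate(1.0 / scale)

    def noisy_count(self, predicate, epsilon):
        """Count records matching `predicate`; a count has sensitivity 1."""
        self._charge(epsilon)
        true_count = sum(1 for r in self._records if predicate(r))
        return true_count + self._laplace(1.0 / epsilon)

# Made-up records; the analyst only ever sees noisy numbers.
patients = [{"age": a, "smoker": a % 3 == 0} for a in range(20, 80)]
db = PrivateDataset(patients, total_epsilon=1.0, seed=1)
print(db.noisy_count(lambda r: r["smoker"], epsilon=0.5))
print(db.noisy_count(lambda r: r["age"] > 60, epsilon=0.5))
print("remaining budget:", db.remaining)
try:
    db.noisy_count(lambda r: True, epsilon=0.1)
except BudgetExhausted as err:
    print("refused:", err)
```

A real system must also stop analyst-supplied code such as `predicate` from leaking data through side effects, which this sketch does not attempt.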

A Nonrenewable Resource
Because the simple additive Ɛ rule puts a precise upper limit on the total privacy you lose when the various databases you belong to release information in a differentially private way, it turns privacy into a “fungible currency,” McSherry said.

For example, if you were to decide how much total lifetime privacy loss would be acceptable to you, you could then decide how you want to “spend” it — whether in exchange for money, perhaps, or to support a research project you admire. Each time you allowed your data to be used in a differentially private data release, you would know exactly how much of your privacy budget remained.

Likewise, the curator of a dataset of sensitive information could decide how to spend whatever amount of privacy she had decided to release — perhaps by inviting proposals for research projects that would describe not only what questions the researchers wanted to ask and why, but also how much privacy the project would use up. The curator could then decide which projects would make the most worthwhile use of the dataset’s predetermined privacy budget. Once this budget had been used up, the dataset could be closed to further study.
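The curator's decision can be framed as a small selection problem: choose the set of proposals whose Ɛ costs, added together, fit the dataset's budget while delivering the most value. The sketch below assumes each proposal comes with a requested Ɛ and a curator-assigned worthiness score; the function name, scores, and proposal names are all invented for illustration. Exhaustive search is adequate for a handful of proposals; with many, this becomes a knapsack problem and would call for a smarter method.

```python
from itertools import combinations

def choose_projects(proposals, budget):
    """Return the names and total score of the best affordable subset.

    `proposals` holds (name, epsilon_cost, score) tuples. Costs add up
    under the additive rule, so a subset is affordable when its summed
    epsilon does not exceed `budget`.
    """
    best_names, best_score = [], 0
    for size in range(1, len(proposals) + 1):
        for subset in combinations(proposals, size):
            cost = sum(eps for _, eps, _ in subset)
            score = sum(s for _, _, s in subset)
            # Small tolerance so floating-point rounding cannot reject an exact fit.
            if cost <= budget + 1e-9 and score > best_score:
                best_names, best_score = [name for name, _, _ in subset], score
    return best_names, best_score

# Hypothetical proposals: (name, requested epsilon, curator's score).
proposals = [
    ("flu trends", 0.25, 6),
    ("rare disease cohort", 0.75, 9),
    ("hospital wait times", 0.25, 4),
    ("drug interactions", 0.5, 7),
]
chosen, score = choose_projects(proposals, budget=1.0)
print(chosen, score)
# prints: ['flu trends', 'hospital wait times', 'drug interactions'] 17
```

Once the chosen projects have run, the budget is spent and, as the paragraph suggests, the dataset would be closed to further study.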



16 Comments

  1. AllanRBrewer 01:50 PM 12/31/12

    We all have a great deal to gain by allowing our medical data to be shared - perhaps we should also put a price on openness.

  2. DaniEder 02:47 PM 12/31/12

    A real-life example of adding random errors to data is from the early GPS network days. The satellites mis-reported their position slightly, so that users on the ground ended up with slightly incorrect locations. Military users, however, got a code to decrypt the position error sent by the satellites, and were able to determine their location more exactly. This allowed the general public to use GPS, but in wartime prevented the enemy from having as good an accuracy as your own military.

    The random errors were changed frequently to prevent opponents from decrypting the corrections in enough time to be of use.

  3. kienhua68 03:45 PM 1/1/13

    This is all well and good. Most odd is the amount of attention paid to being private.
    Has anyone asked just what is so sacred about medical conditions?
    It is as though some overwhelming fear of exposure has taken over.
    Privacy, overdone, can lead to suspicion, often a heavier burden than exposure.
    Do any of us want to be exposed to vagaries of chance as a consequence of 'privacy'?
    Do any recent events ring a bell?
    Privacy must have limits with regard to how it affects the rest of us.

  4. AllanRBrewer in reply to kienhua68 04:02 PM 1/1/13

    @kienhua68: I do agree with you. Apart from the fear of a higher insurance policy if propensity to disease is revealed (which anyway can be legislated out by requirements on the insurance companies), what exactly are people trying to keep private - this is a genuine question.

  5. allaphor 06:51 AM 1/2/13

    The whole concept of privacy will soon become obsolete. As more and more data sources come on-line, most of which with no regard for privacy concerns, the private will cease to be... People are already on the verge of live streaming their whole lives, and not only their lives but also the fractions of life which they share with every other person. With the current development of big data tools, you will be able to know everything about anybody, using only the indiscretions of everybody else around them. The problem with privacy is asymmetry, it's me knowing something about you, while you know nothing about me. When everybody knows everything about everybody else, privacy will not only have disappeared but will have become irrelevant.

  6. Radguy 11:07 AM 1/3/13

    I recently completed a Uni group project addressing these systems.

    Medical records can include unflattering or worse images of yourself.

    This data along with other data can be extremely revealing, potentially including biometrics.

    I want the anonymous data to be fully accessible to doctors and researchers. I also see great value in other data relating to consumption (not the TB kind) being made available to all through independent government agencies, as businesses always run better with the lights on.

    A trustable system is capable of bringing us not just "real democracy", but "real time democracy", impervious to undemocratic influences.

    The more identifying information that can be kept from online networks, the better. Only now are businesses waking up to the lack of security within their online databases, as some small businesses have their data encrypted by hackers who then extort them for access. I have never trusted an online computer with my data. Keeping important stuff offline is a good insurance policy.

    People have private phone numbers, emails, social networks and so on. These practices can save a lot of hassle. If some people want to let it all hang out, that's fine, but if you find you have an audience, I think you will appreciate some curtains.

    If the existence of Stuxnet isn't enough to concern you regarding the security of online computers, check out the research going on which is opening up the possibility of spying on offline machines through detection of electric fields generated by keyboards, mice, displays and their cables. Of course, wireless is a pushover to those in the know.

    Some see the problems we are trying to resolve as trivial. I see an unhackable identity system as being the most important piece of infrastructure in our future society. The other main problem we must confront is ensuring that forced access for law enforcement is always limited by an independent judiciary that adheres to an expanded charter of human rights (current one falls well short).

  7. bucketofsquid 11:45 AM 1/3/13

    For those of you thinking that privacy is overblown or irrelevant, please post your name, address, date of birth and social security number. I need a new couch and see no reason not to stick you with the bill. I can also target scams to your particular health issues or risks.

    @Radguy - Nice to see that some people have the ability to think things through. Thanks for pointing out the obvious to the foolish. I doubt that it will do any good but at the very least you tried.

  8. justyntoo 04:22 PM 1/5/13

    as soon as conditions are bad enough you will see those obamacare chips become mandatory, then any scanner tuned to the right frequency can pick up any and all your info. enjoy your privacy till then. what man can invent, man can circumvent.

  9. StephenWilson 03:01 PM 1/6/13

    It's funny when people predicting the end of privacy -- or actually calling for it -- post their comments under a pseudonym. See, everyone needs their privacy at some level, even if it's just closing the toilet door.
    Less trivially, real discrimination exists and probably always will, around gender, sexual orientation, ethnicity and religion. Knowing this, many people have a particular interest in keeping some of their medical history confidential. Who really needs to know that you had an STD, an abortion, a misdemeanor, or a depressive episode in your teens or twenties?
    Frequently, people who see no need for privacy have done well in picking their parents. White middle class middle aged men may not have had the same life experiences as other sections of the community and tend to have a different perspective on privacy.

  10. AllanRBrewer in reply to StephenWilson 04:20 PM 1/6/13

    OK, a sensible comment; but STDs, abortion, misdemeanors and depressive episodes do not inform on those areas you have quoted as involving discrimination. Sure nobody "needs" to know these things (which do also happen to white middle-class people), but nobody is suggesting making such data freely available - we're just questioning whether hindering and slowing down medical research on important medical issues is worth it to efficiently hide such personal things?

  11. StephenWilson 05:07 PM 1/6/13

    Thanks Allan. Yes, there are interesting compromises inherent in ethical medical research.
    In my grab-bag of points (intended to collectively show how complex this all is), I was responding to a few people's remarks, such as kienhua68: "Has anyone asked just what is so sacred about medical conditions?" and allaphor: "When everybody knows everything about everybody else, privacy will not only have disappeared but will have become irrelevant."
    That's frank nonsense, a naive fatalistic confusion of what is and what ought to be. International data protection (or "information privacy") laws anticipate the leaking and collecting of Personally Identifiable Information (PII) via many channels, and provide for sanctions against unauthorised use of PII *no matter where it comes from*. allaphor says of Big Data "you will be able to know everything about anybody, using only the indiscretions of everybody else around them". What s/he may not appreciate is that under OECD privacy law in 80+ countries, PII that falls into your lap via "indiscretions" or indeed any third party channel is still protected. So it is just not the case that Big Data destroys privacy. Confidentiality and anonymity are not the same thing as privacy. Even if a second party manages to find out something about me, they are not free to do anything they like with that PII. See also http://lockstep.com.au/blog/2012/10/29/not-too-late-for-privacy.
    Returning to medical research, I agree there's a balancing act. Human rights considerations are central to all ethical review processes.
    People get emotional about privacy on both sides of the medical research debate, but here's a pragmatic consideration from the middle ground: patients are known to self censor their medical histories, especially when they don't trust the healthcare people and systems they're exposed to. Patients are naturally shy about their HIV status, their recreational drug use, their mental illness past or present. If we want reasonably complete data about people in medical research, then it is in everyone's interests to provide the very best privacy protection, so that participants have the requisite trust and confidence in what's going to happen to their data. In this light it is just abhorrent to hear people preach that privacy is over, and to cast aspersions on those who would choose to keep medical secrets.

  12. AllanRBrewer 05:42 PM 1/6/13

    Your last point surprises me - I wasn't aware that people have any ability to edit their medical history, certainly here in the UK. I agree incomplete data is little use to any researcher. Perhaps "reasonable" or "good enough" privacy protection could be achieved more quickly and at less cost than your "very best privacy protection".

  13. StephenWilson in reply to AllanRBrewer 10:51 PM 1/6/13

    Sorry, when I said 'self censor their medical histories', I was referring in the main to people holding details back when a medico takes an oral history. Patients simply don't tell their doctors everything, but they reveal more to doctors they trust. Which is why many countries have women's health clinics, and in Australia we have dedicated Aboriginal and Torres Strait Islander health services.
    Actual editing of databases is not unknown. In Australia our proposed longitudinal EHR system now in development will include the ability for patients to elect which parts of their record are visible to certain medicos. This is a controversial feature to be sure. On balance the Dept of Health here was persuaded by patients rights groups over the protestations of doctors' groups.
    [Sidebar: This reminds me of a special case which shows how difficult it is to make generalisations about health privacy. When a patient is admitted to a hospital where their ex-spouse works as a medico, there are usually special protocols (often informal) to keep the two parties apart. In small towns, this is a real and present problem. I have done privacy consulting in these sorts of places. It is very difficult to codify the rules, and therefore tricky to develop reliable EHR access control algorithms.
    One case I saw first hand involved a female nurse who had previously had a clandestine affair with a male doctor in a country hospital. Not many people knew about it. She happened to be admitted sometime after the affair ended. A Nursing Unit Manager (NUM) who knew the patient, knew she felt herself to be at risk as an in-patient, so the NUM took it upon herself to hide the charts on at least one occasion to keep the doctor from finding out. No matter what position observers may take on the ethical minefield of that case, it's clear I think that programming EHR Access Control rules to cope with the nuances is probably an intractable problem.]
    Allan, you say that "incomplete data is little use to any researcher" but I wonder if that's a bit extreme? It's a reality that patients' self reporting is always a little suspect; all medical research protocols need to be designed in light of the way patients filter what they say. Incomplete data is of enormous benefit if the studies are well designed to cope with the gaps.

  14. StephenWilson in reply to AllanRBrewer 11:11 PM 1/6/13

    I'm pretty sure that volunteer rates in medical research would plummet if we promised only to apply "good enough privacy protection".

  15. AllanRBrewer in reply to StephenWilson 04:47 PM 1/7/13

    Ok, whilst the validity of oral histories and patients' self-reporting is of concern to medical practitioners, the issue to researchers is the record of medical history in the database which should not be subject to such vagaries. It needs to be reasonably complete and accurate otherwise meaningful statistical processing becomes impossible. As to "good enough privacy protection", in the UK we have a Biobank of half a million volunteers for research purposes. I confess when I joined I simply assumed that privacy measures would be good enough - it is anonymised - and privacy does not feature at all prominently in the web discussions.

  16. StephenWilson in reply to AllanRBrewer 01:38 AM 1/8/13

    Allan wrote: "I simply assumed that [Biobank] privacy measures would be good enough - it is anonymised - and privacy does not feature at all prominently in the web discussions".
    Latanya Sweeney's work, which kicks off this Sci Am article, shows that EHR designers' claims of anonymisation need to be reviewed, and that privacy really should be discussed some more.
