A Hacked Database Prompts Debate about Genetic Privacy

Experts urge transparency and new regulations to protect DNA donors


















Image: Flickr/Steve Jurvetson

Linking a human genome in an anonymous sequencing database to its real-world counterpart wasn’t supposed to be possible.

Yaniv Erlich, a geneticist at the Massachusetts Institute of Technology’s Whitehead Institute for Biomedical Research, apparently never got the memo. In the end, all it took him and M.I.T. undergraduate student Melissa Gymrek to uncover the identities of 50 individuals whose DNA is available online in free-access databases was a computer and an Internet connection.

Erlich and Gymrek selected 32 male genomes from the 1000 Genomes Project, which has a publicly accessible database designed to help researchers find genes associated with different human diseases. Next, Erlich and Gymrek used an algorithm to extract genetic markers from the DNA sequences. The algorithm is specially designed to home in on short tandem repeats on a man’s Y chromosome, known as Y-STRs. Y-STRs are passed patrilineally with little to no change from one generation to the next. Because surnames are usually passed down the male line as well, Y-STRs provide a way to link an anonymous genome to a particular family surname.

Using metadata about the anonymous genomes included in the database, the researchers narrowed the field of possible DNA matches down to 10,000 men of a particular age who resided in Utah when they donated their DNA. Erlich and Gymrek then plugged the genomes’ Y-STR markers into two of the Web’s most popular genealogy sites, Ysearch and SMGF. These recreational sites provide free access to databases that connect Y-STR markers to surnames. The researchers found that eight of their samples strongly matched the surnames of Mormon families in Utah. Erlich and Gymrek’s findings were published in the January 17 Science.
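The surname lookup described above can be pictured with a small sketch. It is not the researchers’ algorithm or the matching procedure used by Ysearch or SMGF, neither of which is detailed here; it simply assumes that a Y-STR profile is a table of marker names and repeat counts, and that a match is scored by the fraction of shared markers whose counts agree. The function names, the `min_compared` threshold, the surnames and the repeat counts are all invented for illustration.

```python
from typing import Dict, List, Tuple

# A Y-STR profile maps a marker name to its repeat count (toy values below).
YStrProfile = Dict[str, int]


def shared_marker_matches(query: YStrProfile, record: YStrProfile) -> Tuple[int, int]:
    """Return (markers that agree, markers present in both profiles)."""
    compared = [marker for marker in query if marker in record]
    matches = sum(1 for marker in compared if query[marker] == record[marker])
    return matches, len(compared)


def rank_surnames(
    query: YStrProfile,
    database: List[Tuple[str, YStrProfile]],
    min_compared: int = 3,  # hypothetical threshold: ignore sparse records
) -> List[Tuple[str, float]]:
    """Rank surnames by the best fraction of matching markers per surname."""
    scores: Dict[str, float] = {}
    for surname, record in database:
        matches, compared = shared_marker_matches(query, record)
        if compared < min_compared:
            continue  # too few shared markers to say anything
        score = matches / compared
        scores[surname] = max(scores.get(surname, 0.0), score)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented example data: an "anonymous" profile and a toy genealogy database.
anonymous_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
toy_database = [
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Jones", {"DYS19": 15, "DYS390": 23, "DYS391": 11, "DYS393": 13}),
    ("Lee", {"DYS19": 14, "DYS390": 24}),  # skipped: only two shared markers
]

ranked = rank_surnames(anonymous_profile, toy_database)
for surname, score in ranked:
    print(f"{surname}: {score:.2f}")
```

In practice a candidate surname would be only a starting point: as the article notes, it was combined with metadata such as age and state of residence to narrow the field to specific people.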

The results show that a curious party equipped with open-access information not only can tie a three-billion-letter genome directly to an individual, but can also use bits and pieces of that same DNA to identify distant relatives, male or female, of the original genetic donor. “If your fourth cousin participated in this database, we could use it to find out about your ancestry,” Erlich says.

Whereas privacy concerns about publicly accessible genome data have cropped up in the past with genealogy databases, this is the first time that anyone has connected an anonymous DNA sequence to its donor without donor DNA as a reference.

Genome mining could have serious consequences for DNA donors. Under federal law, health insurance companies cannot use genetic data, but there is currently nothing barring companies from using a person’s genome to define life insurance policies or determine long-term disability care. The new research prompted the National Institutes of Health (NIH) to remove people’s ages from federally funded genetic databases, such as the 1000 Genomes Project, that allow open access to scientists.

Yet the NIH’s strategy may be missing the point, says Lawrence Gostin, a professor of medicine at Georgetown University and director of the World Health Organization’s Collaborating Center on Public Health Law and Human Rights. “This is not a long-term solution to the problem because in reality there is nothing more personally identifiable than your genome,” he says.

Although only talented geneticists would be able to hack a genome as Erlich did, the prospect becomes more likely as computing gets more sophisticated and more data become available.

Open-access genetic databases make big contributions to medical research, Erlich says. Only through studying diverse groups of individuals can scientists detect DNA variants that affect a person’s susceptibility to medical conditions like heart disease and diabetes. Erlich says identifying characteristics such as hair and eye color, facial features and age greatly contribute to how useful this data is. He says ensuring complete privacy means limiting the use of information that might be used to identify a subject.



2 Comments

  1. tonyphelps 08:21 AM 2/6/13

    It's challenging to truly get informed consent about the risks of providing genetic material and information to a study. Generally, it's the partners and service providers that are the source of many leaks - often three or four steps away from the original interaction.

  2. StephenWilson 02:27 PM 2/10/13

    Does this mean privacy is dead? Or even deader than we feared before?

    No! The thing that so many observers are missing is that international privacy law already provides a mechanism to control re-identification of anonymous data. These laws have been applied forcefully in Europe to shut down Facebook's facial recognition feature and make them destroy their templates.

    Said mechanism is the Collection Limitation Principle: a business or government must not collect Personally Identifiable Information (PII) it does not need. "Collection" is a technology-neutral concept. If a named data record comes to be in your possession, you have essentially collected it. Collection can be direct or indirect. So, putting names to erstwhile anonymous data -- be it photos or DNA -- is a clear case of indirect collection of PII.

    Re-identification of DNA is an act that has major implications under existing international privacy law. There is an argument in my mind that any re-identification by researchers should at the very least be subject to ethics committee approval. And any company that deliberately exploits DNA re-identification may face the force of the law as Facebook did.

    See http://lockstep.com.au/blog/2013/02/08/dna-privacy-letter-to-science.
