Cryptographers and Geneticists Unite to Analyze Genomes They Can’t See

Computer-security methods could help scientists identify disease-causing genes—while preserving patient privacy

A cryptographer and a geneticist walk into a seminar room. An hour later, after a talk by the cryptographer, the geneticist approaches him with a napkin covered in scrawls. The cryptographer furrows his brow, then nods. Nearly two years later, they reveal the product of their combined prowess: an algorithm that finds harmful mutations without actually seeing anyone’s genes.

The goal of the scientists, Stanford University cryptographer Dan Boneh and geneticist Gill Bejerano, along with their students, is to protect the privacy of patients who have shared their genetic data. Rapid and affordable genome sequencing has launched a revolution in personalized medicine, allowing doctors to zero in on the causes of a disease and propose tailor-made solutions. The challenge is that such analyses typically rely on comparing the genes of many different patients, including patients from unrelated institutions and studies. The simplest means to do this is for the caregiver or scientist to obtain patient consent, then post every letter of every gene in an anonymized database. The data is usually protected by licensing agreements and restricted registration, but ultimately the only thing keeping it from being shared, de-anonymized or misused is the good behavior of users. Ideally, it should be not just illegal but impossible for a researcher (say, one who is hacked or who joins an insurance company) to leak the data.

When patients share their genomes, researchers managing the databases face a tough choice. If the whole genome is made available to the community, the patient risks future discrimination. For example, Stephen Kingsmore, CEO of Rady Children's Institute for Genomic Medicine, encounters many parents in the military who refuse to compare their genomes with those of their sick children, fearing they will be discharged if the military learns of harmful mutations. On the other hand, if the scientists share only summaries or limited segments of the genome, other researchers may struggle to discover critical patterns in a disease’s genetics or to pinpoint the genetic causes of individual patients’ health problems.

Boneh and Bejerano promise the best of both worlds using a cryptographic concept called secure multiparty computation (SMC). This is, in effect, an approach to the “millionaires’ problem,” a hypothetical situation in which two individuals want to determine who is richer without revealing their net worth. SMC techniques work beautifully for such conjectural examples, but with the exception of one Danish sugar beet auction, they have almost never been put into practice. The Stanford group’s work, published last week in Science, is among the first to apply this mind-bending technology to genomics. The new algorithm lets patients or hospitals keep genomic data private while still joining forces with faraway researchers and clinicians to find disease-linked mutations, or at least that is the hope. For widespread adoption, the new method will need to overcome the same pragmatic barriers that often leave cryptographic innovations gathering dust.

Answers Hidden and Sought

Intuitively, Boneh and Bejerano’s plan seems preposterous. If someone can see the data, they can leak it. And how could they infer anything from a genome they can’t see? But cryptographers have been grappling with just such problems for years. “Cryptography lets you do a lot of things like [SMC]: keep data hidden and still operate on that data,” Boneh says. When Bejerano attended Boneh’s talk on recent developments in cryptography, he realized SMC was a perfect fit for genomic privacy.

The particular SMC technique that the Stanford team wedded to genomics is known as Yao’s protocol. Say, for instance, that Alice and Bob—the ever-present denizens of cryptographers’ imaginations—want to check whether they share a mutation in gene X. Under Yao’s protocol Alice (who knows only her own genome) writes down the answer for every possible combination of her and Bob’s genes. She then encrypts each one twice—analogous to locking it behind two layers of doors—and works with Bob to find the correct answer by strategically arranging a cryptographic garden of forking paths for him to navigate.

She sets up outer “doors” to correspond to the possibilities for her gene. Call them “Alice doors”: If Bob enters door 3, any answers he finds inside will assume that Alice has genetic variant 3. Behind each Alice door, Alice adds a second layer of doors (the “Bob doors”) corresponding to the options for Bob’s gene. Each combination of doors leads to the answer for the corresponding pair of Alice and Bob’s genes. Bob then simply has to get the right pair of “keys” (essentially passwords) to unlock the doors. By scrambling the order of the doors and carefully choosing who gets to see which keys and labels, Alice can ensure that the only answer Bob will be able to unlock is the correct one, while still preventing herself from learning Bob’s gene and Bob from learning hers.
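The doors-and-keys picture can be made concrete with a toy Python sketch. It is an illustration only, not the Stanford team's implementation and not a secure protocol. Every function name here is hypothetical, the "locks" are a simple hash-derived pad, and the step in which Bob obtains his key without revealing his gene (known as oblivious transfer in real systems) is simulated by a plain lookup.

```python
import hashlib
import secrets

TAG = bytes(16)  # all-zero marker so Bob can recognise the one door he can open


def _pad(alice_key: bytes, bob_key: bytes) -> bytes:
    # Derive a one-time pad from the pair of keys (a toy stand-in for double encryption).
    return hashlib.sha256(alice_key + bob_key).digest()[: len(TAG) + 1]


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def alice_build_table(num_variants: int, alice_variant: int):
    # One random key per possible "Alice door" and per possible "Bob door".
    alice_keys = [secrets.token_bytes(16) for _ in range(num_variants)]
    bob_keys = [secrets.token_bytes(16) for _ in range(num_variants)]
    table = []
    for i in range(num_variants):
        for j in range(num_variants):
            answer = 1 if i == j else 0  # "do we share the same variant?"
            table.append(_xor(TAG + bytes([answer]), _pad(alice_keys[i], bob_keys[j])))
    # Scramble the doors so a row's position reveals nothing about (i, j).
    secrets.SystemRandom().shuffle(table)
    # Alice reveals only the key for her real variant; it is random bytes to Bob.
    return table, alice_keys[alice_variant], bob_keys


def simulated_oblivious_transfer(bob_keys, bob_variant: int) -> bytes:
    # ASSUMPTION: a real protocol would let Bob fetch this key without Alice
    # learning bob_variant and without Bob seeing the other keys.
    return bob_keys[bob_variant]


def bob_evaluate(table, alice_key: bytes, bob_key: bytes) -> int:
    pad = _pad(alice_key, bob_key)
    for row in table:
        plain = _xor(row, pad)
        if plain[: len(TAG)] == TAG:  # only the matching row decrypts cleanly
            return plain[-1]
    raise ValueError("no row could be unlocked")


def share_variant(num_variants: int, alice_variant: int, bob_variant: int) -> bool:
    table, alice_key, bob_keys = alice_build_table(num_variants, alice_variant)
    bob_key = simulated_oblivious_transfer(bob_keys, bob_variant)
    return bool(bob_evaluate(table, alice_key, bob_key))
```

Calling `share_variant(4, 2, 2)` returns `True` and `share_variant(4, 2, 3)` returns `False`. In this sketch Bob learns only the single answer his two keys unlock, and the table grows with the square of the number of variants, which hints at why practical systems encode the computation more compactly.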

Using a digital equivalent of this process, the Stanford team demonstrated three different kinds of privacy-preserving genomic analyses. They searched for the most common mutations in patients with four rare diseases, in all cases finding the known causal gene. They also diagnosed a baby’s illness by comparing his genome with those of his parents. Perhaps the researchers’ biggest triumph was discovering a previously unknown disease gene by having two hospitals search their genome databases for patients with identical mutations. In all cases the patients’ full genomes never left the hands of their care providers.

Proof of Possibility

In addition to benefiting patients, keeping genomes under wraps would do much to soothe the minds of the custodians of those genome databases, who fear the trust implications of a breach, says Giske Ursin, director of the Cancer Registry of Norway. “We [must] always be slightly more neurotic,” she says. Genomic privacy likewise offers help for “second- and third-degree relatives, [who] share a significant fraction of the genome,” notes Bejerano’s student Karthik Jagadeesh, one of the paper’s first authors. Bejerano further points to the conundrums genomicists face when they spot harmful mutations unrelated to their work. The ethical question of what mutations a genomicist must scan for or discuss with the patient does not arise if most genes stay concealed.

Bejerano argues the SMC technique makes genomic privacy a practical option. “It’s a policy statement, in some sense. It says, ‘If you want to both keep your genome private and use it for your own good and the good of others, you can. You should just demand that this opportunity is given to you.’”

Other researchers and clinicians, although agreeing the technique is technically sound, worry that it faces an uphill battle on the practical side. Yaniv Erlich, a Columbia University assistant professor of computer science and computational biology, predicts the technology could end up like PGP (“pretty good privacy”) encryption. Despite its technical strengths as a tool for encrypting e-mails, PGP is used by almost no one—largely because cryptography is typically so hard to use. And usability is of particular concern to medical practitioners: Several echo Erlich’s sentiment that their priority is diagnosing and treating a condition as quickly as possible, making any friction in the process intolerable. “It’s great to have it as a tool in the toolbox,” Erlich says, “but my sense…is that the field is not going in this direction.”

Kingsmore, Erlich and others are also skeptical that the paper’s approach would solve some of the real-world problems that concern the research and clinical communities. For example, they feel it would be hard to apply it directly to oncology, where genomes are useful primarily in conjunction with detailed medical and symptomatic records.

Still, Kingsmore and Erlich do see some potential for replacing today’s clunky data-management mechanisms with more widespread genome sharing. In any case, the takeaway for Bejerano is not that genome hiding is destined to happen, but that it is a technological possibility. “You would think we have no choice: If we want to use the data, it must be revealed.” Now that we know that is not true, it is up to society to decide what to do next.
