Zack has decided to try out the online dating service Chix-n-Studz.com. He signs up for an account at the Web site and fills in several screens of forms detailing his personal profile and what he is looking for in a potential partner. In no time at all, the service offers him a number of possible soul mates, among them the very exciting-sounding Wendy. He sends her his e-mail address and what he hopes is an engaging opening message. She replies directly to him, and a whirlwind e-romance begins.
Poor Zack. Soon he is also getting numerous unsolicited phone calls from political action groups and salespeople who seem to know things about him, and his health insurance company is questioning him about his extreme-adventure vacations; the unscrupulous owners of Chix-n-Studz have been selling client information. Then there is Ivan, a mischievous co-worker to whom Zack foolishly showed one of Wendy’s e-mails. Zack does not know that several subsequent recent messages supposedly from Wendy are fakes from Ivan.
Alice, in contrast, is on cloud nine, as is her new friend Bob. The two have met through SophistiCats.com, a matchmaking service that offers all the latest cryptographic tools. Alice logs on to its Web site protected by anonymous authorization, a system that ensures no one at the service can track who she is or when she is accessing the site. SophistiCats employs software that provides “secure function evaluation” to match her profile and partner criteria with Bob’s, so no one at the service knows their information or even that she and Bob have been matched up. Imagine: a completely effective dating service that knows practically nothing about its clients!
Alice contacted Bob using a feature known as an anonymous channel, and he replied in kind—not even her Internet service provider (ISP) knows that Bob is her contact or what the messages say, and Bob’s ISP is no better informed about her. Alice’s roommate, Eve, however, does know, but only because Alice has talked about Bob and has pinned a printout of some messages above her computer. Eve could be trouble, because she is a die-hard practical joker fully capable of tapping into and altering the data flowing to and from Alice’s computer (in fact, she controls the network that connects them both to the Internet). Never fear: encryption ensures that Eve can learn nothing beyond what Alice has shown her, and the coded “digital signatures” on Alice’s and Bob’s e-mails have made it a cinch for them to spot and ignore Eve’s spoof messages.
Like Alice and Zack, most of us conduct many of our daily personal, business and government transactions electronically. We do so many things online—from staying in touch with friends to buying and selling everything, including the kitchen sink—that getting comprehensive information about most people is as easy as logging, or recording, their online activities. And for various reasons, ISPs are already logging our activities, such as which sites we have visited and when. They are not alone. Many entities we interact with online—stores, newspapers, dating sites, and the like—keep close tabs on us as well. Thus, if we value privacy, we face the challenge of how to take advantage of everything the Internet has to offer without giving up our privacy.
An amazing discovery of modern cryptography is that virtually any task involving electronic communication can be carried out privately. Many people, including the editors of most dictionaries, mistakenly think that “cryptography” is synonymous with the study of encryption. But modern cryptography encompasses much more. It provides mathematical methods for protecting communication and computation against all kinds of malicious behavior—that is, tools for protecting our privacy and security.
Suppose, for instance, that all the members of a group connected by the Internet want to compute something that depends on data from each of them—data that each wants to remain private. The data could be their vote in an election, and they want to know the outcome without revealing their individual votes. A procedure known as multiparty computation or secure function evaluation (SFE) enables them to tally their votes in such a way that each participant learns the correct output and no one can learn anyone’s individual vote—not even a coalition of malevolent insiders capable of intercepting messages on the network and substituting their own carefully crafted fake data. The SFE protocol can also provide each individual with a private output, as done by the fanciful SophistiCats service.
The basic idea behind SFE is that each participant’s inputs are split into pieces, or shares, and distributed among the others in the group. Each participant then operates on the shares under his or her control (adding them, redistributing shares of the result, and so on). Finally, the group brings the pieces together again to get the final output. No one ever has the data needed to reconstruct another person’s inputs.
It may not seem surprising that a function as simple as adding up votes can be evaluated securely, but recall what SophistiCats did for Alice: it worked out which members among its thousands of clients were good matches for her and gave her some limited information about those matches, all without itself learning anything about her profile or anyone else’s. A Big Brother organization eavesdropping on the network traffic or combing through the data on SophistiCats’s hard drives would be similarly incapable of learning anything.
SophistiCats is a fictional service, but cryptography investigators have shown how to turn it into fact. Indeed, this past January, SFE was used for the real-world problem (in Denmark, at least) of setting the price for sugar beet contracts to be traded among some 1,200 Danish farmers, based on bids that they inputted privately. Through SFE we can all have the best of both worlds: the functionality that we want over the Internet without sacrificing privacy.
Although the SFE protocol makes possible a wide range of capabilities, its power and generality come at a price: it takes a large amount of computation and communication. The protocol is efficient enough for special tasks such as elections, yet it is too cumbersome to be pressed into service every time you click on a link to a secure Web page. Instead computer scientists have developed specialized protocols that are much more efficient than SFE for particular common tasks. These include:
Encryption. Neither Alice’s ISP nor Eve can decipher the messages Alice sends to Bob. The traffic between Alice’s computer and SophistiCats is secure as well.
Authentication. Alice can be sure messages come from Bob, not Eve.
Anonymous channels. Alice’s ISP cannot tell to whom she has sent the messages or that she has ever visited the SophistiCats Web site.
Zero-knowledge proof. Alice can prove to someone else that something is true without revealing what her proof is.
Anonymous authorization. SophistiCats knows that she is a member when she accesses its Web site, but it cannot tell who she is. This protocol is a special case of a zero-knowledge proof.
The oldest and one of the most fundamental problems studied in cryptography is that of encryption—the problem of how to communicate securely over an insecure channel (one on which an adversary can eavesdrop). Alice wants to send a message to Bob, but Eve has control over part of the channel (through the apartment’s network) that Alice will use. Alice wants Bob, but not Eve, to be able to read the message.
In analyzing this problem, notice, first, that Bob must know something that Eve does not—otherwise Eve would be able to do whatever Bob can do. Bob’s private knowledge is called his secret key (SK). Second, notice that Alice must know something about Bob’s SK so that she can create a ciphertext—an encrypted message—specifically for Bob. If Alice knows the SK itself, the protocol is called secret-key encryption, the kind of encryption that has been known and practiced for centuries.
In 1976 Whitfield Diffie and Martin E. Hellman, both then at Stanford University, envisioned another possibility, called public-key encryption, in which Alice need not know the SK. All she needs is a public value related to the SK called Bob’s public key (PK). Alice uses his PK to encrypt her message, and only Bob, with his SK, can decrypt the resulting ciphertext. It does not matter that Eve also knows Bob’s PK because she cannot use it to decrypt the ciphertext. Diffie and Hellman proposed the public-key idea but did not know how to carry it out. That came a year later, when Ronald L. Rivest, Adi Shamir and Leonard M. Adleman, all then at the Massachusetts Institute of Technology, gave the first construction of a public-key cryptosystem: the RSA algorithm.
Their algorithm works for public-key encryption because it involves a so-called trapdoor function. Such a function is easy to compute, to produce the ciphertext, yet hard to invert, to recover the plaintext, unless a special “trapdoor” is used. The trapdoor serves as the secret key. The RSA algorithm was the first example of a function with a trapdoor property. For this work they won the 2002 A. M. Turing Award, the most prestigious prize in computer science.
The RSA discovery, hailed as a fundamental cryptographic breakthrough, fueled years of subsequent research in encryption and in cryptography more generally. Much hard work on encryption still remains, from finding new trapdoor functions, to studying the mathematical assumptions that underpin the security of a specific function, to defining precisely what is required for an encryption system to be considered secure.
Public-key encryption makes it possible to purchase things online without sending sensitive information such as credit-card numbers openly on the Internet. The customer’s Web browser plays the role of Alice and the Web site the role of Bob. More generally, https, a protocol that most browsers now support, uses public-key encryption to provide Web browsing over an encrypted channel—look for “https://” in the URL (the address of the Web site) and an icon such as a closed padlock on the browser’s status bar.
Many people also use public-key encryption for secure e-mail. Plenty of free software exists for that purpose, including the GNU Privacy Guard package (available at www.gnupg.org) first released by the Free Software Foundation a decade ago. If you do not encrypt your e-mail, it travels across the Internet in a form that is easy to read and may remain in that form on various hard drives along the way for some time afterward.
Hi, It’s Me!
Closely associated with the problem of encryption is that of authentication. Suppose Alice receives the message “Alice, please send Eve $100. Love, Bob.” How does she know that it really came from her boyfriend Bob and was not in fact fabricated by Eve?
Just as in the encryption scenario, Bob must know something that Eve does not so that he, but not Eve, can produce a message that Alice will accept. Thus, Bob again needs a secret key. Moreover, Alice needs to know something about Bob’s SK to be able to verify that the message is from Bob. Once again, two varieties of protocol exist: secret-key authentication, more commonly known as a message authentication code, and public-key authentication, frequently referred to as a digital-signature scheme. Diffie and Hellman first envisioned digital-signature schemes at the same time that they proposed public-key encryption, and a scheme using the RSA algorithm was the first one constructed.
The chief idea is that Bob uses his SK to compute a “signature” that he appends to his message and that Alice or anyone else then uses his PK to verify that it matches the message itself. Alice knows the message must be from Bob because no one else has the SK needed to produce the valid signature.
Currently it is easy to trick an e-mail client into thinking that a message came from Bob when in fact it came from Eve. A spoofed e-mail may include fake news reports and incorrect stock quotes, tricking people to act against their best interest. But if all e-mail communication were authenticated, such an attack would be impossible: your e-mail client would digitally sign all outgoing messages and would verify the digital signatures of all incoming messages. Authentication could also combat spam by having servers reject incoming e-mail that is not authenticated by its sender. Authentication protocols did not exist when e-mail was developed in the 1970s, and many conventions from that era still prevail.
Software that everyone can use to sign their e-mail and verify signatures is freely available, for instance, as a part of the GNU Privacy Guard package mentioned earlier.
By encrypting your messages, you can prevent ISPs (or any other eavesdropper) from discovering what you send and receive, but not to whom you are communicating. For example, Alice’s ISP will know if she browses an Alcoholics Anonymous Web site. Imagine if the ISP were to sell this information to car insurance companies. People would be less likely to seek help online because they would be worried that it would increase their insurance premium.
This problem could be solved with SFE: Alice’s private input would be the URL she wants to look at, and her private output would be the contents of the Web page she wants to see. Using SFE, however, would be highly inefficient. In 1981 David Chaum, then at the University of California, Berkeley, proposed a much simpler solution called anonymous channels, now also known as onion routing.
As the name suggests, Alice wraps her message in layers. She encrypts each layer (and everything inside it) with a different person’s public key and then adds that person’s address to the outside of the layer. A message from Alice to Bob could travel as follows: Alice sends the onion to Mark, who can peel off the outermost layer by decrypting the onion with his secret key. Inside, Mark finds a smaller onion and Lisa’s address. He forwards that onion to Lisa, who can decrypt it with her key, and so on. Finally, Bob receives the onion core from someone, and he decrypts it to find Alice’s message.
In practice, the intermediaries are part of a network of computers set up to handle the decryption and forwarding automatically. Ideally, each intermediary continually receives lots of onions and forwards them in random order. Even if an ISP is watching all the intermediaries at all times, it cannot tell where Alice’s message went or where Bob’s came from, provided there is enough onion traffic on the network.
Bob himself does not know who sent the message, unless Alice chooses to reveal her identity in the message. Yet even if she remains anonymous to him, he can still send her an anonymous reply if she includes a “reply onion” containing the layers of addresses and public keys needed to route a message back to her.
Alice’s and Bob’s messages can remain untraceable even if some of the intermediaries leak information about what they are doing. As more participants use this system and volunteer their computers to serve as intermediaries, it becomes harder to figure out who is talking to whom.
As with encryption and digital signatures for e-mail, free software is available for anyone to communicate over anonymous channels or to participate as an intermediary. The Onion Router (Tor) project, for instance, can be found at www.torproject.org.
Let’s say Alice has a subscription to the online magazine SophistiCat American. She connects to the magazine via an anonymous channel, logs on with her user name and password, and takes good care that all her incoming and outgoing messages are encrypted. Does that mean she can rest assured that no one will find out what she is doing online? Of course not—the magazine knows exactly what Alice is doing.
Alice might try to cover her tracks by using a pseudonym when she subscribes, but the reading habits of this pseudonymous user may quickly point to Alice’s identity. She may reveal her zip code to look at a weather forecast, type in her birth date to check her horoscope and give away her likely gender by reading about topics such as breast cancer. Those three pieces of information—zip code, date of birth and gender—are enough to uniquely identify 87 percent of the U.S. population [see “Information of the World, Unite!” by Simson L. Garfinkel].
Surprisingly, Alice’s problem has a cryptographic solution called anonymous authorization. Alice can prove to the magazine that she is a valid subscriber each time she accesses its Web page. Yet this proof reveals nothing about which subscriber she is—not even, say, that she is the person who accessed it a few hours earlier. The protocol is a special case of the more general zero-knowledge proof protocol.
With a zero-knowledge proof, Alice can convince Bob that a statement is true without revealing why it is true or, in fact, without revealing any extra information at all. To prove the statement “I am an authorized user of SophistiCat American,” the online magazine or a third-party service would issue a unique credential—something like a secret key—to Alice when she subscribed. Each time the magazine subsequently challenged her, she would use that key to prove she had a valid credential, without revealing the credential itself. With credentials from various authorities, Alice could provide a zero-knowledge proof of more complicated statements such as “I am an authorized user and over 18.”
The basic idea of how a zero-knowledge proof works is illustrated by the scenario described in the box on the opposite page, in which Alice proves to Bob that she has colored a diagram in a special way (technically, that she has “three-colored a graph”) without showing Bob how she colored it. Three-coloring a graph is a so-called NP-complete problem [see “The Limits of Quantum Computers,” by Scott Aaronson; Scientific American, March]. For the present discussion, what is important about “NP-complete” is that you can pick any statement for which you have a reasonably short proof and concoct a version of Alice and Bob’s game to give a zero-knowledge proof of your statement.
The three-colorability protocol demonstrates the principles that make zero-knowledge proofs possible, but it is not very efficient in practice—similar to the way that general secure function evaluation is inefficient. Fortunately, cryptography investigators have developed similar protocols for specific kinds of credentials that can serve for efficient anonymous authorization.
Breaking the Codes
How secure is secure? When Alice encrypts a message to Bob, just how difficult is it for Eve to decipher the message? And what if Eve has some inside knowledge or opportunities to try to game the system? For instance, she may already know something about the encrypted message—say, that it is the name of a local café where Alice and Bob are going to meet in person for the first time. Or if “Bob” is a secure Web server, Eve might send it carefully chosen gibberish in place of ciphertext and, from its responses, learn clues about its secret key. A widely accepted definition of security for public-key encryption covers all those bases and requires that Eve gain not even a little usable information. Among others, the GNU Privacy Guard package passes the test.
Analyzing the security of a cryptosystem is a highly developed science. Contrary to the common perception, cryptography is not a cat-and-mouse game in which a system is presumed to be secure merely because no one has shown how to break it. Instead many building blocks of cryptography rely on well-studied mathematics problems. Cryptographers cannot prove with absolute certainty that such a cryptosystem is unbreakable, but they do prove that any algorithm to break it would also answer a fundamental question that has stymied the best mathematicians and computer scientists.
Some protocols depend only on the existence of a particular kind of mathematical function. For instance, cryptographers know how to construct a public-key cryptosystem out of any trapdoor function. Thus, if someone breaks the functions used in RSA, others that were still standing could be substituted.
Only rarely is a scheme assumed secure on a more ad hoc basis. But that is done only after hundreds of leading researchers around the world have studied the algorithm for several years. The cryptography community can only afford to carry out that process for a few critical building blocks. They then prove the security of larger systems assuming the security of the building blocks. See www.SciAm.com/sep2008 for more on the assumptions behind the security of cryptosystems.
Cryptographic protocols can provide surprisingly versatile solutions to seemingly impossible privacy problems (such as anonymous authorization). But many of the privacy problems we face do not appear cryptographic in nature. If Alice is under constant surveillance in the physical world, it is small consolation that her online activities are secure. In London, cameras already watch public spaces in the interest of law enforcement. Perhaps, to protect privacy, building owners could administer the data from cameras on their property, and SFE could manipulate the data to, say, track suspects leaving a crime scene without storing everyone else’s activities in a central database. More generally, when privacy is threatened by a system such as public surveillance, we should ask ourselves, What problems is the system trying to solve? And can we keep our privacy by using cryptography in solving them?
Note: This story was orignially printed with the title, "How To Keep Secrets Safe".