I. Imperfect Information
Focusing on facts is generally an effective first step to gaining clarity about a complex or elusive topic. In the case of privacy, the facts are denied to us. Those who have reduced our privacy, whether they are state or commercial actors, prefer that we do not reduce theirs. The National Security Agency (NSA), for example, long hid the full extent of its vast electronic surveillance operations. Even after the recent leaks by former NSA contractor Edward J. Snowden, we can know only approximately what is going on.
No single observer has a complete picture of who has gathered what data about whom in our world today. Certain organizations, such as the NSA, know immensely more than anyone else, but not even they know the full range of algorithms that commercial and government entities have applied to personal data or to what effect.
Therefore, privacy is, for now, a murky topic that can be investigated only in a prescientific manner. We must rely more than we might prefer to on theory, philosophy, introspection and anecdotes. But this does not mean that we cannot think.
II. What is Privacy?
A philosophy of privacy is a cultural badge. Growing up in New Mexico, I lived with some native Pueblo people one summer. They complained that anthropologists had done more damage to their culture than missionaries because it was the anthropologists who published their secrets. And yet the elderly couple who told me this had a son who had become an anthropologist. Meanwhile Chinese students in the U.S. used to barge into rooms without knocking, unable to comprehend why this wasn't acceptable. That has changed, as has China.
These days the young and geeky are sometimes said to care less about privacy than their elders. An older person who grew up in a world without wearable computing is more likely to feel violated when confronted with a face-mounted camera. Companies such as Facebook have been criticized—or praised—for socializing young people to be comfortable with the activities of the NSA and other intelligence agencies. The group that promotes privacy most intensely and from the grass roots may be gun owners, who fear that being placed on a government list might eventually lead to confiscation of their firearms.
Despite the variety of attitudes toward privacy, talk of policy matters usually leads to a discussion of trade-offs. If the state must be able to analyze everyone's personal information to catch terrorists before they act, then individuals cannot be entitled to both privacy and safety. Or at least this is the way the trade-off is often framed.
Something is askew in thinking about privacy this way. Considered in terms of trade-offs, privacy inevitably ends up being framed as a culturally sanctioned fetish—an adult security blanket. How much privacy are people “willing to give up” for certain benefits? Implicit in this formulation is the notion that any desire for privacy might be an anachronism, like the blind spot in the human retina. This is akin to asking how ill-tasting a medicine a patient is willing to drink to cure a serious disease. The implication is that the patient ought to stop being so delicate. A kindred claim holds that if people “would only share more,” they could enjoy more convenience or create more value in online networks.
It is tempting to dismiss subjective feelings about privacy because they are fickle, but that might be a mistake. What if there is value in different people or cultures maintaining different practices around privacy? Cultural diversity, after all, should be treated as an intrinsic good. To think otherwise is to assume that culture, thought and information habits are already as good as they could possibly be—that only one stance regarding privacy, whatever it may be, is the right one. An ecologist would never think that evolution had reached its conclusion. Perhaps, then, not everyone should be herded into a single ethic of information. Perhaps people should be free to choose among varying degrees of privacy.
III. Privacy as Power
In the information age, privacy has come to mean, nakedly, information available to some but unavailable to others. Privacy is the arbiter of who gets to be more in control.
Information has always been an important tool in contests for wealth and power, but in the information age it is the most important tool. Information supremacy becomes harder to distinguish from money, political clout or any other measure of power. The biggest financial schemes are also the most computational; witness the rise of high-frequency trading. Big computation has not just benefited occasional firms but has had a macroeconomic effect because it has amplified the scale of the financial sector so impressively. Companies such as Google and Facebook sell nothing but computation designed to improve the efficacy of what we still call “advertising,” although that term has less and less to do with persuasion through rhetoric or style. It has instead come to mean directly tweaking what information and options are conveniently put in front of people. Similarly, modern elections rely on large-scale computation to find persuadable voters and motivate them to turn out. Privacy is at the heart of the balance of power between the individual and the state, and between competing business and political interests.
This state of affairs means that unless individuals can protect their own privacy, they lose power. Privacy has become an essential personal chore that most people are not trained to perform. Those in the know do a better job of staying safe in the information age (by discouraging identity theft, for instance). Therefore, society has taken on a bias in favor of a certain kind of technically inclined person—not just in the job market but in personal life.
Some cyberactivists argue that we should eliminate secrets entirely. But young techies who declare that sharing is wonderful are often obsessive about blocking the spybots that infest most Web sites or using encryption to communicate electronically. In this, the young techies and the biggest tech companies are similar. Facebook and its competitors promote openness and transparency to their users but hide predictive models of those users in deep, dark basements.
IV. The Zombie Menace
We are cursed with an unusually good-natured technical elite. The mostly young people who run the giant cloud computing companies that provide modern services such as social networking or Web searching, as well as their counterparts in the intelligence world, are for the most part well intentioned. To imagine how things could go bad, we have to imagine these charming techies turning into bitter elders or yielding their empires to future generations of entitled, clueless heirs. It should not be hard to fathom, because such scenarios have happened as a rule in human history. It feels heartless to think that way when you know some of the nice sorts of techies who thrive in our computation-centric times. But we have to do our best at thinking dark thoughts if we are to have any forethought about technology at all.
If an observer with a suitably massive computer obtained enough personal information about someone, that observer could hypothetically predict and manipulate that person's thoughts and actions. Today's connected devices might not be up to the task, but tomorrow's will be. So suppose some future generation of hyperconvenient consumer electronics takes the form of a patch on the back of the neck that directly taps into the brain to know, prior to self-awareness, that one is about to ponder which nearby café to visit. (Bringing relief to this darkest of dilemmas has become the normative challenge for consumer technology in our times.)
Many of the components to create such a service exist already. At laboratories such as neuroscientist Jack Gallant's at the University of California, Berkeley, it is already possible to infer what someone is seeing, or even imagining, or about to say, merely by performing “big data” statistics correlating present functional magnetic resonance imaging measurements of the brain with the circumstances of previous measurements. Mind reading, of a sort, has therefore already been accomplished, based on statistics alone.
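To make the statistical flavor of this kind of “mind reading” concrete, here is a minimal sketch in Python. It uses synthetic numbers rather than real brain scans and a crude nearest-neighbor correlation rather than Gallant's actual methods: a new measurement is labeled simply by finding the previously recorded measurement it most resembles.

```python
import numpy as np

# Toy illustration of correlation-based decoding, not an actual neuroscience pipeline.
# Assume a library of previously recorded voxel patterns, each labeled with what the
# subject was looking at when the pattern was measured.
rng = np.random.default_rng(0)
n_voxels = 500
prototypes = {"face": rng.normal(size=n_voxels),
              "house": rng.normal(size=n_voxels),
              "text": rng.normal(size=n_voxels)}

# Library of past measurements: noisy copies of each prototype.
library = [(label, proto + rng.normal(scale=1.0, size=n_voxels))
           for label, proto in prototypes.items() for _ in range(50)]

def decode(new_scan):
    """Guess what the subject is seeing by finding the most correlated past scan."""
    best_label, best_r = None, -np.inf
    for label, past_scan in library:
        r = np.corrcoef(new_scan, past_scan)[0, 1]
        if r > best_r:
            best_label, best_r = label, r
    return best_label

# A fresh, unlabeled measurement taken while the subject views a house.
new_scan = prototypes["house"] + rng.normal(scale=1.0, size=n_voxels)
print(decode(new_scan))  # very likely "house" -- statistics alone, no theory of the brain
```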
Now let us suppose that while wearing this hyperconvenient device, you are about to decide to go to a café, only you do not know it yet. And let us suppose that some entity—some Facebook or NSA of the future—has access to that device and an interest in steering you away from café A and toward café B. Just as you are about to contemplate café A, a nagging message from your boss pokes up in your head-up display; you become distracted and frustrated, and the thought of going to café A never actually comes to mind. Meanwhile a thought about café B releases a tweet from some supposed hot prospect on a dating site. Your mood brightens; café B suddenly seems like a great idea. You have become subject to neo-Pavlovian manipulation that takes place completely in a preconscious zone.
The point of this thought experiment, which has a long pedigree in science fiction, is that computing and statistics could effectively simulate mind control. It is arguable that a regime of cloud-driven recommendation engines in ever more intimate portable devices could get us part of the way in the next few years to the mind-control scenario just described.
V. Plague of Incompetence
The traditional, entertaining way to tell a cautionary science-fiction tale is to conjure an evil villain who becomes all-powerful. Instead of considering that potential dark future, I will focus on a scenario that is not only more likely but that has already manifested in early forms. It is less an evil scheme orchestrated by hypercompetent villains and more like a vague plague of incompetence.
In such a scenario, an entity or, say, an industry would devote tremendous resources to the algorithmic manipulation of the masses in pursuit of profit. The pursuit would indeed be profitable at first, although it would eventually become absurd. This has already happened! Look no further than the massive statistical calculations that allowed American health insurance companies to avoid insuring high-risk customers, which was a profitable strategy in the near term—until there came to be an unsustainable number of uninsured people. Society could not absorb the scheme's success. Algorithmic privacy destruction as a means to wealth and power always seems to end in a similar massive bungle.
Consider the state of modern finance. Financial schemes relying on massive statistical calculations are often successful at first. With enough data and computation, it is possible to extrapolate the future of a security, the behavior of a person or really any smoothly varying phenomenon in the world—for a time. But big data schemes eventually fail, for the simple reason that statistics in isolation only ever represent a fragmentary mirror of reality.
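The “for a time” caveat can be illustrated with a toy calculation. The sketch below stands in for no actual trading model: it fits a polynomial to past observations of a smoothly varying quantity and then extrapolates, and the fit that looks nearly perfect inside the training window diverges badly outside it.

```python
import numpy as np

# Toy illustration: fit a curve to past observations of a smoothly varying
# quantity, then extrapolate beyond the data the fit was built on.
t_past = np.linspace(0, 6, 200)
observed = np.sin(t_past)                      # the "smoothly varying phenomenon"
coeffs = np.polyfit(t_past, observed, deg=5)   # purely statistical fit, no theory

t_future = np.linspace(6, 12, 200)
predicted = np.polyval(coeffs, t_future)
actual = np.sin(t_future)

print("in-sample error:   ", np.abs(np.polyval(coeffs, t_past) - observed).max())
print("out-of-sample error:", np.abs(predicted - actual).max())
# The fit is excellent inside the window it was trained on and diverges wildly
# outside it -- the extrapolation works only "for a time."
```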
Big data finance was not based on encroaching on individual privacy (by, for example, modeling individuals and targeting them with stupid mortgages and credit offers) until the beginning of the 21st century. Prior to that, it was more abstract. Securities were modeled, and investments in them were managed automatically, absent any understanding of what was actually being done in the real world as a result. Greenwich, Conn.–based hedge fund Long-Term Capital Management was an early example. It was a spectacular high flier until it failed in 1998, requiring a stupendous bailout organized by the Federal Reserve Bank of New York. (High-frequency trading schemes are now reinitiating the pattern with bigger data and faster computation.) Now, however, much of the world of highly automated finance relies on the same massive evaporation of individual privacy that is characteristic of spy craft or the consumer Internet. The mortgage-backed securities that led to the Great Recession finally joined personal-privacy violation to automated trading schemes. That episode ended in a cosmic-scale bailout at the public's expense, and similar bailouts will no doubt follow.
This is not a story of an ultracompetent elite taking over the world. Instead it is a story of everyone, including the most successful operators of giant cloud services, having trouble understanding what is going on. Violating everyone else's privacy works at first, creating fortunes out of computation, but then it fails. This pattern has already created financial crises. In the future, when whoever runs the most effective computers with the most personal data might be able to achieve a greater degree of prediction and manipulation of the whole population than anyone else in society, the consequences could be much darker.
VI. The True Measure of Big Data
When somebody is selling the abilities of a service that gathers and analyzes information about vast numbers of other people, they tend to adopt a silly, extreme braggadocio. To paraphrase the sort of pitch I have heard many times, “Someday soon, if not already, giant computers will be able to predict and target consumers so well that business will become as easy as turning a switch. Our big computer will attract money like iron filings to a magnet.”
For instance, I have been present when a Silicon Valley start-up, hoping to be acquired by one of the big players, claimed to be able to track a woman's menstrual cycle by analyzing which links she clicked on. The company said it could then use that information to sell fashion and cosmetics products to her during special windows of time when she would be more vulnerable to pitches. This scheme might be valid to a point, but because it relies purely on statistics, with no supporting scientific theory, it is impossible to know what that point is.
Similarly, when selling a system that gathers information about citizens, a government agency—or more likely, a private contractor serving an agency—might make colorful claims about catching criminals or terrorists before they strike by observing and analyzing the entire world. The terminology of such programs (“Total Information Awareness,” for instance) reveals a desire for a God-like, all-seeing perch.
Science fiction has contemplated this kind of thing for decades. One example is the “precrime” unit in Minority Report, a movie based on a 1956 short story by Philip K. Dick that I helped to brainstorm many years ago. The precrime unit caught criminals before they had the chance to act. But let us be clear: this is not what giant systems for data gathering and analysis actually do.
The creators of such systems hope that one day metadata will support a megaversion of the kind of “autocomplete” algorithms that guess what we intend to type on our smartphones. Statistical algorithms will fill holes in the data. With the aid of such algorithms, studying the metadata of a criminal organization ought to lead us to new, previously unknown key members.
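As a rough illustration of what “filling holes” in metadata might mean in practice, the sketch below applies a standard link-prediction heuristic (counting shared contacts) to a made-up call graph. The names and the method are illustrative assumptions, not a description of any agency's actual system.

```python
# Toy sketch of metadata "autocomplete" via link prediction on a hypothetical
# contact graph. Each person maps to the set of people they are known to contact.
contacts = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob", "eve"},
    "dave":  {"alice"},
    "eve":   {"carol"},
}

def predicted_links(graph):
    """Score unobserved pairs by how many contacts they share."""
    people = sorted(graph)
    scores = []
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if b not in graph[a]:
                scores.append((len(graph[a] & graph[b]), a, b))
    return sorted(scores, reverse=True)

for score, a, b in predicted_links(contacts):
    print(f"{a} -- {b}: {score} shared contacts")
# The top-scoring pairs are the "holes" the algorithm would fill in: people
# presumed connected even though no direct record of contact exists.
```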
But thus far, at least, there appears to be no evidence that metadata mining has prevented a terrorist act. In all the cases we know about, specific human intelligence motivated direct investigations that led to suspects. In fact, when responsible officials from the various giant cloud computer projects, whether private or governmental, describe what they do, the claims come rapidly down to earth, especially under careful reading. Yes, once there are leads about a potential terrorist plot, it is faster to connect the dots with a giant database readily at hand. But the database does not find the leads in the first place.
One often sees a parlor trick these days: an after-the-fact analysis of historical events that purports to show that big data would have detected key individuals in plots before they occurred. An example is that algorithmic analysis of Paul Revere's contemporaries reveals that Revere was a central connecting figure in a social network. The data in this case are his memberships in various organizations before the American Revolutionary War. Seoul National University sociologist Shin-Kap Han demonstrated that analysis of a rather small database of memberships in varied prerevolutionary organizations singles out Revere as a unique connecting figure. More recently, Duke University sociologist Kieran Healy independently derived similar results from a slightly divergent database representing the same events.
Sure enough, there is Paul Revere, placed right in the middle of the clusters connecting other individuals. Such results advertise the application of metadata to security. Still, there are several factors to consider before being persuaded that this type of research can predict events before they happen.
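The flavor of these analyses can be reproduced in a few lines. The sketch below assumes the networkx library and uses invented groups and members rather than Han's or Healy's actual data: it projects a person-to-group membership table onto a person-to-person network and ranks people by betweenness centrality, the kind of measure that places a figure like Revere at the center.

```python
import itertools
import networkx as nx  # assumption: networkx is available

# Toy co-membership analysis; the groups and members are made up for illustration.
memberships = {
    "group_a": {"revere", "warren", "adams"},
    "group_b": {"revere", "hancock", "church"},
    "group_c": {"revere", "adams", "otis"},
    "group_d": {"revere", "warren", "hancock"},
}

# Project the person-to-group table onto a person-to-person network:
# two people are linked if they belong to at least one group together.
G = nx.Graph()
for members in memberships.values():
    G.add_edges_from(itertools.combinations(sorted(members), 2))

# A standard centrality measure then picks out the person who bridges the clusters.
centrality = nx.betweenness_centrality(G)
for person, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{person:8s} {score:.3f}")
# With memberships spread across every group, "revere" comes out on top -- purely
# from membership metadata, with no knowledge of what any group actually did.
```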
Revere was clearly in a special position to be a linchpin for something. Lacking any historical context, however, we would not know what that thing might be. A similar centrality might accrue to the individual who was able to procure the best ales. Metadata can only be meaningful if it is contextualized by additional sources of information. Statistics and graph analyses cannot substitute for understanding, although they always seem to for a little while.
The danger is that big data statistics can create an illusion of an automatic security-generating machine, similar to the illusion of guaranteed wealth machines that Wall Street is always chasing. A stupendous amount of information about our private lives is being stored, analyzed and acted on in advance of a demonstrated valid use for it.
VII. Software is Law
One frequently hears statements of this sort: “The Internet and the many new devices communicating through it will make personal privacy obsolete.” But that is not necessarily so. Information technology is engineered, not discovered.
It is true that once a network architecture is established, with many users and practically uncountable interconnecting computers relying on it, changes can be difficult to achieve. The architecture becomes “locked in.” The nature of privacy in our digital networks, however, is not yet fully locked in. We still have the potential to choose what we want. When we speak about grand trade-offs between privacy and security or privacy and convenience, it is as if these trade-offs are unavoidable. It is as if we have forgotten the most basic fact about computers: they are programmable.
Because software is the way people connect and get things done, what the software allows is what is allowed, and what the software cannot do cannot be done. This is particularly true for governments. For instance, as part of the Affordable Care Act, or Obamacare, smokers in some states will in theory pay a higher price for health insurance than nonsmokers. The reason it is only “in theory” is that the software that will run the new legal framework for health care finance in the U.S. was not written to accommodate the penalty for smokers. So the law will have to go into effect without the penalty, awaiting some moment in the future when the software is rewritten. Whatever anyone thinks about the law, it is the software that determines what actually happens.
The example of the penalty for smokers just hints at a larger issue. Quirks in the software that implements Obamacare or any other society-scale project could determine more about the experiences of individuals in a society than the intent of politicians.
VIII. How to Engineer the Future When We Don't Know What We're Doing
There are two primary schools of thought for how to get value from big data without creating too much collateral damage in the form of privacy violation. One seeks to articulate and enforce new regulations. The other seeks to foster universal transparency so that everyone will have access to all data and no one will gain an undue advantage. These two efforts are for the most part tugging in opposite directions.
The problem with privacy regulations is that they are unlikely to be followed. Big data statistics become an addiction, and privacy regulations are like drug or alcohol prohibitions. One disheartening aspect of the periodic leaks related to the NSA is that even secret rules and regulations embraced by the organization seemed to be futile. NSA employees used their perches to spy on romantic interests, for instance. Nevertheless, perhaps some new regulations and oversight could do some good.
But what of the opposite idea—making data openness more common? The problem with that approach is that it is not just access to data that matters. More important is the computing power used to analyze those data. There will always be someone with the most effective computer, and that party is unlikely to be you. Openness in the abstract only reinforces the problem because it heightens the incentive to have the biggest computer.
Let us take the ideal of openness to the logical extreme. Suppose the NSA published the passwords to all its internal servers and accounts tomorrow. Anyone could go take a look. Google and its competitors would immediately scrape, index and analyze the vast data stored by the NSA better than you could, and they would be happy to earn fortunes from customers who would leverage that work to find some way to manipulate the world to their advantage instead of to yours. Remember, big data in the raw does not bring power. What brings power is big data plus the very most effective computers, which are generally the giant ones you do not own.
Is there a third alternative? It is almost universally received wisdom that information should be free, in the commercial sense. One should not have to pay for it. This is what has allowed the giant Silicon Valley online companies to rise up so quickly, for instance.
It is worth reconsidering this orthodoxy. Allowing information to have commercial value might clarify our situation while bringing an element of individuality, diversity and subtlety back to questions of privacy.
If individuals were paid when information derived from their existence was used, that might cancel out the motivations to create grand big data schemes that are doomed to fail. A data scheme would have to earn money by adding value rather than using information owned by individuals against them.
This is a subtle concept, and I have been exploring it in detail in a collaboration with Palo Alto Research Center and Santa Fe Institute economist W. Brian Arthur and Eric Huang, a Stanford University graduate student. Huang has extended the most accepted models of insurance businesses to see what happens when information takes on a price. While the results are complex, an overall pattern is that when insurance companies have to pay people for their information they cannot cherry-pick as easily, so they will cover people they would otherwise exclude.
It is important to emphasize that we are not talking about redistributing benefits from the big guys to the little guys; instead this is a win-win outcome in which everyone does better because of economic stability and growth. Furthermore, it is inconceivable to have enough government inspectors to confirm that privacy regulations are being followed, but the same army of private accountants that make markets viable today could probably handle it.
If information is treated as something that has commercial value, then principles of commercial equity might resolve otherwise imponderable dilemmas related to privacy. In our current world, it is very hard to create an in-between level of privacy for oneself without significant technical skills. A nontechnical person must either join a social network or stay off it entirely, and even after joining can find it difficult to manage privacy settings. In a world of paid information, however, a person might tweak the price of her information up or down and thereby find a suitable shade of gray. All it would take is the adjustment of a single number, a price.
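To see how little machinery that single number would require, here is a toy sketch of the idea. The names, prices and access rule are purely illustrative assumptions, not a design anyone has deployed.

```python
# Toy sketch of price-mediated privacy: each person publishes an asking price for
# a use of their information, and a would-be user either pays it or goes without.
asking_price = {
    "alice": 0.01,    # nearly free: comfortable being broadly visible
    "bob": 5.00,      # moderate: reachable, but not cheaply profiled
    "carmen": 250.0,  # effectively private: most uses are priced out
}

def can_use(person, offered_payment):
    """A use of someone's data goes through only if it meets their asking price."""
    return offered_payment >= asking_price[person]

# An ad-targeting system worth a couple of cents per profile reaches Alice,
# while Bob and Carmen have priced themselves into different shades of privacy.
for person in asking_price:
    print(person, can_use(person, offered_payment=0.02))
```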
Someone wants to take a picture of you with a face-mounted camera? In the abstract, they could, but to actually look at the picture, to do anything with it, might cost a prohibitive amount. Individuals might miss out on some benefits by setting the price of their information too high, but this is one way cultural diversity can come about even when there are sensors connected to big computers everywhere.
There is also a political angle: when information is free, then the government becomes infinitely financed as a spy on the people because the people no longer have the power of the purse as a means to set the scope of government. Put a price on information, and the people can decide how much spying the government can afford simply by setting the tax rate.
This briefest presentation can only hint at the idea of paid information, and many questions would remain even if I went on for many more pages, but the same can be said for the alternatives. No approach to the quandary of privacy in the big data age, neither radical openness nor new regulation, is mature as yet.
It is immensely worth looking for opportunities to test all the ideas on the table. Network engineers should also build in any software “hooks” we can, whether they will ever be used or not, so that network software will be able to support future ideas about paid information, increased regulation or universal openness. We must not rule anything out if we can possibly help it.
We who build big data systems and devices that connect to them face a tricky situation that will only become more common as technology advances. We have very good reasons to do what we do. Big data can make our world healthier, more efficient and sustainable. We must not stop. But at the same time we must know that we do not know enough to get it right the first time.
We must learn to act as though our work is always a first draft and always do our best to lay the groundwork for it to be reconsidered, even to be radically redone.