Companies and individuals are often at odds, concerned either with collecting information or with preserving privacy. Online stores and services are always eager to know more about their customers—income, age, tastes—whereas most of us are not eager to reveal much.
Math suggests a way out of this bind. A few years ago Rakesh Agrawal and Ramakrishnan Srikant, both data-mining researchers, developed an idea that makes telling the truth less worrisome. The idea works if companies are content with accurate aggregate data rather than details about individuals. Here is how it goes: you provide the numerical answer to certain intrusive online questions, but a random number is added to (or subtracted from) it, and only the sum (or difference) is submitted to the company. Because these random numbers average out to roughly zero over many respondents, the statistics needed to recover approximate averages from the submitted numbers are not difficult, and your privacy is preserved.
Say, for example, that you are 39 and are asked your age. The number sent to the site might be anywhere from 19 to 59, depending on a random number between –20 and +20 generated by the company (if you trust it), by an independent site or by you. Similar fudge factors would apply to incomes, zip codes, years of schooling, size of family, and so on, with appropriate ranges for the generated random number.
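A minimal Python sketch of the additive masking idea, assuming the random number is a whole number drawn uniformly from –20 to +20, so that it averages to zero. The names `mask_age`, `estimate_average` and `NOISE_RANGE`, the seed and the simulated ages are hypothetical and chosen only for illustration; this is not the researchers' actual method, just the simplest version of it that recovers an average.

```python
import random

# Assumption: noise is a whole number drawn uniformly from -20..+20 (mean 0).
NOISE_RANGE = 20


def mask_age(true_age: int, rng: random.Random) -> int:
    """Return the true age plus a random offset; only this sum is submitted."""
    return true_age + rng.randint(-NOISE_RANGE, NOISE_RANGE)


def estimate_average(submitted: list[int]) -> float:
    """The noise averages out to about zero over many respondents, so the
    mean of the submitted values approximates the mean of the true values."""
    return sum(submitted) / len(submitted)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population of true ages, for illustration only.
    true_ages = [rng.randint(18, 80) for _ in range(100_000)]
    submitted = [mask_age(age, rng) for age in true_ages]
    print("true average:     ", sum(true_ages) / len(true_ages))
    print("estimated average:", estimate_average(submitted))
    print("one masked answer for a 39-year-old:", mask_age(39, rng))
```

Any single masked answer reveals little (a submitted 45 could come from anyone aged 25 to 65), while the average over many submissions lands close to the true average.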
Another, older example from probability theory illustrates a variant of the idea. Imagine you are on an organization’s Web site, and the organization wishes to find out how many of its subscribers have ever X-ed, with X being something embarrassing or illegal. Not surprisingly, many people will lie if they answer the question at all. Once again, random masking comes to the rescue. The site asks the question, “Have you ever X-ed? Yes or no,” but requests that before answering it, you privately flip a coin. If the coin lands heads, the site requests that you simply answer yes. If the coin lands tails, you are instructed to answer truthfully. Because a yes response might indicate only a coin’s landing heads, people presumably would have little reason to lie.
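The respondent's side of this coin-flip protocol can be sketched in a few lines of Python. The function name `randomized_answer` is hypothetical, and the 9 percent rate in the demonstration is an invented illustration, not survey data.

```python
import random


def randomized_answer(has_x_ed: bool, rng: random.Random) -> bool:
    """One respondent's reply under the coin-flip protocol (True means 'yes')."""
    heads = rng.random() < 0.5   # private fair coin; the site never sees it
    if heads:
        return True              # heads: answer yes regardless of the truth
    return has_x_ed              # tails: answer truthfully


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical population in which 9 percent have X-ed (illustration only).
    population = [rng.random() < 0.09 for _ in range(100_000)]
    answers = [randomized_answer(p, rng) for p in population]
    print("share answering yes:", sum(answers) / len(answers))
```

A yes from any individual is uninformative on its own, since it may simply reflect a head; only the overall share of yeses carries information.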
The math needed to recover an approximation of the percentage of respondents who have X-ed is easy. To illustrate: if 545 of 1,000 responses are yes, we would expect about 500 of these yeses to be the result of the coin’s landing heads, because roughly half of all coin flips would, by chance, be heads. Of the other approximately 500 people, whose coins landed tails, about 45 also answered yes. Because 45 or so of the approximately 500 who answered truthfully have X-ed, we conclude that the percentage of X-ers is about 45/500, or 9 percent.
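The same arithmetic works for any counts: if Y of N answers are yes, about N/2 of them came from heads, so the truthful yeses number about Y - N/2 out of about N/2 tails respondents, giving an estimate of (Y - N/2)/(N/2) = (2Y - N)/N. A short Python sketch of that estimate (the function name is hypothetical):

```python
def estimate_x_rate(yes_count: int, total: int) -> float:
    """Estimate the share of respondents who have X-ed.

    About total/2 respondents got heads and said yes automatically, so the
    truthful yeses are roughly yes_count - total/2, out of roughly total/2
    tails respondents:
        (yes_count - total/2) / (total/2) == (2 * yes_count - total) / total
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (2 * yes_count - total) / total


print(estimate_x_rate(545, 1000))  # 0.09, i.e. 9 percent
```

With small samples the estimate can come out slightly negative by chance when few respondents have X-ed, so a practical version might clip it at zero.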
In some situations, variants of this low-tech technique, in conjunction with appropriate legislation, would work—or so thinks this 6′9″ X-er.