After a decade in the business world, Sweeney returned to college, completing her undergraduate degree at Harvard University. She then earned her master’s and doctorate in computer science at M.I.T., the first African-American woman to do so. “When I came back, I told them I didn’t plan on taking any more crap,” she laughs.
It was on her return to M.I.T. that Sweeney first fell into the orbits of privacy and security. She had won a fellowship at the National Library of Medicine, and to show her appreciation, she volunteered to help several Boston hospitals improve protection of their medical records, a concern that was surfacing as the Internet ballooned in the mid-1990s. Sweeney wrote a program called Scrub System that tapped her expertise in artificial intelligence to ingeniously search patient records, treatment notes and letters between physicians. Standard search-and-replace software had generally found 30 to 60 percent of personally identifying information. Scrub System “understands” what constitutes a name, address or phone number and eliminates 99 to 100 percent of the revealing data.
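The gap between the two approaches can be sketched in a few lines. The toy scrubber below is not Sweeney's Scrub System, which drew on artificial-intelligence techniques; it is a minimal illustration of the underlying idea that a detector which "understands" the shape of a field (here, a phone-number pattern) catches things a fixed search-and-replace list would miss. The regex, name list and sample note are all invented for illustration.

```python
import re

# Hypothetical, simplified sketch of pattern-aware scrubbing (NOT the
# actual Scrub System): each detector encodes what a field looks like,
# rather than relying on an exact search-and-replace dictionary.

# Matches U.S.-style phone numbers such as 617-555-0199 or (617) 555-0199.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")

# A real system would use name dictionaries and context; a toy list here.
KNOWN_NAMES = {"Alice Brown", "John Smith"}

def scrub(text):
    """Replace phone numbers and known names with placeholder tags."""
    text = PHONE_RE.sub("[PHONE]", text)
    for name in KNOWN_NAMES:
        text = text.replace(name, "[NAME]")
    return text

note = "Patient John Smith (call 617-555-0199) was seen by Dr. Alice Brown."
print(scrub(note))
# -> Patient [NAME] (call [PHONE]) was seen by Dr. [NAME].
```

A plain word list would miss any phone number it had not seen before; the pattern catches every string of that shape, which is the intuition behind the jump from 30–60 percent to near-total removal.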
The software won accolades from medical associations. “Her research was highly influential,” says Betsy Humphreys, deputy director of the National Library of Medicine. “A lot of people didn’t see how different life was [in the Internet age].... Latanya’s work raised their awareness.” With Scrub System, “I thought I had solved the privacy problem,” Sweeney says sheepishly. But the truth was, “I really didn’t understand a thing about privacy.”
This realization hit home when one day she reviewed the medical history of a young woman. “At age two this girl was sexually molested, at age three she stabbed her sister with scissors, at four her parents got divorced, at five she set fire to her home,” Sweeney recounts. Clearly, “removing the explicit identifiers wasn’t what [privacy] was about.” It was about the bread crumbs of information we leave behind in records strewn all over the Internet—in medical forms, credit applications, resumes and other documents. Nothing specifically identified the girl in the report, but the scraps of information were unique, and Sweeney was pretty sure she could use them to reidentify her—and almost anyone else.
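The mechanics of that reidentification are just a database join. The sketch below is a hypothetical illustration of the idea, not any study Sweeney actually ran: no single field names the person, but the combination of innocuous attributes is unique in the population, so matching a "de-identified" record against an outside list recovers the identity. All names, fields and values are invented.

```python
# Hypothetical reidentification-by-linkage sketch. The "public_list" stands
# in for any outside dataset (a voter roll, a resume site); the record's
# quasi-identifiers pick out exactly one person. All data here is made up.

deidentified = {"zip": "02139", "birth_year": 1972, "sex": "F"}

public_list = [
    {"name": "Pat Doe", "zip": "02139", "birth_year": 1972, "sex": "F"},
    {"name": "Sam Roe", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Lee Poe", "zip": "02140", "birth_year": 1972, "sex": "F"},
]

def reidentify(record, population, keys=("zip", "birth_year", "sex")):
    """Return every person whose quasi-identifiers match the record."""
    return [p for p in population if all(p[k] == record[k] for k in keys)]

matches = reidentify(deidentified, public_list)
print([m["name"] for m in matches])
# -> ['Pat Doe']  (a single match: the "anonymous" record is reidentified)
```

When the list comes back with exactly one name, the anonymization has failed, even though no field in the original record was explicitly identifying.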
Programs such as Identity Angel have proved Sweeney correct, and she has spent plenty of time finding ways to invade privacy, sometimes getting the jump on the bad guys, sometimes not. She tells the story of a banker indicted in Maryland who cross-referenced information in publicly available hospital discharge records with his own client list to see if any of his clients had cancer. If they did, he called in their loans. In a project using data from the state of Illinois, Sweeney’s lab found a way to reidentify patients with Huntington’s disease even after all information about the patients had been deleted from their records. Huntington’s is caused by the repetition of a short sequence of DNA. The more this sequence repeats, the earlier the age of onset. Sweeney’s lab combined those data with hospital discharge records, which included patients’ ages, to accurately link 90 percent of the Huntington’s patients with DNA records on file. Abuses may be rare, Sweeney admits, but both cases show how ugly things can get if one database is used to leverage the information in another.
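The Huntington's linkage works the same way, with age as the bridge between the two databases. The sketch below illustrates the shape of the attack only: the onset formula, repeat counts and records are invented for the example and do not reproduce the actual model or data from Sweeney's lab.

```python
# Hedged sketch of the Huntington's linkage idea: repeat length predicts
# age of onset, so a "de-identified" DNA record can be tied to a hospital
# discharge record through age alone. The formula and data are invented.

def predicted_onset_age(cag_repeats):
    """Toy inverse relationship: more repeats -> earlier onset."""
    return max(10, 80 - cag_repeats)

dna_records = [          # genetic data with identities stripped
    {"dna_id": "G1", "cag_repeats": 42},
    {"dna_id": "G2", "cag_repeats": 55},
]
discharges = [           # hospital discharge records, which include ages
    {"patient": "P-17", "onset_age": 38},
    {"patient": "P-20", "onset_age": 25},
]

def link(dna, hospital, tolerance=1):
    """Pair DNA records with discharges whose onset age fits the repeats."""
    pairs = []
    for d in dna:
        est = predicted_onset_age(d["cag_repeats"])
        for h in hospital:
            if abs(h["onset_age"] - est) <= tolerance:
                pairs.append((d["dna_id"], h["patient"]))
    return pairs

print(link(dna_records, discharges))
# -> [('G1', 'P-17'), ('G2', 'P-20')]
```

Neither database is identifying on its own; it is the join that does the damage, which is Sweeney's point about one database leveraging the information in another.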
The real solution, Sweeney says, does not lie in her lab or in any other. Ultimately engineers and computer scientists will have to weave privacy protection into the design and usability of their new technologies, up front. If they do, “society can [then] decide how to turn those controls on and off,” Sweeney remarks. Otherwise we might all need to ride a motorcycle to get a few private moments.
This article was originally published with the title “A Little Privacy, Please.”