During the January 6 assault on the Capitol Building in Washington, D.C., rioters posted photographs and videos of their rampage on social media. The platforms they used ranged from mainstream sites such as Facebook to niche ones such as Parler—a social networking service popular with right-wing groups. Once they realized this documentation could get them in trouble, many started deleting their posts. But Internet sleuths had already begun downloading the potentially incriminating material. One researcher, who publicly identifies herself only by the Twitter handle @donk_enby, led an effort that she claims downloaded and archived more than 99 percent of all data posted to Parler before Amazon Web Services stopped hosting the platform. Scientific American repeatedly e-mailed Parler’s media team for comment but had not received a response at the time of publication.

Amateur and federal investigators can extract a lot of information from this massive trove, including the locations and identities of Parler users. Although many of those studying the Parler data are law enforcement officials looking into the Capitol insurrection, the situation provides a vivid example of the way social media posts—whether extreme or innocuous—can inadvertently reveal much more information than intended. And vulnerabilities that are legitimately used by investigators can be just as easily exploited by bad actors.

To learn more about this issue, Scientific American spoke with Rachel Tobac, an ethical hacker and CEO of SocialProof Security, an organization that helps companies spot potential vulnerabilities to cyberattacks. “The people that most people are talking about when they think of a hacker, those are criminals,” she says. “In the hacker community, we’re trying to help people understand that hackers are helpers. We’re the people who are trying to keep you safe.” To that end, Tobac also explained how even tame posts on mainstream social media sites could reveal more personal information than many users expect—and how they can protect themselves.

[An edited transcript of the interview follows.]

How was it possible to download so much data from Parler?

Folks were able to download and archive the majority of Parler’s content ... through automated site scraping. [Parler] ordered their posts by number in the URL itself, so anyone with any programming knowledge could just download all of the public content. This is a fundamental security vulnerability. We call this an insecure direct object reference, or IDOR: the Parler posts were listed one after another, so if you just add “1” to the [number in the] URL, you could then scrape the next post, and so on. This specific type of vulnerability would not be found in mainstream social media sites such as Facebook or Twitter. For instance, Twitter randomizes the URLs of posts and requires authentication to even work with those randomized URLs. This [IDOR vulnerability], coupled with a lack of authentication required to look at each post and a lack of rate limiting (rate limiting basically means [capping] the number of requests that you can make to pull data), means that even a simple program could allow a person to scrape every post, every photo, every video, all the metadata on the Web site.
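[To make the enumeration idea concrete, here is a minimal Python sketch that contacts no real site. The `PostStore`, `RateLimiter` and `enumerate_posts` names are hypothetical stand-ins invented for illustration, not Parler’s or anyone else’s actual code. It compares a store that numbers posts 1, 2, 3 with one that assigns random identifiers, then shows how a simple request cap cuts a scrape short.]

```python
import secrets


class PostStore:
    """Hypothetical in-memory stand-in for a site's public posts."""

    def __init__(self, sequential):
        self.sequential = sequential
        self.posts = {}
        self.next_id = 1

    def add(self, text):
        if self.sequential:
            post_id = str(self.next_id)           # 1, 2, 3, ... easy to guess
            self.next_id += 1
        else:
            post_id = secrets.token_urlsafe(16)   # long random identifier
        self.posts[post_id] = text
        return post_id

    def get(self, post_id):
        return self.posts.get(post_id)


class RateLimiter:
    """Fixed-window limiter: at most `limit` requests per client per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # client -> (window_start, requests_so_far)

    def allow(self, client, now):
        start, count = self.counts.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0                 # a new window begins
        if count >= self.limit:
            return False
        self.counts[client] = (start, count + 1)
        return True


def enumerate_posts(store, max_id, limiter=None, client="scraper"):
    """Count upward from 1, the way a scraper adds 1 to the number in a URL."""
    found = []
    for n in range(1, max_id + 1):
        # All requests are treated as arriving at time 0.0, inside one window.
        if limiter is not None and not limiter.allow(client, now=0.0):
            break
        text = store.get(str(n))
        if text is not None:
            found.append(text)
    return found


sequential = PostStore(sequential=True)
randomized = PostStore(sequential=False)
for i in range(1, 6):
    sequential.add("post %d" % i)
    randomized.add("post %d" % i)

print(len(enumerate_posts(sequential, 5)))                         # 5
print(len(enumerate_posts(randomized, 5)))                         # 0
print(len(enumerate_posts(sequential, 5, RateLimiter(2, 60.0))))   # 2
```

[Counting upward recovers every post from the sequential store and none from the randomized one, and the limiter stops the scrape after two requests. A real service would also need the authentication step mentioned above, which this sketch leaves out.]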

What makes the archived data so revealing?

The images and videos still contained GPS metadata when they went online, which means that anyone can now map the detailed GPS locations of all the users who posted. This is because our smartphone logs the GPS coordinates and other data, such as the lens and the timing of the photo and video. We call this EXIF data—we can turn this off on our phones, but many people just don’t know to turn that off. And so they leave it embedded within the files that they upload, such as a video or a photo, and they unknowingly disclose information about their location. Folks on the Internet, law enforcement, the FBI can use this information to determine where those specific users live, work, spend time—or where they were when they posted that content.
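[As a rough illustration of why embedded coordinates are so easy to map: EXIF stores latitude and longitude as degrees, minutes and seconds, each written as a fraction, plus a reference letter (N, S, E or W). The Python sketch below converts that form into the decimal degrees that mapping tools expect. The function name `dms_to_decimal` and the sample coordinates are invented for illustration and are not taken from the Parler archive.]

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees.

    `dms` is three (numerator, denominator) pairs; `ref` is "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value  # south and west are negative by convention
    return float(value)


# Made-up sample values in the shape a phone might embed in a photo.
latitude = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
longitude = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
print(round(latitude, 4), round(longitude, 4))  # 40.4461 -79.9822
```

[Reading the fields out of a real file takes an image library or an EXIF parser, which is left out here. Once the values are in hand, turning them into a point on a map is a few lines of arithmetic.]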

Can investigators extract similar information from posts on more mainstream platforms?

This EXIF data is scrubbed on places such as Facebook and Twitter, but we still have a lot of people who don’t realize how much they’re compromising their location and information about themselves when they’re posting. Even if Parler had scrubbed the EXIF data, we saw on a lot of posts during this event that people were geolocation tagging their Instagram Stories to the Capitol Building that day or broadcasting their actions on Facebook Live publicly and tagging where they were located. I think it’s a general lack of understanding or maybe not realizing just how much data they’re leaking. And I think plenty of folks also didn’t realize that maybe they wouldn’t want to geolocation tag during that event.

Under more normal circumstances, is there a problem with geolocation tagging?

Many people think, “Well, I’m not doing anything wrong, so why would I care if I post a photo?” But let’s just take a really innocuous example, such as going on vacation. [If] you geolocation tag the hotel, what could I do as an attacker? Well, the obvious thing is: you’re not home. But I feel like most people get that. What they probably don’t get is that I can social engineer: I can gain access to information about you through human systems at that hotel. I could call up your hotel pretending to be you and gain information about your travel plans. I could steal your hotel points. I could change your room. I could do all this nefarious stuff. We can do so much and really manipulate because our service providers don’t authenticate the way that I would recommend that they authenticate over the phone. Can you imagine if you could log into your Gmail account, your calendar or something like that by just using your current address, your last name and your phone number? But that’s how it works with a lot of these different companies. They don’t use the same authentication protocols that they would use, say, on a Web site.

How can people protect themselves?

I don’t think it would be fair to tell people that they couldn’t post. I post on Twitter multiple times a day! Instead of saying, “You can’t do this,” I would recommend being what I call “politely paranoid” about what we post online. For instance, we can post about the vacation, but we don’t want location- or service-provider-identifying markers within the post. So how about you post a picture of the sunset and the margarita but don’t geolocation tag the hotel? These very small changes can help folks protect their privacy and safety in the long run while still getting everything that they want out of social media. If you really want a geolocation tag, you can save the city that you’re in rather than the hotel: [then] I can’t call up the city and try and get access to your hotel points or change your plans.

Should social media sites just prevent geolocation tagging? What responsibilities do platforms have to protect their users?

I think it’s really important that all platforms, including social media platforms, follow best practices regarding security and privacy to keep their users safe. It’s also a best practice to scrub metadata for your users before they post their photos or videos so they don’t unknowingly compromise themselves. All of that is the platform’s responsibility; we have to hold them to that [and] make sure that they do those things. After that, I would say individuals get to choose how much risk they would like to take. I work hard to ensure nonsecurity folks understand risks: things such as geolocation tagging, [mentioning] service providers [and] taking pictures of their license, credit cards, gift cards, passports, airplane tickets—now we’re seeing COVID-19 vaccination cards with sensitive data on them. I don’t think it’s the social media company’s responsibility, for instance, to dictate what somebody can or cannot post when it comes to their travel photos. I think that’s up to the user to decide how they would like to use that platform. And I think it’s up to us as [information security] professionals to clearly communicate what those risks are so people can make an informed decision.
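[What scrubbing metadata can look like in practice: in a JPEG file, EXIF data sits in its own labeled segment (an APP1 segment that begins with the bytes `Exif`) ahead of the compressed image. The Python sketch below copies a JPEG byte stream while leaving that segment out. The function name `strip_exif` is hypothetical, and this is a simplified illustration under stated assumptions rather than how any particular platform does it. It ignores other places metadata can hide, such as XMP blocks or video containers, and a production service would more likely rely on an established image library.]

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg):
    """Return a copy of a JPEG byte stream without its EXIF APP1 segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(b"\xff\xd8")        # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:              # start of scan: image data follows
            break
        # The two-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("invalid segment length")
        segment = jpeg[pos:pos + 2 + length]
        payload = segment[4:]
        is_exif = marker == 0xE1 and payload.startswith(EXIF_HEADER)
        if not is_exif:
            out += segment              # keep every non-EXIF segment as-is
        pos += 2 + length
    out += jpeg[pos:]                   # copy the image data unchanged
    return bytes(out)
```

[A platform could run a step like this on each upload before storing the file, so the location never reaches other users. One trade-off is that EXIF also records how the camera was rotated, so a real pipeline would usually apply that rotation to the image before discarding the segment.]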