In a yet-to-be peer-reviewed study published on Freedom To Tinker, a site hosted by Princeton's Center for Information Technology Policy, three researchers document how third-party tracking scripts have the capability to scoop up information from Facebook's login API without users knowing. The tracking scripts documented by Steven Englehardt, Gunes Acar, and Arvind Narayanan represent a small slice of the invisible tracking ecosystem that follows users around the web largely without their knowledge.
Most of the scripts the researchers examined grab a user ID that is unique to that website, as well as the person's name and email. The problem is that, using Facebook's API, a tracker could easily link that unique ID to someone's Facebook profile. For example, a tracker might have registered only that Visitor 1 went to a webpage, but with Facebook Login, it could connect that visitor to their public social media profile. That information can be used to track users across other websites and devices.
The researchers found that sometimes when users grant permission for a website to access their Facebook profile, third-party trackers embedded on the site are getting that data, too. That can include a user's name, email address, age, birthday, and other information, depending on what info the original site requested to access. The study found that this particular breed of tracking script is present on 434 of the web's top one million websites, though not all of them are querying Facebook data from the API—the researchers only confirmed that such a script was present.
After Princeton published their research, Facebook said it would suspend this ability.
"Scraping Facebook user data is in direct violation of our policies. While we are investigating this issue, we have taken immediate action by suspending the ability to link unique user IDs for specific applications to individual Facebook profile pages, and are working to institute additional authentication and rate limiting for Facebook Login profile picture requests," a Facebook spokesperson said in a statement.
The Princeton researchers identified seven different scripts that are capable of pulling information from Facebook's login API, one of which they couldn't link to a specific company. The remaining scripts are created by six marketing and fraud prevention companies: Lytics, ProPS, Tealium, Forter, and OnAudience, the last of which stopped collecting information from Facebook's login API following the publication of another third-party tracking study conducted by one of the same researchers in December. In a statement, OnAudience stressed that the platform that had this capability, behavioralengine.com, no longer exists, and its current platform uses different technology for collecting data. ProPS did not immediately return a request for comment.
Adam Corey, the CMO of Tealium, as well as James McDermott, the CEO of Lytics, explain that the Princeton researchers' findings are not as simple as they may appear, in part because the internet's tracking ecosystem is so complicated. First, it's important to understand what these companies, and others like them, actually do. They create software and tracking tools that websites pay for and use to learn about their customers. In other words, a site might buy a tracking product from one of these companies, and then use it to suck information out of Facebook's API. But that is not usually how the company intended its tools to be used.
McDermott says that while it's possible to change the code of Lytics' tracking tools to collect information from Facebook's API, it's something he would discourage his clients from doing, and not a behavior his company has seen. "In no case have we seen that deployment," he says. He stresses that his company also doesn't control whether a client installs Facebook's login API and isn't able to obtain or look at any data that their clients collect.
Corey said much the same thing. "No, we do not query Facebook's APIs to pull any user information," he said in a statement. "Tealium is in the business of helping companies manage the data that flows between their users and their digital properties. We have no interest, nor do we think it's acceptable for a company to piggyback or use this type of data for any other purpose." Both Corey and McDermott said they do not sell the data their clients collect.
Still, the Princeton study underscores the risks associated with users bringing their Facebook data to other parts of the web, where they might not understand how it is being collected and parsed. For example, Cambridge Analytica was able to obtain information belonging to up to 87 million people because users shared their Facebook data with a personality test app.
If you want to keep your data from being collected in the way the study describes, be wary of using Facebook's universal login feature on sites you might not visit often, or ones where the functionality doesn't necessarily make using a site more convenient. Installing a trusted ad blocker, like Adblock Plus, will also help prevent many tracking scripts from accessing any information about you, though the Princeton researchers haven't tested whether it blocks the specific scripts associated with this vulnerability.
In 2014, Facebook said it was creating Anonymous Login, "a way to log into apps without sharing any personal information from Facebook." It never followed through with the plan.