The fight to study what happens on Facebook

Facebook recently added a new report to its transparency center. The "widely viewed content" report was ostensibly meant to shed light on what’s been a long-running debate: What is the most popular content on Facebook?

The 20-page report raised more questions than answers. For example, it showed that the most viewed URL was a seemingly obscure website associated with former Green Bay Packers players. It boasted nearly 90 million views even though its official Facebook page has just a few thousand followers. The report also included URLs for e-commerce sites that seemed at least somewhat spammy, like online stores for CBD products and Bible-themed t-shirts. There was also a low-res cat GIF and several bland memes that asked people to respond with foods they like or don’t like or items they had recently purchased.

Notably absent from the report were the right-wing figures who regularly dominate the unofficial “Facebook Top 10” Twitter account, which ranks content by engagement. In fact, there wasn’t very much political content at all, a point Facebook has long been eager to prove. For Facebook, its latest attempt at “transparency” was evidence that most users’ feeds aren’t polarizing, disinformation-laced swamps but something much more mundane.

Days later, The New York Times reported that the company had prepped an earlier version of the report, but opted not to publish it. The top URL from that report was a story from the Chicago Sun-Times that suggested the death of a doctor may have been linked to the COVID-19 vaccine. Though the story was from a credible news source, it’s also the kind of story that’s often used to fuel anti-vaccine narratives.

Almost as soon as the initial report was published, researchers raised other issues. Ethan Zuckerman, an associate professor of public policy and communication at University of Massachusetts at Amherst, called it “transparency theatre.” It was, he said, “a chance for FB to tell critics that they’re moving in the direction of transparency without releasing any of the data a researcher would need to answer a question like ‘Is extreme right-wing content disproportionately popular on Facebook?’”

The promise of ‘transparency’

For researchers studying how information travels on Facebook, it’s a familiar tactic: provide enough data to claim “transparency,” but not enough to actually be useful. “The findings of the report are debatable,” says Alice Marwick, principal researcher at the Center for Information Technology and Public Life at University of North Carolina. “The results just didn't hold up, they don't hold up to scrutiny. They don't map to any of the ways that people actually share information.”

Marwick and other researchers have suggested that this may be because Facebook opted to slice its data in an unusual way. They have suggested that Facebook only looked for URLs that were actually in the body of a post, rather than the link previews typically shared. Or perhaps Facebook just has a really bad spam problem. Or maybe it’s a combination of the two. “There's no way for us to independently verify them … because we have no access to data compared to what Facebook has,” Marwick told Engadget.

Those concerns were echoed by Laura Edelson, a researcher at New York University. “No one else can replicate or verify the findings in this report,” she wrote in a tweet. “We just have to trust Facebook.” Notably, Edelson has her own experience running into the limits of Facebook’s push for “transparency.”

The company recently shut down her personal Facebook account, as well as those of several NYU colleagues, in response to their research on political ad targeting on the platform. Since Facebook doesn’t make targeting data available in its ad library, the researchers recruited volunteers to install a browser extension that could scoop up advertising info based on their feeds.

Facebook called it “unauthorized scraping,” saying it ran afoul of their privacy policies. In doing so, it cited its obligation to the FTC, which the agency later said was “misleading.” Outside groups had vetted the project and confirmed it was only gathering data about advertisers, not users’ personal data. Guy Rosen, the company’s VP of Integrity, later said that even though the research was “well-intentioned” it posed too great a privacy risk. Edelson and others said Facebook was trying to silence research that could make the company look bad. “If this episode demonstrates anything it is that Facebook should not have veto power over who is allowed to study them,” she wrote in a statement.

Rosen and other Facebook execs have said that Facebook does want to make more data available to researchers, but that they need to go through the company’s official channels to ensure the data is made available in a “privacy protected” way. The company has a platform called FORT (Facebook Open Research and Transparency), which allows academics to request access to some types of Facebook data, including election ads from 2020. Earlier this year, the company said it would expand the program to make more info available to researchers studying “fringe” groups on the platform.

But while Facebook has billed FORT as yet another step in its efforts to provide “transparency,” those who have used FORT have cited shortcomings. A group of researchers at Princeton hoping to study election ads ultimately pulled the project, citing Facebook’s restrictive terms. They said Facebook pushed a “strictly non-negotiable” agreement that required them to submit their research to Facebook for review prior to publishing. Even more straightforward questions about how they were permitted to analyze the data were left unanswered.

“Our experience dealing with Facebook highlights their long running pattern of misdirection and doublespeak to dodge meaningful scrutiny of their actions,” they wrote in a statement describing their experience.

A Facebook spokesperson said the company only checks for personally identifiable information, and that it’s never rejected a research paper.

“We support hundreds of academic researchers at more than 100 institutions through the Facebook Open Research and Transparency project,” Facebook’s Chaya Nayak, who heads up FORT at Facebook, said in a statement. “Through this effort, we make massive amounts of privacy-protected data available to academics so they can study Facebook’s impact on the world. We also pro-actively seek feedback from the research community about what steps will help them advance research most effectively going forward.”

Data access affects researchers’ ability to study Facebook’s biggest problems. And the pandemic has further highlighted just how significant that work can be. Facebook’s unwillingness to share more data about vaccine misinformation has been repeatedly called out by researchers and public health officials. It’s all the more vexing because Facebook employs a small army of its own researchers and data scientists. Yet much of their work is never made public. “They have a really solid research team, but virtually everything that research team does is kept only within Facebook, and we never see any of it,” says Marwick, the UNC professor.

But much of Facebook’s internal research could help those outside the platform who are trying to understand the same questions, she says. “I want more of the analysis and research that's going on within Facebook to be communicated to the larger scholarly community, especially stuff around polarization [and] news sharing. I have a fairly strong sense that there's research questions that are actively being debated in my research community that Facebook knows the answer to, but they can't communicate it to us.”

The rise of ‘data donation’

To get around this lack of access, researchers are increasingly looking to “data donation” programs. Like the browser extension used by the NYU researchers, these projects recruit volunteers to “donate” some of their own data for research.

NYU’s Ad Observer, for example, collected data about ads on Facebook and YouTube, with the goal of helping researchers understand the platforms’ ad targeting at a more granular level. Similarly, Mozilla, maker of the Firefox browser, has a browser add-on called Rally that helps researchers study a range of issues from COVID-19 misinformation to local news. The Markup, a nonprofit news organization, has also created Citizen Browser, a customized browser that aids journalists’ investigations into Facebook and YouTube. (Unlike Mozilla and NYU’s browser-based projects, The Markup pays users who participate in Citizen Browser.)

“The biggest single problem in our research community is the lack of access to private proprietary data,” says Marwick. “Data donation programs are one of the tactics that people in my community are using to try to get access to data, given that we know the platforms aren't going to give it to us.”

Crucially, it’s also data that’s collected independently, and that may be the best way to ensure true transparency, says Rebecca Weiss, who leads Mozilla’s Rally project. “We keep getting these good faith transparency efforts from these companies but it's clear that transparency also means some form of independence,” Weiss tells Engadget.

For participants, these programs offer social media users a way to make sure some of their data, which is constantly being scooped up by mega-platforms like Facebook, can also be used in a way that is within their control: to aid in research. Weiss says that, ultimately, it’s not that different from market research or other public science projects. “This idea of donating your time to a good faith effort — these are familiar concepts.”

Researchers also point out that there are significant benefits to gaining a better understanding of how the most influential and powerful platforms operate. The study of election ads, for example, can expose bad actors trying to manipulate elections. Knowing more about how health misinformation spreads can help public health officials understand how to combat vaccine hesitancy. Weiss notes that having a better understanding of why we see the ads we do — political or otherwise — can go a long way toward demystifying how social media platforms operate.

“This affects our lives on a daily basis and there's not a lot of ways that we as consumers can prepare ourselves for the world that exists with these increasingly more powerful ad networks that have no transparency.”