How Data And Information Literacy Could End Fake News

At its core, the rise of “fake news” is first and foremost a sign that we have failed as a society to teach our citizens how to think critically about data and information. Take that email from a Nigerian prince offering to transfer you ten million dollars if you’ll just send him $10,000 to cover the wire costs. Enough people receive that email each day and wire the ten thousand dollars that this scam continues in 2016. The Internet has globalized the art of the scam and the reach of misinformation, allowing a single tweet to go viral across the planet, sowing chaos in countries on the other side of the world from its sender.

At the heart of all such news is the inability to think critically about the information that surrounds us and to perform the necessary due diligence and research to verify and validate. In April 2013 when the AP’s Twitter account was hacked and tweeted that there had been an explosion at the White House that left President Obama injured, automated stock trading algorithms took the news as fact and immediately launched a cascade of trading activity that plunged the Dow Jones by more than 100 points in less than 120 seconds. Human reporters, on the other hand, simply picked up the phone and called colleagues stationed at the White House to inquire if they were aware of any such attack and were quick to refute the false information.

Such triangulation lies at the root of basic fact checking, yet few today go to such lengths when reviewing information online. Countless memes have spread on Facebook falsely attributing a particularly poignant quote to someone in the news. During the 2016 election cycle, such memes were standard practice on both sides, with unflattering or damaging statements falsely attributed to both candidates. A quick Google search for the quote in question typically turned up corroborating information in short order, showing that the quote was a modification of an existing quote, was attributed to the wrong person, or was fabricated entirely.

Yet, when I ask an audience at one of my presentations to raise their hand if they fact check such quotes before sharing them online, I have yet to see a single hand. In fact, I frequently see journalists at top tier outlets misattributing famous quotes. This is an area where technology could in fact play a powerful role: imagine a browser plugin that automatically flagged quoted statements and factual claims in an article and ran a quick online search to see whether there was strong disagreement over who made the statement or over the specifics of the claim. While this would not determine whether a statement or fact is false, it would at least flag contentious items for readers and let them know there is disagreement.
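To make the idea concrete, here is a minimal Python sketch of the quote-checking logic such a plugin might run. The regular expression and the toy “search results” are stand-ins of my own invention; a real plugin would query a live search engine and parse the pages it returns.

```python
import re
from collections import Counter

# Pull out substantial quoted spans (20-300 characters) from article text.
QUOTE_PATTERN = re.compile(r'"([^"]{20,300})"')

def extract_quotes(article_text):
    """Return the quoted statements found in an article."""
    return QUOTE_PATTERN.findall(article_text)

def attribution_disagreement(quote, search_results):
    """Given (speaker, text) pairs from other coverage mentioning this
    quote, report how contested the attribution is."""
    speakers = Counter(
        speaker for speaker, text in search_results if quote in text
    )
    if len(speakers) <= 1:
        return None  # attribution is uncontested in the results we have
    return speakers.most_common()

if __name__ == "__main__":
    article = ('The senator closed with "You cannot escape the '
               'responsibility of tomorrow by evading it today" last night.')
    # Stand-in for live search hits; a real plugin would query a search API.
    hits = [
        ("Abraham Lincoln", 'often attributed: "You cannot escape the '
         'responsibility of tomorrow by evading it today"'),
        ("Abraham Lincoln", '... "You cannot escape the responsibility '
         'of tomorrow by evading it today" ...'),
        ("The senator", 'said "You cannot escape the responsibility of '
         'tomorrow by evading it today"'),
    ]
    for quote in extract_quotes(article):
        contested = attribution_disagreement(quote, hits)
        if contested:
            print(f"Contested attribution: {quote!r}")
            print("  Reported speakers:", contested)
```

Even something this crude surfaces the key signal: multiple sources crediting the same words to different people is exactly the disagreement a reader should see before sharing.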

Even the nation’s most respected newspapers face challenges when fact checking events in foreign countries as a result of a steeply declining foreign bureau footprint. Whereas in the past a top newspaper might have any number of staff permanently stationed in key countries around the world to report on events first-hand, today a protest march or terror attack is more likely to be covered by stringers or through remote reporting.

A top US paper reported in a front-page story earlier this year that the main refugee housing center in a European country had been burned to the ground in a xenophobic arson attack. Yet, a quick English-language Google search turned up local coverage of the arson stating that the fire had merely scorched a few siding shingles and was quickly put out, with everyone back inside the building shortly afterwards. Local coverage even included copious photographs showing just how minor the damage was. Nevertheless, one of the most respected newspapers in the United States failed to perform even basic fact checking on its claim that the building was burned to the ground, in a country with plenty of English coverage and thus no language barrier to impede verification. Again, this is an area where automated triangulation tools could play a great role in reducing such incidents.

Indeed, when I ask my audiences how many of them have turned to Google News and Google Translate to access local Nepalese press coverage of the latest developments in the nation’s recovery from the 2015 earthquake, I have yet to see a single hand raised. Today we have access to all the world’s information, yet we take no advantage of it to be more informed citizens of the world. Similarly, the majority of Americans’ understanding of Syria comes from heavily mediated Western reporting, often brokered or expanded through stringers or statements from the various involved parties. Few Americans have visited Syria recently as disinterested parties to learn for themselves what conditions on the ground are like and to catalog play-by-play narratives of the war. Local sources will not provide an unbiased view, but they will at least provide reporting from closer to the nexus of activity, offering a greater range of perspectives and reports that allow a reader to make a more informed assessment of local activity and to see it through local eyes.

While media and technology pundits have touted fully automated solutions that would simply read an article and flag it as “true” or “false,” the reality is much more difficult in that “fake news” is not black and white, it is a hundred shades of gray. In short, much of what we might label as “fake news” is actually a difference of opinion in how people of different backgrounds and experiences interpret a common set of information. Just as one person might find a statement hilariously humorous, another might find it deeply offensive – so too might two different people come to very different conclusions regarding whether a political candidate’s statements make him unfit for office or whether they are his main appeal.

Attempting to classify entire websites as “fake” or “truthful” is also problematic, as one plugin discovered when it mistakenly flagged a story about veterans heading to Standing Rock as false because of its use of a blanket domain-based blacklist, even though the story was accurate and had been widely reported by other outlets. In a twist of irony, the well-regarded outlet TechCrunch initially reported this as a new Facebook technology gone awry, before later correcting its story to note that this was a third-party plugin unrelated to Facebook in any way. Facebook itself subsequently banned the plugin before re-enabling it.

Yet, a Stanford study published last month demonstrates the problem with the status quo of simply leaving fact checking to readers themselves and hoping things will get better as the younger born-digital generation takes over. Through a series of tests, the authors found that digital natives at every level of education, from middle school through high school to college, were unable to perform even the most basic of tasks, such as distinguishing a news article from a paid advertisement or an editorial from hard news reporting. This is made even more difficult by the increasing fluidity and blending of these formats in the evolving world of journalism.

The notion of a magical technology that could instantly label every article on the web as “fake” or “true” is a false promise due to the hundred shades of gray that underlie how we interpret the information around us. Yet, technology could certainly help us understand the information environment around a topic of interest, seeing all of the different perspectives and statements being attributed to the event and allowing us to make more informed decisions about that information.

As noted earlier, a browser or Facebook plugin that automatically identified quotes and factual assertions from an article and compiled a list of all reporting on those quotes and statements would at the very least allow a reader to understand how contested those details are. For example, a rapidly spreading viral meme attributing a certain statement to President Obama this afternoon could instantly be flagged as actually being a quote by Abraham Lincoln from a century and a half ago. A climate change claim that temperatures have actually dropped by 20 degrees over the past century could show that this number comes from a single personal blog, while all remaining reporting and scientific journals report very different results. Or, in the breaking aftermath of a major terror attack, such a tool could draw together all of the conflicting reports of the death and injury toll to offer a better understanding of the extent of the attack as new information emerges.
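As a toy illustration of that last scenario, the following Python sketch pulls conflicting death tolls out of a handful of invented reports and summarizes the spread. The extraction pattern and the sample reports are illustrative assumptions, not a production extractor, which would need to handle written-out numbers, injured counts, and far messier language.

```python
import re
import statistics

# Grab reported death tolls such as "12 killed" or "20 dead".
TOLL_PATTERN = re.compile(r'(\d+)\s+(?:dead|killed)', re.IGNORECASE)

def extract_tolls(reports):
    """Collect every death toll mentioned across (outlet, text) reports."""
    figures = []
    for outlet, text in reports:
        for match in TOLL_PATTERN.finditer(text):
            figures.append((outlet, int(match.group(1))))
    return figures

if __name__ == "__main__":
    # Invented reports standing in for live coverage of a breaking event.
    reports = [
        ("Outlet A", "Officials say at least 12 killed in the blast."),
        ("Outlet B", "Early reports suggest 20 dead, dozens wounded."),
        ("Outlet C", "Hospital sources confirm 14 dead and 40 injured."),
    ]
    figures = extract_tolls(reports)
    values = sorted(n for _, n in figures)
    print("Reported death tolls:", figures)
    print(f"Range: {values[0]}-{values[-1]}, "
          f"median {statistics.median(values)}")
```

Rather than declaring any one figure “true,” the tool simply shows the reader how much the reports disagree at that moment.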

Such an approach avoids the problematic attempt to enforce a single label of “fact” or “fiction,” which ignores those hundred shades of gray, and instead provides the tools to create more informed consumers of information.

Similarly, one could imagine a browser plugin that takes all of the news outlets reporting on an issue and visualizes them in a 3D graph. The X axis would position each outlet by how positive or negative its average coverage of the topic is, the Y axis by how much coverage it affords the topic, and the Z axis by how polarized or emotional its coverage is. From such a graph one can immediately segment highly partisan outlets from those adopting a more clinical, reserved view. Of course, an outlet clinically covering a topic is not necessarily any less misleading than a highly emotional partisan one, but it at least suggests that the author is attempting to be more detached, writing in a journalistic rather than an editorial style. Such a graph can easily be automated through the use of various sentiment mining tools.
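As a minimal sketch of such a graph, the snippet below plots a few hypothetical outlets on those three axes using matplotlib’s 3D scatter. The scores are invented placeholders; in practice they would come from running sentiment mining tools over each outlet’s actual coverage of the topic.

```python
import matplotlib.pyplot as plt

outlets = {
    # outlet: (average tone: -1 negative .. +1 positive,
    #          coverage volume: number of articles on the topic,
    #          polarization: 0 clinical .. 1 highly emotional)
    "Outlet A": (-0.7, 120, 0.9),
    "Outlet B": (0.1, 45, 0.2),
    "Outlet C": (0.6, 80, 0.8),
    "Outlet D": (-0.1, 200, 0.3),
}

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for name, (tone, volume, polarization) in outlets.items():
    ax.scatter(tone, volume, polarization)
    ax.text(tone, volume, polarization, name)  # label each outlet's point

ax.set_xlabel("Average tone (negative to positive)")
ax.set_ylabel("Coverage volume (articles)")
ax.set_zlabel("Polarization / emotionality")
plt.show()
```

In this invented example, an outlet sitting high on the polarization axis with strongly negative tone would jump out immediately against outlets clustered near the clinical, neutral center.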

Similarly, a common approach used in fact checking is to assess the level of concrete detail in a report. An article filled with vague emotional language is more likely to be problematic than one filled with concrete details, such as quotes and precise numbers that can be verified and validated. An article reporting the results of a secret CIA report in which no details can be revealed is far more difficult to fact check than one in which all of the facts are available for both verification and refutation.
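One could prototype that heuristic in a few lines. The sketch below scores text by its ratio of verifiable specifics (numbers and quoted statements) to vague emotional language; the vague-word list and the scoring formula are illustrative assumptions of mine, not a validated measure.

```python
import re

# An illustrative stand-in for a real lexicon of vague emotional language.
VAGUE_WORDS = {"shocking", "outrageous", "unbelievable", "devastating",
               "incredible", "horrifying", "massive"}

def concreteness_score(text):
    """Ratio of verifiable specifics to vague emotional terms."""
    numbers = len(re.findall(r"\b\d[\d,.]*\b", text))   # precise figures
    quotes = len(re.findall(r'"[^"]+"', text))          # quoted statements
    words = re.findall(r"[a-z']+", text.lower())
    vague = sum(1 for w in words if w in VAGUE_WORDS)
    return (numbers + quotes) / (vague + 1)  # +1 avoids division by zero

if __name__ == "__main__":
    concrete = ('The fire damaged 3 siding shingles; the mayor said '
                '"repairs cost $2,000".')
    vague = ("A shocking, devastating blaze caused unbelievable, "
             "massive destruction.")
    print(f"Concrete article score: {concreteness_score(concrete):.2f}")
    print(f"Vague article score: {concreteness_score(vague):.2f}")
```

The absolute numbers matter less than the contrast: an article long on adjectives and short on checkable specifics scores near zero, flagging it as harder to verify.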

Putting this all together, we see that fake news exists because as a society we have failed to teach our citizens data and information literacy. As I’ve noted here before, I’ve seen senior policymakers assert that numbers equate to facts and that data is truth. Yet, as the Stanford study shows, even digital natives who have grown up in the information-saturated online world do no better at discerning the credibility of information or even grasping such basic distinctions as a paid advertisement versus journalistic reporting. Suggestions like requiring programming and data science courses in school would certainly create more technically literate citizens, but technical literacy is not the same as data literacy and the kind of critical, devil’s-advocate thinking it requires. Technology is also not a panacea here, as there is no simple magic algorithm that can eliminate false and misleading news. Instead, to truly solve the issue of “fake news” we must blend technological assistance with teaching our citizens to be data literate consumers of the world around them.