Fake News? Big Data And Artificial Intelligence To The Rescue

This article is more than 7 years old.

The impact of fake news on the recent election has focused public attention on this multi-tentacled and growing problem. Vast swaths of the population fall prey to such misinformation, while others struggle to discern unbiased truth from the morass of lies and distortions that surrounds us.

Experts recommend that we follow basic principles of information hygiene to separate fake from real, including checking sources, looking for bad grammar and typos, and seeking out corroborating information. And top of the list: never believe anything you read on Facebook.

However, none of these techniques is particularly effective. The quantity of fake news is now reaching crisis proportions, and the problem is only getting worse. Furthermore, the challenge of misinformation reaches well beyond the realm of public discourse, impacting the core of business as well.

Fake News as a Big Data Problem

Among the many ‘V’s’ that characterize big data (volume, variety, and velocity being the most familiar), we have now the added challenge of data veracity. Fake news, after all, is in essence a big data veracity challenge. It doesn’t matter how well we move, process, or secure our information if our information is simply incorrect.

Even the definition of data veracity is surprisingly muddled. Common sense would suggest that information has veracity if it accurately represents the facts in question. Yet facts – or truth more broadly – are surprisingly hard to discern.

In their book Big Data for Dummies, Judith Hurwitz, Alan Nugent, Fern Halper, and Marcia Kaufman define data veracity through two questions: “How accurate is that data in predicting business value? Do the results of a big data analysis actually make sense?” They add: “Data must be able to be verified based on both accuracy and context.”

Patricia Saporito, Data and Analytics Thought Leader for SAP, has this take on data veracity: “veracity isn’t just about data quality, it’s about data understandability.”

And yet, fake news is clearly understandable, and an analysis of it is likely to make sense. Surely these characteristics aren’t sufficient. Clearly, we still haven’t drilled down to the essence of veracity – truthfulness.

In their book Veracity of Data: From Truth Discovery Computation Algorithms to Models of Misinformation Dynamics, authors Laure Berti-Équille and Javier Borge-Holthoefer place data veracity squarely into the context of data quality. “Veracity refers to several quality dimensions related to repairing data inconsistencies and fixing other data quality problems such as duplicates, missing or incomplete data,” Berti-Équille and Borge-Holthoefer explain. “However, the problem of estimating data veracity should be projected into a bigger picture where misinformation dynamics is modeled and understood.”

Misinformation dynamics, in fact, is where the big data concept of data veracity and the problem of fake news connect. We’re not simply talking about the accidental inaccuracies that make up the bulk of enterprise data quality efforts. On the contrary, fake news is intentional misinformation, and furthermore, it is dynamic.

Berti-Équille and Borge-Holthoefer then poke holes in the traditional approaches to dealing with such intentional misinformation. “A common strategy to evaluate the reliability of the sources is to take advantage of data redundancy, and rely on majority voting heuristic, which simply assigns a true label to data that are claimed by the majority of the sources,” the authors explain. “But this strategy is known to be error-prone, because it counts all the sources equally and does not consider source dependence or collusion.”

Anyone who has struggled with fake news feels this pain, as such misinformation propagates quickly. A single fake meme may appear on hundreds of web sites in a matter of minutes – a form of collusion that gives such misinformation undeserved credence, as it tends to swamp actual facts.
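
To make the weakness of majority voting concrete, here is a minimal sketch in Python. It is an illustration only, not one of the algorithms from Berti-Équille and Borge-Holthoefer's book. The site names, the claim labels, and the `origin_of` mapping (which assumes we somehow already know which sites copied from which) are all hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(claims):
    """Return the value asserted by the largest number of sources.

    claims is a list of (source, value) pairs. Every source counts
    equally, which is exactly the weakness the authors describe.
    """
    counts = Counter(value for _, value in claims)
    return counts.most_common(1)[0][0]

def vote_by_origin(claims, origin_of):
    """Count each independent origin once, so copies add no weight.

    origin_of maps a copying source to the source it copied from
    (an assumption: in practice this dependence must be inferred).
    """
    origins_per_value = defaultdict(set)
    for source, value in claims:
        origins_per_value[value].add(origin_of.get(source, source))
    return max(origins_per_value, key=lambda v: len(origins_per_value[v]))

# Hypothetical data: one meme copied across four sites, two independent reports.
claims = [
    ("site_a", "hoax"), ("site_b", "hoax"),
    ("site_c", "hoax"), ("site_d", "hoax"),
    ("wire_1", "fact"), ("wire_2", "fact"),
]
origin_of = {"site_b": "site_a", "site_c": "site_a", "site_d": "site_a"}

naive = majority_vote(claims)              # the copies swamp the facts
aware = vote_by_origin(claims, origin_of)  # the copies collapse to one origin
print(naive, aware)
```

With this toy data the naive count sides with the copied meme, while the dependence-aware count sides with the two independent reports. The hard part in practice is the step this sketch takes for granted: working out which sources depend on which.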

Artificial Intelligence to the Rescue?

Facebook, for one, realizes that fake news is a problem it has to deal with. But with billions of users and untold trillions of Facebook posts, the company knows that no amount of manual fact-checking will solve this problem.

Facebook CEO Mark Zuckerberg outlined some of Facebook’s efforts to combat fake news in a recent post. “Historically, we have relied on our community to help us understand what is fake and what is not,” Zuckerberg says. “We do not want to be arbiters of truth ourselves, but instead rely on our community and trusted third parties.”

Given how error-prone such strategies are, however, Facebook realizes it must raise the bar on fake news. The clear answer: artificial intelligence (AI). AI is familiar to the gurus at the social media behemoth, after all, as it is already a central component of Facebook’s offering, driving the order and priority of posts and ads on its site.

To combat fake news, Facebook uses AI to detect words or patterns of words that might indicate fake news stories, according to a recent Wall Street Journal article. Given mixed opinions about AI, however, Zuckerberg is circumspect about expressing Facebook’s plans for how AI will combat fake news. “The most important thing we can do is improve our ability to classify misinformation,” Zuckerberg explains. “This means better technical systems to detect what people will flag as false before they do it themselves.”

AI is able to learn behaviors based upon continually improving pattern recognition, so training a system to identify fake news based upon what sorts of articles people have flagged as misinformation in the past is well within the reach of today’s technology.
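
As a rough illustration of what learning from past flags can look like, the sketch below trains a tiny naive Bayes text classifier using only the Python standard library. This is a textbook technique chosen for brevity, not a description of Facebook's system, and the four training headlines and their labels are invented for the example.

```python
import math
from collections import Counter

def train(examples):
    """Collect per-label word counts and document counts.

    examples is a list of (text, label) pairs, where the label
    records whether readers flagged the article in the past.
    """
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts

def classify(text, word_counts, doc_counts):
    """Return the most likely label under multinomial naive Bayes."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: how often this label appeared in training.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training data: two flagged headlines, two unflagged ones.
examples = [
    ("shocking secret cure doctors hate", "flagged"),
    ("you won't believe this shocking miracle", "flagged"),
    ("city council approves budget for schools", "ok"),
    ("council votes on new transit budget", "ok"),
]
word_counts, doc_counts = train(examples)

first = classify("shocking miracle cure", word_counts, doc_counts)
second = classify("council budget vote", word_counts, doc_counts)
print(first, second)
```

A model like this only learns which words tend to appear in flagged articles. It has no notion of whether a claim is true.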

However, AI still has quite a way to go to actually solve the data veracity challenge. “It is incredibly hard to know the whole state of the world to identify whether a fact is true or not,” according to Richard Socher, head of Salesforce Research, in an article for Quartz. “Even if we had a perfect way to encompass and encode all the knowledge of the world, the whole point of news is that we’re adding to that knowledge.”

In other words, the problem isn’t simply detecting fake news, it’s identifying real news – that is, information that is true and unbiased, as well as being current. AI is not yet up to this task. “One possible approach is to have a system which would have a sophisticated and detailed understanding of the meaning of text—which is something that cannot be done yet today,” explains Ilya Sutskever, research director for OpenAI in the Quartz article.

AI-Driven Data Veracity in the Enterprise

Fake news is but one page out of the enterprise data veracity story. Another page is the identification of malicious phishing emails – especially the more sophisticated ‘spear phishing’ variety that targets particular individuals – an area where AI is making headway against misinformation.

The Internet of Things (IoT) is another area ripe for innovation in AI to combat misinformation. Such misinformation might conceivably be intentional, but the greater veracity challenge with the IoT is identifying miscalibrations among the multitudes of sensors that enterprises are rapidly implementing today.
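
A simple statistical baseline shows the shape of the miscalibration problem, even though it is not AI in any meaningful sense. The sketch below assumes several sensors measure the same quantity at the same moment and flags any reading that strays too far from the group median. The sensor names, readings, and tolerance are hypothetical.

```python
from statistics import median

def flag_miscalibrated(readings, tolerance):
    """Return the sensors whose reading differs from the median by more than tolerance.

    readings maps a sensor id to its value; all sensors are assumed
    to be measuring the same quantity at the same time.
    """
    center = median(readings.values())
    return sorted(s for s, v in readings.items() if abs(v - center) > tolerance)

# Hypothetical temperature readings in degrees Celsius from five co-located sensors.
readings = {"t1": 21.1, "t2": 20.9, "t3": 21.0, "t4": 24.6, "t5": 21.2}
suspects = flag_miscalibrated(readings, tolerance=1.0)
print(suspects)
```

The median is used rather than the mean so that a single badly drifting sensor cannot drag the reference point toward itself.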

AI-driven data veracity is admittedly a work in progress, but one thing is certain: the more data we have, the greater the opportunity for misinformation, both intentional and accidental. Furthermore, the volume of such information has swamped our human ability to discern the truth. AI, therefore, is really our only hope – not just with fake news, but with data veracity overall.

Intellyx publishes the Agile Digital Transformation Roadmap poster, advises companies on their digital transformation initiatives, and helps vendors communicate their agility stories. As of the time of writing, none of the organizations mentioned in this article are Intellyx customers. Image credit: Philipp Rudloff, Jim Lipsey, and Jason Bloomberg.
