In this article, I discuss why we should create a global, immutable registry of labeled ‘fake news’ powered by Artificial Intelligence, distributed ledgers, and a global community.
Fake News is a major problem in our connected world. Although misinformation and propaganda have been around for ages, ‘Fake News’ is now becoming a real threat, partly due to the ease of creating, diffusing and consuming content online.
What makes Fake News a hard problem to solve is the difficulty of identifying, tracking and controlling unreliable content. Even when there is early evidence that a fake story is circulating online, removing it or preventing people from sharing it could be perceived as intervention or censorship.
People, websites, blogs and social media are all part of the problem to some extent, intentionally or not. False or misleading stories can be created and spread through global online networks with a few clicks, in many cases silently influencing public opinion.
With so-called ‘deepfakes’, it is already extremely difficult to tell whether what you see is real: the latest technologies can manipulate real videos or create synthetic ones that show people saying things they never said, in a very realistic way. Moreover, synthesized speech that matches the voice of a known person can be used to attribute to them statements they never made.
The days when something was accepted as true just because it was ‘seen on TV’, in a photo or in a video are gone.
Fake News influences or even shapes public opinion and (re)sets the agenda. It is distributed by platforms and users, both intentionally and unintentionally. ‘Unintentional sharing’ stems from a general lack of awareness of the problem: people do not realize how often they are exposed to Fake News. They don’t know whether they are influenced by misleading content, or whether they are part of the problem themselves by unintentionally sharing Fake News and influencing others.
There are ongoing efforts within news corporations and social media companies to mitigate the problem, and some of them may prove somewhat effective. But the fake news problem is bigger: it goes beyond corporate boundaries.
This article describes a global, immutable registry of labeled ‘fake news’ as the basis of a universal solution to the problem, operating on top of news organizations, social media and search engines. Using technologies such as Blockchain, IPFS and Natural Language Processing (NLP), the platform can empower a global network of evaluators who provide continuous feedback and who label and evaluate a representative random sample of our global content.
The objective of the system is to quantify the problem and raise global awareness by systematically taking snapshots of online content, which are then assessed and labeled by humans. Furthermore, the platform can offer specialized APIs that expose the patterns and knowledge extracted from the ongoing analysis of content, enabling third parties to predict the trustworthiness of new content at ‘publish time’ or ‘share time’.
Fake news: Viral by design
Fake content is designed to be viral. Its creators want it to spread organically and rapidly. Fake stories are engineered to attract attention and trigger emotional reactions so users instantly share the ‘news’ with their social networks. With the right tricks and timing, a false story can go viral in hours. The ‘fake news industry’ takes advantage of the following ‘flaws’ of our online reality:
1. Our online world is set up for ‘attention and instant sharing’: The performance of the global ‘news distribution network’, including social media, news corporations, opinion leaders and influencers, is measured in terms of ‘attention’ and ‘user engagement’, in many cases reduced to the oversimplified form of CTR (Click-Through Rate) and sharing statistics. A piece of content with a high CTR will probably make it to the top of social feeds or stay longer on the home page of a news site, regardless of how informative, trustworthy or useful it is.
With this approach to measuring performance, content with fancy photos and ‘over-promising titles’ does extremely well, regardless of the quality of the underlying story (if there is one). Very frequently, a fancy ‘promo card’ for an article with an impressive title is enough for people to start sharing it with their friends and networks. This behavior can then lead to viral effects for content with no substance or, even worse, for content with false information and misleading messages.
Instead of the quality and trustworthiness of the content, ‘expected performance’ is what attracts attention and drives sharing on social media: websites and other online entities rush to reproduce stories that appear likely to go viral, then promote them to get more traffic and serve more ads in pursuit of ambitious monetization goals.
Content quality is rarely part of key performance indicators (KPIs), at least not among the important ones: popular websites set goals for CTR, page views, social sharing and related metrics, and when there are complaints about poor content, they simply edit or remove it.
2. Online users tend to ‘share a lot, easily’: Another aspect of the problem is the massive group of online users who act primarily as distributors or re-sharers of content, without the necessary understanding of, or even a genuine interest in, what they share.
It is sad to realize that in an era characterized by instant access to the world’s knowledge, most online users are ‘passive re-sharers’: they don’t create original content; they just recycle whatever appears to be trendy or likable, with little or no judgment or critical thinking.
Users of this class may consume and circulate fake news (and other types of poor content) and unintentionally become part of the fake news distribution mechanism.
A problem of quantification and awareness
Obviously, there are entities who intentionally drive fake news to achieve certain political, commercial or other goals. As mentioned above, there is also a massive group of online users (acting as individuals or on behalf of companies) who unintentionally participate in the exponential spread of false stories. In fact, due to a lack of understanding and awareness, many users will probably never realize that they are part of the ‘fake news system’.
Raising the global awareness around Fake News should be a key objective in every serious attempt to tackle the problem: instead of silently removing fake stories when identified and deactivating fake social media accounts, we must systematically measure the circulation of fake news across the globe, and the degree of unintentional participation of media sites and online users.
We need to understand the patterns and share the knowledge we gain by continuously analyzing a representative sample of the world’s digital content. We need to create a global registry of enriched content, analyzed and labeled by both humans and AI agents: a global community powered by Artificial Intelligence on top of an immutable, unified content store, built on genuine human collaboration and assisted by our most advanced intelligent technology.
The solution: A global registry of labeled Fake News
The proposed ‘Fake News Evaluation Network’ takes a different approach: it focuses less on the real-time classification of new content and more on a retroactive, large-scale ‘fake news’ analysis intended to quantify the problem, extract patterns and share the derived knowledge. It puts emphasis on measuring the level of responsibility of each of the involved parties, with the objective of educating users, raising global awareness and influencing the Corporate Social Responsibility strategies of online corporations.
Imagine a ‘content sampling’ process running on a daily basis, sampling global content publishing and sharing activity. Powered by special crawlers, this process ‘listens’ for stories and ‘news’ across a representative set of major websites, social media and popular blogs. It discovers and organizes ‘fresh content’ and ‘new content references’ into a unified, de-duplicated and immutable content store, specially designed to handle stories, facts and their associations.
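To make the idea of a de-duplicated, immutable content store more concrete, here is a minimal Python sketch. It assumes that content is identified by the SHA-256 hash of its normalized text (a simplified form of the content addressing used by systems such as IPFS, not the actual IPFS CID format) and that the store only ever adds entries. The names `ContentStore`, `normalize` and `content_id` are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (hypothetical normalization rule)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_id(text: str) -> str:
    """SHA-256 of the normalized text; a simplified stand-in for an IPFS CID."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


@dataclass
class ContentStore:
    # content_id -> normalized text; entries are added, never updated or deleted
    items: dict = field(default_factory=dict)
    # content_id -> every URL where this content was observed
    references: dict = field(default_factory=dict)

    def add(self, text: str, url: str) -> tuple:
        """Store content once; later sightings only add a new reference.

        Returns (content_id, is_new)."""
        cid = content_id(text)
        is_new = cid not in self.items
        if is_new:
            self.items[cid] = normalize(text)
        self.references.setdefault(cid, []).append(url)
        return cid, is_new
```

Crawling the same story on a second site yields the same identifier, so only a new reference is recorded. A real crawler would need much more robust normalization (HTML stripping, boilerplate removal) than the lowercase-and-whitespace rule assumed here.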
Newly identified content is unified and linked to its ‘master copy’, related ‘stories’ and relevant factual information. It is then compared against already labeled content to estimate its ‘degree of deviation from reality’, using ‘fact-checked versions of the same story’ and known patterns.
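Linking a newly found variation to its ‘master copy’ could start with a near-duplicate test. The sketch below uses word shingles and Jaccard similarity, one common technique for this, with a hypothetical similarity threshold of 0.5. It covers only the matching step, not the estimation of deviation from reality, and all function names are illustrative.

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("word shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_to_master(text, masters, threshold=0.5):
    """Return the id of the most similar master copy, or None when no master
    reaches the (hypothetical) similarity threshold."""
    text_shingles = shingles(text)
    best_id, best_score = None, 0.0
    for master_id, master_text in masters.items():
        score = jaccard(text_shingles, shingles(master_text))
        if score > best_score:
            best_id, best_score = master_id, score
    return best_id if best_score >= threshold else None
```

Comparing every new item against every master copy does not scale to web-sized collections; techniques such as MinHash with locality-sensitive hashing are typically used to narrow down candidates first.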
Artificial Intelligence adds significant value by identifying the story in the content (the elements of a story such as the named entities, the events, the occasion, the timeline, etc.) and matching the variations found in a large pool of noisy content from various sources and of varying quality.
It then creates lists of stories that need to be evaluated and simplifies the assessment process through intelligent suggestions and recommendations (specifically what needs to be checked within each story).
The global community of professionals and ‘active digital citizens’ discovers, evaluates and votes for or against certain aspects of the story, with proper justification, inline references and annotations.
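One way to turn community votes into a label for a single aspect of a story is a quorum plus supermajority rule. In this hedged sketch, only votes that carry a justification and at least one reference are counted, and each evaluator counts once. The quorum (5 votes), the majority (70%) and the `Vote`/`label_aspect` names are assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Vote:
    evaluator: str
    aspect: str            # the specific claim in the story being assessed
    is_accurate: bool      # True = vote "for", False = vote "against"
    justification: str
    references: list = field(default_factory=list)


def label_aspect(votes, aspect, quorum=5, majority=0.7):
    """Label one aspect of a story as 'accurate', 'false' or 'undecided'.

    Only votes with a non-empty justification and at least one reference
    count, and each evaluator counts once (their latest vote wins)."""
    latest = {}
    for vote in votes:
        if vote.aspect == aspect and vote.justification.strip() and vote.references:
            latest[vote.evaluator] = vote
    if len(latest) < quorum:
        return "undecided"
    share_accurate = sum(v.is_accurate for v in latest.values()) / len(latest)
    if share_accurate >= majority:
        return "accurate"
    if 1 - share_accurate >= majority:
        return "false"
    return "undecided"
```

A real system could go further, for example by weighting each vote by the evaluator’s track record; the unweighted rule here is only the simplest starting point.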
As soon as a story gets enough votes and factual checks, AI generalizes the findings to all known variations of the story and different types of coverage, allowing the reliability of both the core story and its individual instances to be quantified. AI components pick up the patterns and keep monitoring each core story for new facts and events that need to be checked. All of this becomes part of the immutable content store: labeled content, assessments, publisher scores and metadata are permanently stored as part of a global history, with no deletions and no ‘phantom’ fake news.
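The ‘no deletions’ property can be illustrated with a hash chain, the basic mechanism that makes distributed ledgers tamper-evident. This is a single-machine sketch with a hypothetical `AppendOnlyLog` class: it has no consensus or replication, so it only shows why rewriting or removing a past record is detectable.

```python
import copy
import hashlib
import json


class AppendOnlyLog:
    """Tamper-evident history: each entry stores the hash of the previous
    entry, so editing or deleting any past record breaks the chain."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(record, prev_hash):
        # Records must be JSON-serializable; sort_keys makes the hash stable.
        payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record):
        record = copy.deepcopy(record)  # later changes by the caller can't leak in
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"record": record, "prev": prev, "hash": self._hash(record, prev)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every link; False if any record was changed or removed."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._hash(entry["record"], prev):
                return False
            prev = entry["hash"]
        return True
```

On a ledger, many independent parties hold copies of the chain, so a rewritten history would also disagree with everyone else’s copy.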
Social media, news corporations, blogs, and other entities consume the APIs of this platform to self-assess their compliance and progress towards a ‘better content for the world’ mission.
As content is evaluated for trustworthiness, the reliability of those who produce, promote or distribute it is also affected: a ‘publisher evaluation system’ quantifies the extent to which website A, social media platform B or news corporation C is part of the global fake news problem.
Having this information, media entities can take action, learn and measure the level of their responsibility in spreading fake stories. They can let their users know that certain stories they have shared proved to be false and misleading. They can help the global effort by educating their users and demonstrating real social responsibility and meaningful actions towards a better-informed society.
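The ‘publisher evaluation system’ could begin with something as simple as the share of a publisher’s evaluated items that were labeled false. The sketch below assumes a Bayesian-style smoothing prior (a hypothetical 10% base rate with a weight of 10 items) so that publishers with only a handful of evaluated stories are not over- or under-penalized; `publisher_fake_rates` is an illustrative name.

```python
from collections import defaultdict


def publisher_fake_rates(labels, prior_rate=0.1, prior_weight=10):
    """labels: iterable of (publisher, label) pairs; only 'false' and
    'accurate' labels are counted, anything else is ignored.

    Returns publisher -> smoothed share of their evaluated items labeled
    'false'. The prior pulls publishers with few evaluated items toward
    prior_rate, so a single story does not dominate a small sample."""
    counts = defaultdict(lambda: [0, 0])  # publisher -> [false, total]
    for publisher, label in labels:
        if label not in ("false", "accurate"):
            continue
        counts[publisher][1] += 1
        if label == "false":
            counts[publisher][0] += 1
    return {
        publisher: (false + prior_rate * prior_weight) / (total + prior_weight)
        for publisher, (false, total) in counts.items()
    }
```

For example, under these assumptions a publisher with 5 false and 5 accurate items scores (5 + 1) / (10 + 10) = 0.3, while one with 10 accurate items scores 1 / 20 = 0.05.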
Companies could also integrate special APIs in order to cross-check content at ‘share time’ and notify their users if the content is already flagged or there are signals for limited trustworthiness (while leaving the sharing decision to the user). Social media could notify users who have already engaged with ‘verified fake news’ stories (liked, shared, saved, commented on, or simply consumed) and explain how to avoid such content in the future.
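A ‘share time’ check could look roughly like the following sketch. The registry schema (a `label` and a `trust` score per content identifier), the 0.4 low-trust threshold and the names `check_before_share` and `ShareCheck` are all hypothetical. The result always allows sharing and only attaches a notice, so the decision stays with the user.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional


def content_id(text):
    """SHA-256 of lowercased, whitespace-collapsed text (hypothetical scheme)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class ShareCheck:
    allowed: bool          # always True: sharing is never blocked
    notice: Optional[str]  # message to show the user before sharing, if any


def check_before_share(text, registry, low_trust=0.4):
    """registry: content_id -> {'label': str, 'trust': float in [0, 1]}."""
    entry = registry.get(content_id(text))
    if entry is None:
        return ShareCheck(allowed=True, notice=None)
    if entry["label"] == "false":
        return ShareCheck(True, "This story has been labeled false by evaluators.")
    if entry["trust"] < low_trust:
        return ShareCheck(True, "There are signals of limited trustworthiness for this content.")
    return ShareCheck(True, None)
```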
There are countless other interesting use cases, including measuring additional aspects of content quality, analyzing global trends and describing the dynamics of the phenomenon.
This could be built as a unified content store on top of IPFS or Swarm: an immutable system hosting samples of the world’s content, unified, labeled and scored for trustworthiness and other qualities of digital content.
Comments, thoughts and suggestions on particular technologies that could add value to this solution are welcome.
Based on an idea posted on ideachain.