Knowledge is power against COVID-19 – here’s how AI is helping

There are 24,000 research papers and counting in the COVID-19 Open Research Dataset – AI can help researchers access the information they need.
19 March 2020

Collaborative AI projects might help researchers process relevant data much faster, and understand the coronavirus much better. Source: AFP

The COVID-19 outbreak has brought out the best of the expansive healthcare industry across the globe, which continues to work tirelessly to identify and treat cases and to limit the spread of the virus.

Information is power in the fight against the virus – and thousands of records, findings and insights have already been published online, presenting a trove of information for scientists and healthcare specialists to refer to in the pursuit of treatments and vaccines.

There are now more than 24,000 peer-reviewed coronavirus research papers in the collaborative COVID-19 Open Research Dataset (CORD-19), a product of the Allen Institute for AI (also known as AI2), in partnership with industry juggernaut Microsoft and the Chan Zuckerberg Initiative.

It represents the most extensive collection of scientific literature related to the ongoing pandemic and will continue to update in real time. 

This data is immensely powerful – containing hidden patterns and trends that can help the scientific community fight the pandemic – but at the same time it represents a deep ocean of data that no-one, especially in the midst of an escalating crisis, has time to filter through manually.

Enter artificial intelligence (AI) tools such as machine learning and natural language processing (NLP). These technologies are being deployed to speed up resource-intensive projects, among other applications, by sifting through thousands of outbreak-related resources and making them quicker and easier for researchers to find.
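
To give a feel for what "sifting" through papers can mean in practice, here is a minimal sketch of keyword-based ranking using TF-IDF, a classic text-retrieval technique. It is an illustration only, not how Semantic Scholar or CORD-19 actually works; the paper IDs, the sample abstracts and the function names (`tokenize`, `build_index`, `search`) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for paper abstracts; not taken from CORD-19.
PAPERS = {
    "paper-a": "Transmission dynamics of a novel coronavirus in hospital settings",
    "paper-b": "Vaccine candidates targeting the coronavirus spike protein",
    "paper-c": "Economic effects of seasonal influenza on regional tourism",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(papers):
    """Count terms per paper and how many papers contain each term."""
    term_counts = {pid: Counter(tokenize(text)) for pid, text in papers.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, papers):
    """Return IDs of matching papers, best match first (smoothed TF-IDF score)."""
    term_counts, doc_freq = build_index(papers)
    n_docs = len(papers)
    scores = {}
    for pid, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                # Terms that appear in fewer papers carry more weight.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += counts[term] * idf
        if score > 0:
            scores[pid] = score
    return sorted(scores, key=scores.get, reverse=True)


print(search("coronavirus vaccine", PAPERS))
```

In this toy example the vaccine paper ranks above the transmission paper because it matches both query words, and the influenza paper is left out entirely. Production systems layer far more on top of this idea, but the principle of scoring documents against a question is the same.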

Doug Raymond, the general manager of the Semantic Scholar academic resource upon which CORD-19 is built, believes that “AI has an important part to play in solving this problem.”

“The core problem is information overload in research,” he elaborated. “There are dozens of institutions that have published research on coronavirus […] Putting all the information together in a common format that is comprehensive is a huge challenge for researchers, and it’s a great application of our AI capabilities.”

AI2 has been working with Microsoft since 2018, broadening the role of Semantic Scholar, which uses NLP and machine learning tools to scour research literature. The AI-driven database is now capable of processing more than 182 million research papers from across the scientific disciplines.

The initiative could therefore prove critical in aiding researchers in the battle to understand and get ahead of the novel coronavirus that causes COVID-19. Researchers can share data mining tools and datasets with a peer-reviewed community of over four million scientists, and can link the academic research with data from clinical trials and other non-academic sources.

This recent spirit of collaboration, born out of urgent need, might herald a new era of cooperative research between strange bedfellows.

The World Health Organisation (WHO) and a special standing committee of the National Academies of Sciences, Engineering, and Medicine helped formulate a series of key queries surrounding the epidemic that needed to be addressed.

Meanwhile, the NIH’s National Library of Medicine helped kick off the data gathering by allowing access to about 10,000 scholarly papers in its archives that dealt directly with coronavirus information.