Our society is facing an unprecedented crisis due to the recent COVID-19 outbreak, which is putting healthcare systems under strain all around the world. Recently, dozens of countries announced the shutdown of all non-essential activities for the foreseeable future, and scientists worldwide are striving to find cures and vaccines able to stop the ongoing pandemic.
In these hard times, everyone should put their expertise to use to help in the fight against the virus. For Gabriele Sarti, a Data Science student at the University of Trieste and a young member of the Italian Association for Computational Linguistics (AILC), this meant applying his expertise in Natural Language Processing (NLP) to develop the COVID-19 Browser, a system that leverages state-of-the-art NLP techniques to extract meaningful information and guide scientists towards a better understanding of COVID-19.
As of today, more than 32,000 scientific papers have been published by research laboratories worldwide on the new coronavirus SARS-CoV-2 and the disease it causes, COVID-19. In such a large quantity of text, a lot of useful information is very likely overlooked, leaving our knowledge of the subject too scattered to be exploited to its full potential. COVID-19 Browser allows users to browse a large collection of those articles directly in their console, matching article abstracts against user queries formulated in natural language so that they can delve deeper into our current knowledge of the subject.
The model underlying COVID-19 Browser is SciBERT-NLI. It builds on SciBERT, a cutting-edge language model trained by the American nonprofit AI2 on a corpus of 1.14M scientific papers, which Gabriele subsequently fine-tuned on natural language inference (NLI) data so that it could be used for the retrieval task.
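To give an idea of how matching abstracts against a query can work, here is a minimal sketch of similarity-based retrieval. It is not the project's actual code. The `embed` function is a toy bag-of-words stand-in for the sentence embeddings a model like SciBERT-NLI would produce, and the names `embed`, `cosine`, `rank_abstracts` and the sample abstracts are all made up for illustration. The ranking step, which sorts abstracts by cosine similarity to the query, is the part that would stay the same with a real encoder.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (assumption, not SciBERT-NLI):
    maps a text to a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_abstracts(query, abstracts, top_k=3):
    """Return (index, score) pairs for the top_k abstracts closest to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(a)), i) for i, a in enumerate(abstracts)]
    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical abstracts, used only to exercise the ranking.
abstracts = [
    "Incubation period of the novel coronavirus in hospitalized patients",
    "Deep learning methods for protein structure prediction",
    "Transmission dynamics of coronavirus within households",
]
results = rank_abstracts("coronavirus incubation period", abstracts, top_k=2)
for index, score in results:
    print(f"{score:.3f}  {abstracts[index]}")
```

With a neural encoder in place of `embed`, queries and abstracts that share meaning but not vocabulary could still end up close to each other, which a word-count vector cannot capture.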
Links
- The code for the project is open-source and available here: https://github.com/gsarti/covid-papers-browser
- A brief description of the model used is available here: https://huggingface.co/gsarti/scibert-nli
- The paper collection used for the project is available here: https://pages.semanticscholar.org/coronavirus-research