Fighting COVID-19 with Knowledge Graphs

Published May 31, 2020

The Convergence Accelerator Office in the Office of Integrative Activities at the National Science Foundation (NSF) has awarded two researchers at the San Diego Supercomputer Center (SDSC) funding to organize COVID-19 information into a transdisciplinary knowledge network. The network will integrate health, pathogen, and environmental data to better track cases and to improve analysis and forecasting across the greater San Diego region.

The award, funded under the NSF’s Rapid Response Research (RAPID) program, is valued at $200,000 and is set to run from mid-May through the end of October. It will allow the researchers to quickly launch a comprehensive semantic integration platform for data-driven analysis and for developing policy interventions that take into account up-to-date health, socioeconomic, and demographic characteristics of populations in different areas, as well as biomedical information such as virus strains and genetic profiles.

“The project will be based on our knowledge graph prototype linking information about pathogens, health data, and environmental indicators and enabling cross-domain inferencing,” said Peter Rose, director of SDSC’s Structural Bioinformatics Laboratory and principal investigator (PI) for the project, called ‘COVID-19-Net: Integrating Health, Pathogen and Environmental Data into a Knowledge Graph for Case Tracking, Analysis, and Forecasting.’ “Such a graph lets researchers trace the spread of the coronavirus in different geographic conditions, focusing on specific virus strains and transmissions.”

The COVID-19-Net grant is one of 13 RAPID awards made by the NSF Convergence Accelerator to Track A: Open Knowledge Network-related projects. This project will be coordinated with another RAPID project led by Krzysztof Janowicz, a professor of Geographic Information Science at UC Santa Barbara, which focuses on infrastructure resilience, supply chain disruptions, and local policy decisions designed to combat the pandemic.

Specifically, the COVID-19-Net project will refine the knowledge graph and integrate it with other complementary graphs being developed by the Open Knowledge Network (OKN), which was funded in September 2019 under the NSF’s Convergence Accelerator program. Tasks include refining the methodology for populating the knowledge graph using a Jupyter-based data-to-knowledge ingestion pipeline, continuously extending the knowledge graph content with additional data, and developing efficient data markup using recent schema.org COVID-19 extensions.
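
As a rough illustration of the idea only (this is not the project's actual pipeline, schema, or data), the Python sketch below loads a few made-up case records into a tiny in-memory graph of subject-relation-object triples and then answers a cross-domain question: which virus strains have been reported in a given region. The class and function names, the relation labels REPORTED_IN and CAUSED_BY, and the sample rows are all hypothetical and chosen for readability.

```python
# Illustrative sketch only: every name, relation label, and sample row
# below is hypothetical and not taken from the COVID-19-Net project.


class KnowledgeGraph:
    """A tiny in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def match(self, subject=None, relation=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [
            (s, r, o)
            for (s, r, o) in self.triples
            if (subject is None or s == subject)
            and (relation is None or r == relation)
            and (obj is None or o == obj)
        ]


def ingest_case_rows(graph, rows):
    """Turn flat case records into graph edges linking cases to regions and strains."""
    for row in rows:
        case = f"case:{row['case_id']}"
        graph.add(case, "REPORTED_IN", f"region:{row['region']}")
        graph.add(case, "CAUSED_BY", f"strain:{row['strain']}")


def strains_in_region(graph, region):
    """Cross-domain query: follow region -> cases -> strains."""
    cases = {
        s for (s, _, _) in graph.match(relation="REPORTED_IN", obj=f"region:{region}")
    }
    strains = {o for (s, _, o) in graph.match(relation="CAUSED_BY") if s in cases}
    return sorted(strains)


# Made-up sample records with placeholder region and strain labels.
sample_rows = [
    {"case_id": 1, "region": "A", "strain": "S1"},
    {"case_id": 2, "region": "A", "strain": "S2"},
    {"case_id": 3, "region": "B", "strain": "S1"},
]

graph = KnowledgeGraph()
ingest_case_rows(graph, sample_rows)
print(strains_in_region(graph, "A"))
```

A production knowledge graph would typically live in a dedicated graph database and draw on curated data sources; the point here is only that once cases, places, and strains share one graph, a question spanning health and pathogen data becomes a short traversal.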

“The main goal is to make these datasets easier to find, index, and integrate,” said Ilya Zaslavsky, director of the Spatial Information Systems Laboratory at SDSC and UC San Diego, and co-PI of the award. Zaslavsky also specializes in geoscience data discovery and develops visual exploratory gateways into advanced data science and machine learning tools; these gateways will serve as one of the user dashboards for querying the graph.

“The NSF COVID RAPID awards program focuses on research that can be used immediately to explore how to model and understand the spread of COVID-19, to inform and educate about the science of virus transmission and prevention, and to encourage the development of processes and actions to address this global challenge,” said Lara Campbell, Program Director, Convergence Accelerator Office, NSF. “The project exemplifies how the NSF research community is able to respond to societal challenges in a timely manner, and also demonstrates the power of the ‘track’ concept in the Accelerator, where independent projects collaborate as a cohort, which is enabling PIs Rose and Zaslavsky to leverage investments made by the NSF in the other Open Knowledge Network projects. We look forward to the valuable information and services that these efforts will be able to provide to benefit the American public during this pandemic.”

The project team plans to continue collaborating with industry, university, and community efforts focused on COVID-19 data assimilation and analysis, including local government agencies in San Diego County, citizen scientists (Open San Diego), global initiatives (Graphs4Good), and industrial partners such as Microsoft and Neo4j. Together with many other UC San Diego experts who have refocused their work on COVID-19, the team regularly communicates with County health officials and researchers on issues critical to the County’s response to the pandemic, including tracing the spread of the infection and analyzing at-risk populations.

“The current pandemic has shown that we must all work together to overcome this crisis,” said UC San Diego Vice Chancellor for Research Sandra Brown. “This means bringing together the best minds in academia, industry, and the community, and it also means bringing together discrete data in unique ways to reveal a fuller picture of how SARS-CoV-2 is spread and how we can stem transmission. We are grateful to the NSF for their essential support in this effort.”

“I am pleased to join UC San Diego in announcing that federal funding will go to local researchers at the San Diego Supercomputer Center to help us combat the coronavirus pandemic,” said U.S. Representative Mike Levin (D-CA). “Since the beginning of this crisis, I have emphasized that we must listen to the scientists and the experts in order to beat this disease, and I am incredibly proud that UC San Diego researchers are playing a leading role in that effort.”

Added SDSC Director Michael Norman: “This project will result in a crucial resource among researchers dedicating their time and expertise to ending this pandemic by creating a data-driven analysis resource that can be widely shared to accelerate findings. It is among several COVID research initiatives that are currently using SDSC’s expertise as we provide priority access to our Comet supercomputer, coordinated through the recently announced COVID-19 HPC Consortium national alliance.”

Zaslavsky and Rose also offered a Directed Group Study course for data science seniors this quarter (DSC198), in which several student teams developed additional components of the knowledge graph and created dashboards and markup that help researchers find answers about the spread of COVID-19 as well as its biomedical and environmental aspects.

About SDSC

The San Diego Supercomputer Center (SDSC) is a leader and pioneer in high-performance and data-intensive computing, providing cyberinfrastructure resources, services, and expertise to the national research community, academia, and industry. Located on the UC San Diego campus, SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from astrophysics and earth sciences to disease research and drug discovery. In late 2020 SDSC will launch its newest National Science Foundation-funded supercomputer, Expanse. At over twice the performance of Comet, Expanse supports SDSC’s theme of ‘Computing without Boundaries’ with a data-centric architecture, public cloud integration, and state-of-the-art GPUs for incorporating experimental facilities and edge computing.

Media Contact

Jan Zverina
SDSC Communications
(858) 534-5111

For Comment

Peter Rose

Ilya Zaslavsky