About the Authors

Jonathan LoTempio, BS, is a PhD candidate in the George Washington University Genomics and Bioinformatics program. His dissertation research is conducted at Children’s National Hospital in Eric Vilain’s laboratory. He previously served as a Scientific Program Analyst at the National Human Genome Research Institute and as a member of the Fast-Track Action Committee on Mapping Microbiomes of the United States Office of Science and Technology Policy.

 

D’Andre Spencer, MPH, is an epidemiologist who specializes in global health threats specifically in underserved communities. He is a staff scientist at the Center for Genetic Medicine Research at Children’s National Hospital where his work focuses on capacity building in health and genetic research in the Democratic Republic of the Congo and developing database infrastructure for multi-center translational research.

 

Rebecca Yarvitz is an international trade analyst in Washington, DC. She holds a BA in International Relations from the Elliott School of International Affairs of the George Washington University.

 

Arthur Delot Vilain is an independent high-school student with a focus on history, environmental sciences, and civil rights and social justice issues.

 

Eric Vilain, M.D., Ph.D., investigates questions at the intersection of science and society. He is the Director of the Center for Genetic Medicine Research at Children’s National Hospital, the Chair of the Department of Genomics and Precision Medicine at George Washington University, and the director of the CNRS International Research Laboratory “Epigenetics, Data, Politics”.  He is an advisor to the International Olympic Committee Medical Commission and Past President of the International Academy of Sex Research. Corresponding author (evilain@childrensnational.org).

 

Emmanuèle Délot, PhD, is a researcher and educator in developmental genetics and genomics. She is a professor at the George Washington University, and on the faculty at Children’s National Hospital. She serves as the founding executive coordinator for the Disorders/Differences of Sex Development Translational Research Network.

 

New Article

We Can Do Better: Lessons Learned on Data Sharing in COVID-19 Pandemic Can Inform Future Outbreak Preparedness and Response

The COVID-19 pandemic will remain a critical issue until a safe and effective vaccine is in global use. A strong international network exists for the systematic collection and sharing of influenza genome sequence data, which has proven extensible to COVID-19.1 However, the robust demographic and clinical data needed to understand the progression of COVID-19 within individuals and across populations are collected by an array of local, regional, federal, and/or national agencies, with country-specific, often overlapping mandates. The networks tasked with transmitting descriptive, disaggregated data have not done so in a standardized manner; most data are made available in variable, incompatible forms, and there is no central, global hub.

At the same time, data have been released and analyzed on Twitter, in blogs, and by news outlets before undergoing scholarly peer review in the race to learn about the emerging virus. Many journals have lowered their paywalls on COVID-19 papers in accordance with WHO guidelines for sharing data in public health emergencies. This has led to unprecedented access to knowledge for anyone, anywhere. Furthermore, in peer-reviewed journals, more papers that include the term “coronavirus” have been published in just the past six months than in the previous 70 years (Figure 1). Sorting through these papers to inform evidence-based policy is daunting. This wealth of new knowledge can be a curse: scientists and policymakers drown in the open sea of literature, unsure which papers are life rafts and which are flotsam.

In short, there is a need for improved international data sharing policies to support outbreak response. Previous successes and challenges in data sharing are instructive. We probed the national-level reporting on cases between April 1 and July 1, 2020 (evidence), efforts to share the pathogen’s genome sequence (science), and existing tools for assessing and disseminating publications (policy). Based on these observations, we suggest several structural improvements to the systems that facilitate knowledge sharing.

Figure 1. Publications with the term “coronavirus” archived in PubMed over time. Since the start of the pandemic, more papers have been published on coronaviruses than in all previous years combined. This fact underscores the need for expert guidance in combing through and making sense of the literature. The emergence of each of the two previous human coronavirus outbreaks was followed by a noticeable but much smaller increase in publications. The term coronavirus was published for the first time in 1968 (see “News and Views” in Nature, vol. 220, no. 5168, November 16, 1968, p. 650). PubMed also indexes earlier reports of infections later shown to be due to the same type of virus, likely via the use of keywords (these early publications are not accessible online via PubMed). Due to the size of the search query, the data underlying this figure were generated with an Entrez Direct command-line search.

 

Observations

Evidence: Heterogeneous, internationally non-interoperable COVID-19 case reporting

We focused on the data reported by public health agencies of the fifteen nations with the highest COVID-19 burden.2 As of April 23, 2020, this list included the United States (US), Spain, Italy, France, Germany, the United Kingdom (UK), Turkey, Iran, China, Russia, Brazil, Belgium, Canada, the Netherlands, and Switzerland, which represented ~75% of the reported global cases. Features of the data reported by these agencies on April 23 can be examined in the Annex, along with a table updated as of July 1, 2020.3

Many languages of data reports: The many tongues of the global village are reflected in the ten unique de facto or national languages in these fifteen countries. While summary data are reported in English by all nations except Iran, China, Brazil, and Russia, more detailed reports, where available, are typically in official languages. We used Google Translate to compare the official websites.

Heterogeneous data format: All fifteen agencies reported data in plain text, HTML, or PDF; eleven offered an interactive web-based data dashboard; and seven had comma-separated (CSV) data for download. Between April 1 and April 23, the US agency added a CSV download function and the Turkish agency switched from a CSV format to a Google product. None of these CSV files are compatible with one another, and none came with a codebook for decoding. There is little to no documentation of where old data are archived or which URLs are stable. The diversity of these fundamental methods of distribution presents great challenges to accessing the data.
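
To illustrate what a shared codebook would make possible, here is a minimal Python sketch. The agency names, column headers, and the `CODEBOOK` mapping are hypothetical and invented for illustration; none of them are taken from the actual agency files.

```python
import csv
import io

# Hypothetical codebook: maps each agency's column headers to a common schema.
CODEBOOK = {
    "agency_a": {"Date": "date", "Confirmed": "cases", "Deaths": "deaths"},
    "agency_b": {"fecha": "date", "casos": "cases", "fallecidos": "deaths"},
}

def harmonize(agency, csv_text):
    """Return rows from one agency's CSV, renamed to the common schema."""
    mapping = CODEBOOK[agency]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the columns the codebook documents; cast counts to int.
        row = {mapping[k]: v for k, v in raw.items() if k in mapping}
        row["cases"] = int(row["cases"])
        row["deaths"] = int(row["deaths"])
        row["source"] = agency
        rows.append(row)
    return rows

# Two made-up files with different headers (and one undocumented column).
a = harmonize("agency_a", "Date,Confirmed,Deaths\n2020-04-23,100,5\n")
b = harmonize("agency_b", "fecha,casos,fallecidos,uci\n2020-04-23,80,3,12\n")
combined = a + b
```

Without the mapping, the two files cannot be combined mechanically; with it, the undocumented `uci` column is dropped rather than guessed at.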

Data aggregation status: All agencies recorded cases and deaths of citizens due to COVID-19, while only nine nations reported the number of tests conducted within their borders. The number of tests performed can help shape evidence-based policy by offering a crude estimate of infectivity.

Only eight agencies reported on age and sex, yet both variables are known to have a strong predictive value on health outcomes. Age was reported categorically with different, incompatible age groups across agencies, thus limiting research potential.4,5 No agencies included socioeconomic status in their reports. Race/ethnicity/ancestry was largely ignored, presented only by the US after April 17.
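
The cost of incompatible age groups can be made concrete with a small Python sketch. The counts and age groups below are hypothetical, and `rebin` is an illustrative helper rather than any agency's method: it merges reported groups into coarser ones and refuses when a reported group straddles a target boundary.

```python
def rebin(counts, target_edges):
    """Aggregate age-group counts into coarser target groups.

    counts: dict mapping (low, high) inclusive age ranges to case counts;
            high may be None for an open-ended top group.
    target_edges: sorted lower bounds of the target groups, e.g. [0, 20, 60].
    Raises ValueError when a source group straddles a target boundary,
    which is the case where the counts cannot be split without guessing.
    """
    result = {edge: 0 for edge in target_edges}
    for (low, high), n in counts.items():
        # Target group that contains the lower bound of this source group.
        start = max(e for e in target_edges if e <= low)
        later = [e for e in target_edges if e > start]
        upper = later[0] - 1 if later else None  # None means open-ended
        if upper is not None and (high is None or high > upper):
            raise ValueError(f"group {low}-{high} straddles boundary {upper + 1}")
        result[start] += n
    return result

# Hypothetical reports from two agencies using different age groups.
agency_a = {(0, 19): 10, (20, 39): 40, (40, 59): 70, (60, None): 90}
agency_b = {(0, 29): 25, (30, 59): 95, (60, None): 85}

coarse_a = rebin(agency_a, [0, 60])
coarse_b = rebin(agency_b, [0, 60])
```

In this made-up example the only scheme both agencies can be mapped to is "under 60" versus "60 and over"; asking for a boundary at age 20 fails for the second agency, so the finer detail the first agency reported is lost in any joint analysis.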

Clinical aspects of the disease, such as comorbidities, symptoms, or admission to intensive care, were also unevenly reported. No agencies reported reinfection. Accounting for each of these factors is of key importance to the professionals looking to track, treat, cure, and prevent this disease.

Science: A robust system already exists for the rapid sharing of viral genome sequence data

The Bermuda Accord,6 outlined in 1996 at an International Human Genome Project meeting, was a progenitor of rapid data release protocols, with the express purpose of the betterment of society. This statement informed the HapMap data-sharing policies, which still underpin human genomics today.7 In turn, HapMap data-sharing policies inspired the founders of the 2006 Global Initiative on Sharing Avian Influenza Data (GISAID).8 GISAID further required that data users not only give credit to data submitters, but make maximum efforts to work with and include them in joint analyses on viral sequence data, further tipping the scales in favor of collaboration. This mandated sharing of not only data, but the benefits of research, has resulted in a paradigm shift which helps to put contributors from higher or lower resource settings on the same footing.

More than a decade and a name change since its conception, the nearly invisible infrastructure of the Global Initiative on Sharing All Influenza Data (not just avian influenza anymore) influences both research and actionable policy, from the selection of yearly flu vaccines and the close monitoring of emerging strains to the less tangible but equally important dimensions of diplomacy, including conflict resolution and trust building.9 During the 2009 H1N1 pandemic, there was only a 10-day period between the first human case identified in the US,10 and the deposition of a full influenza genome sequence in GISAID.11 The GISAID team ensured that the repository was globally accessible by leveraging team members across multiple time zones, allowing for round-the-clock work.12 Since then, the system has dealt with both light and heavy flu seasons, accommodating the yearly influx of data and demand for access to those data.

Through these efforts, GISAID has demonstrated that career advancement and global cooperation are compatible, with little negative impact on researchers’ ability to make a name for themselves, a perennial concern in the open data world. By changing the default from closely held data and few-author publications to openness and rapid, multi-author publications, GISAID has helped influenza research blossom.

GISAID has also shown remarkable robustness in the face of the COVID-19 pandemic. Since the first two SARS-CoV-2 genome sequence submissions on January 10, 2020, more than 57,000 have followed (through July 1, 2020), deposited from every continent except Antarctica and most countries.13 These files are high-quality, deeply sequenced genomes with few errors, not just the results of nasopharyngeal swab samples. For comparison, the National Center for Biotechnology Information (NCBI, US National Institutes of Health) GenBank and the European Nucleotide Archive each contain only ~10,000 SARS-CoV-2 sequences. China National GeneBank DataBase (CNGBdb) links directly to GISAID.14

The GISAID EpiCoV database includes metadata that are largely genomics-focused: the sequence of the virus in FASTA (plain text) format, the sequencing platform, and the analytic tools used to generate the sequence. Metadata are organized by information about the virus clade, sample, institute, and submitter. These can be further annotated with team member names, geographic location of sample, sampling methodology, the age and sex of the case, and any treatment, if known. However, filtering the database to only include sequences from patients of known clinical status reduces the number to approximately 3,300. While each viral sequence submission includes a unique identifier, detail on whether multiple sequences from one individual are deposited is not readily available. Given the tens of thousands of available sequences, the majority of the metadata in the database is incomplete – just as with the public national dashboards described above.
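
As a rough sketch of how such gaps could be quantified, the Python below computes per-field completeness over a handful of records. The records and field names (`age`, `sex`, `patient_status`) are hypothetical and do not reproduce the actual GISAID schema; the final line simply restates the figures quoted above as a fraction.

```python
# Values treated as "missing" in this sketch (an assumption, not a GISAID rule).
MISSING = (None, "", "unknown")

def completeness(records, fields):
    """Fraction of records with a usable value for each field."""
    if not records:
        return {f: 0.0 for f in fields}
    total = len(records)
    return {
        f: sum(1 for r in records if r.get(f) not in MISSING) / total
        for f in fields
    }

# Hypothetical submissions with uneven annotation.
records = [
    {"id": "seq1", "age": 34, "sex": "F", "patient_status": "hospitalized"},
    {"id": "seq2", "age": None, "sex": "M", "patient_status": "unknown"},
    {"id": "seq3", "age": 61, "sex": "", "patient_status": ""},
    {"id": "seq4", "sex": "F"},
]
report = completeness(records, ["age", "sex", "patient_status"])

# Share of sequences with known clinical status, using the figures in the text.
known_status_share = 3300 / 57000
```

With the numbers quoted in the text, about 3,300 of more than 57,000 sequences, fewer than 6% of submissions carry a known clinical status.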

Policy: an explosion of academic manuscripts underscores the need for curation and summary

The products of academic research are key for informing the design of new studies and policy. These products include two broad categories: preprints and journal publications. Preprints have little or no review prior to being posted, though ethical statements and declaration of competing interests are usually required for the protection of human and/or animal subjects. Journal publications undergo editorial review by the journal or platform, and typically peer review as well, where professionals in the field are asked to assess the scientific and scholarly merits of papers submitted to the journal.

In the 70 years between 1949 and 2019, approximately 14,000 papers referencing coronavirus of any kind were published and indexed by PubMed, a database maintained by NCBI (Figure 1). Since the beginning of 2020, approximately 17,000 more such papers have been archived.
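
Counts of this kind can be reproduced programmatically. The caption of Figure 1 states that the authors used the Entrez Direct command line tool; the Python sketch below instead targets NCBI's E-utilities ESearch web endpoint, and the search term and the per-year date filter are assumptions, since the exact query is not given. The sketch only builds the request URL and parses a response; it does not contact NCBI.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def count_url(term, year):
    """Build an ESearch URL asking only for the number of PubMed hits in one year."""
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",
        "datetype": "pdat",  # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
    }
    return f"{EUTILS}?{urlencode(params)}"

def parse_count(xml_text):
    """Extract the hit count from an ESearch XML response."""
    return int(ET.fromstring(xml_text).findtext("Count"))

url = count_url("coronavirus", 2019)

# Illustrative response body with a made-up count, not real PubMed output.
sample = "<eSearchResult><Count>123</Count></eSearchResult>"
n = parse_count(sample)
```

Fetching `count_url(term, year)` for each year and summing the parsed counts would give totals comparable to those plotted in Figure 1, subject to the query assumptions above.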

Also, since the beginning of 2020, more than 6,000 manuscripts on coronaviruses have been uploaded to bioRxiv and medRxiv, two leading preprint servers. Recognizing the need to ensure open access to knowledge, the World Health Organization (WHO) created a preprint repository, COVID-19 Open,15 which had ~50 papers as of July 1, 2020. If papers deposited to COVID-19 Open cannot find suitable publication elsewhere, they will be archived by the WHO and become citable as working papers.16

While it is encouraging that the global research community has turned its attention to the COVID-19 crisis, and diverse opinions are important, not all papers are of equal merit or should be used to guide hypothesis design or public policy.

Findability, however, does not ensure quality. The publish-or-perish culture of academia is responsible for at least some of the failure of effective data sharing, even though the requirements of funding agencies promote data deposition in public repositories.17 Clinical and research data are often siloed, requiring further efforts to join them.18 Predictably, early COVID-19 papers have been retracted even from very high-profile journals,19 and reports of systemic quality issues are emerging.20 The Retraction Watch blog has created a special list for retracted COVID-19 publications.21

Discussion

The science-policy interface fosters a virtuous cycle where each field improves the other, wherein access to clear and rigorous science is responsive enough to inform policy yet stringent enough to advocate for the needs of the scientist. That interface must be made more robust for research on outbreaks, whether measles in the US, Ebola in the Democratic Republic of the Congo, or a novel swine flu in China.

The global scope and global threat of pandemics underscore the need for a transnational entity such as the WHO, which can convene experts and funders to improve the infrastructure of outbreak response. The WHO is currently the only global institution with the reach, trust, and power to bring together the parties needed to build these resources and act as their arbiter. Here we propose two discrete actions, informed by previous successes in the field, to improve the invisible infrastructure of outbreak response. These improvements will directly advance the Research Priorities (Objective 5) outlined in the 2019 WHO Coordinated Global Research Roadmap R&D Blueprint.22

Research would benefit from centralized repositories of data, both for the scale that they can achieve and for the inherent like-against-like comparisons that can be made with harmonized data. At present, the data displayed on the national dashboards cannot be compared with one another, the files behind them are incompatible, and their methodologies are opaque. Closely linking the case data that underpin dashboards to pathogen sequence data could facilitate computational experiments and the ability to develop and test hypotheses. The WHO global dashboard is also incompatible with country dashboards, and does not disaggregate the data by sex or age, both important risk factors. Greater transparency in the methods used for collection and aggregation would help transition these dashboards into research and policy resources.

For most researchers, these possibilities are presently limited by a lack of access to the relevant datasets, which is unproductive and undemocratic, and which furthermore stifles innovation and discovery. One model for expanding access to fine-grained case report data (the sort usually sequestered in medical records) is the Medical Information Mart for Intensive Care (MIMIC-III) database at the Massachusetts Institute of Technology.23 Another is the European Union’s COVID-19 data portal, which links virus and host genome, protein, expression, and biochemistry data, but leaves out clinical and case data.24 Adopting a system of credit sharing similar to GISAID’s would ensure that regardless of resource setting, data depositors are incentivized and recognized.

The centralization of outbreak data would have positive effects for policy- and decision-makers as they address local and global problems. With access to comparable global data, they can make better-informed decisions. However, they also need access to curated summaries of the literature written on the data, which will provide context and clues to which studies are superlative and should be used as a basis for action. Compendia of this sort will inform best practices for international consideration on policy for measures to flatten curves or open economies.

The removal of paywalls to promote access is not a new concept. There is a strong history of advocacy for broad access to knowledge, whether in recent commitments25 to open access in public health emergencies or the Findable Accessible Interoperable Reusable (FAIR) Data Principles from 2016,26 the Budapest,27 Bethesda,28 and Berlin29 statements on open access publishing and knowledge from the early years of the millennium, or the creation of the preprint arXiv in 1991.30 However, access without guidance can lead to chaos and can become a source of misinformation in the public sphere.

During the pandemic, these previously inward-facing resources have experienced increased interest and scrutiny from journalists and the general public. Creating a platform where data produced in the laboratory, field, and clinic face inward to researchers, with summary, dashboard-like information facing outward to decisionmakers, will improve the quality and utility of those data across fields. Sharing credit more equally between depositors and users or analysts will incentivize participation. Curating compendia for non-scientists will help bring the fields together; when they are paired with rapid peer review, the gap between new ideas and consumption by non-scientists across all fields is narrowed. Taken together, these data repositories and resources for high-quality summaries of impactful papers will help stimulate effective exchange at the science-policy interface.

Recommendations

Pandemics, as global problems, require harmonization across local contexts. Through better exchange of information at the science-policy interface, their human and economic price can be mitigated.

Broadly, we advocate that the WHO, under the auspices of Research Priority Objective 5,31 should take a lead role in fostering systems that aggregate case report data for outbreaks, modeling them after GISAID’s. We further advocate for the creation of metareview panels for papers related to the disease that would assemble and summarize strong research papers and preprints.

First, we propose a central repository for data from outbreaks, what that repository should contain, how to manage and perpetuate it, and possible limitations (Evidence & Science). Second, we explore the possibility for metareview and the creation of compendia of high-quality papers and summaries of those papers designed for a broad audience (Policy).

  1. Evidence & Science: Joint repository for case report and pathogen sequence data

Present assessment: National-level dashboards cataloguing the spread of COVID-19 are dynamic. In our initial survey (April 23, 2020), dashboards presented limited data, but as of July 1, the data and visualizations were generally more mature and of higher quality.

Publicly available data on COVID-19 require harmonization across sex, age, location, and disease severity. These data are of paramount importance in design of clinical trials and dosage levels for vaccines and therapeutics,32 the formation of basic hypotheses,33 and policy elaboration and implementation.34 Without a standard minimum amount of information included and a unified repository for these data, researchers cannot assess the biological and demographic factors that affect health outcomes. The data are (largely) findable, but they are not accessible, interoperable, or reusable.

Infrastructure need: There is no single place to go for robust data on cases and the pathogen sequence. The resources that do exist are largely for expert scientists, but the need exists to span disciplines. This lack of infrastructure hinders our understanding of and our ability to respond to outbreaks. What is needed is a repository, or set of repositories, that holds case reports and pathogen genome sequence data, with an outward-facing dashboard designed for use by non-experts and controlled access to the underlying data for researchers.

Informed execution: The WHO Global Research Roadmap for COVID-19 provides an excellent, comprehensive overview of the research needs for this outbreak with an eye towards generalizable methodology, but little space is devoted to the mechanisms of data sharing.35

Rather than call on the WHO to build these platforms itself, we suggest that it convene stakeholders, who will themselves build consensus on the necessary data to be deposited into any repository and find funders to ensure stability of the resource, as is the case with GISAID. Stakeholders’ openness to ideas in less formal encounters, specifically at World Economic Forum meetings, was instrumental to building GISAID – this light-touch approach should be replicated and built upon by the WHO to build out similar systems for all outbreaks.

The stakeholders’ group should look to the genomics community and the NCBI Database of Genotypes and Phenotypes (dbGaP), a repository for personally sensitive or identifiable data.36 While often not user-friendly for submitters or accessors, it has helped to advance the state of controlled-access data sharing in genomics, where some aspects of the data are individualized and identifiable. There, scientists eligible for an eRA Commons37 account may submit a project description to a Data Access Committee to access the data. eRA Commons is the United States National Institutes of Health investigator portal. This certification system is used to ensure a high bar of access for dbGaP and like resources. Similar controlled-access systems maintained by the European Bioinformatics Institute (EBI) and Chinese National Genomic Data Center also serve this purpose. Results of the parties convened could be reported with the same mechanism as the Global Research Roadmap and decision-makers could begin to form the public-private partnerships needed to address the agreed-upon gaps, needs, and challenges.

Data deposition: Support for centralized data sharing is growing and has a long history in genomics38 where it has improved the quality and scope of research. In our proposal for outbreak data sharing, it is reasonable to expect state agency data depositions to come first, with researchers incentivized to deposit data thereafter, either at the time of publication or as a stipulation of funding.

Long-term support: This repository will require commitment to digital and human infrastructure that will take time to build, but which will have numerous long-term benefits and improve scientific study at all levels. GISAID, born of a public-private partnership, has been supported by numerous public entities. In 2010, the Federal Republic of Germany took over housing the resource. The US, European Union, and China all support their federal data repositories, which are open for global use. A model that leverages governments, philanthropy and, potentially, commercial sector users will be key for the long-term sustainability of future endeavors.

Limitations of centralized data repositories

Privacy. The need to maintain essential liberties while enacting short-term, explicitly temporary restrictions proportionate with the risk to society is essential.39 It has been argued that COVID-19 provides an ideal case to test exceptions to existing data protection frameworks, including GDPR (General Data Protection Regulation).40 To this end, multiple tiers of access to a central repository could serve the greatest number of people while also modeling responsible privacy-by-design solutions, in which measures to protect the privacy and sensitivity of data are applied proactively and transparently through all phases of engineering, so that protection is the default.41

Hypothesis formation and evidence-based policy in the current pandemic rely on the accurate reporting of sex, age, and location data, which impact disease severity and control. To be useful in real time, one tier of data on these key aspects of the disease must face the public, while another tier of data remains private and available to researchers while reflecting local privacy laws. One dilemma is that the US HIPAA standard42 considers precise location and ages above 89 to be personally identifying information. This information is critical to understanding this disease, yet reporting those data widely is irreconcilable with the current framework in the US. Furthermore, the heterogeneous rule of law and standards of privacy across all nations will require rigorous debate over how to balance individual and collective needs. Where combined mobile device-based geospatial location data and contact tracing fits into this schema is unresolved, and warrants study in the bioethics and legal communities.43
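
The two-tier idea can be sketched in a few lines of Python. The record fields below are hypothetical, and the coarsening rules (ten-year age groups, a single "90+" group, region-level location only) are illustrative assumptions, not a statement of what HIPAA or any other privacy framework requires.

```python
def public_tier(record):
    """Return a coarsened copy of a case record for the public-facing tier."""
    age = record.get("age")
    if age is None:
        age_group = None
    elif age >= 90:
        age_group = "90+"  # top-code ages above 89
    else:
        decade = (age // 10) * 10
        age_group = f"{decade}-{decade + 9}"
    # Only coarse fields are copied; precise location never leaves the
    # controlled-access tier.
    return {
        "sex": record.get("sex"),
        "age_group": age_group,
        "region": record.get("region"),
        "outcome": record.get("outcome"),
    }

# Hypothetical full record as it might be held in the controlled-access tier.
full = {
    "sex": "F", "age": 93, "region": "Region X", "postcode": "00000",
    "lat": 0.0, "lon": 0.0, "outcome": "recovered",
}
shared = public_tier(full)
```

The sketch also shows the trade-off described above: the public view keeps sex, a broad age group, and region, but the exact age and location that matter for research stay behind controlled access.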

Equity. Health care systems are functionally different44 in high-, middle-, and low-income countries - we have seen this firsthand in our work in genetic medicine and infectious disease outbreaks in Central Africa.45 The use of and access to clinic and hospital settings, where data on epidemics are collected, varies with resources but also cultural norms. Infrastructure must be operable in low-resource settings with incentives for local participation to ensure that data from low-resource environments are accurately collected.

Interoperable data collection may also not be a high priority for staff focused on patient care in clinics. Our experience in clinical team science46 shows us that the more comprehensive the form, the fewer of its fields are filled in. International data interoperability has another fundamental dilemma: harmonize data in English, or leave data in the language of its generators, with concepts useful to them but more challenging for aggregators to use? Here, we err towards equitability of language use, trusting that, with constant dialogue between the WHO and its regional and country offices, data will be human- or machine-processable and thus, interoperable.

Accuracy. Data collection can be hindered by agencies with overlapping mandates. For example, the Italian agency responsible for disaster mitigation documents different aspects of the pandemic than the Ministry of Health.47 This redundancy likely improves robustness but hampers data tracking for observers. Furthermore, inadvertently incorrect or intentionally misleading data are as detrimental as missing or incompatible data. This underscores the need for post hoc data cleanup.

Our experience in coordinating large, multi-sited consortia (in English, French, and Spanish) has shown the need to strike a balance between easy-to-report, easy-to-collect, and easy-to-use data.48,49 Furthermore, our experience designing prescriptive templates to report disease-specific, standardized metrics, which ensure high-fidelity responses but require extensive training and practice, informs our recommendation to strike a balance between more and less open-ended responses in reporting cases.50 Creating new formats for data reporting is inherently a community activity, but it bears remembering that a camel is a horse designed by committee. Stakeholders must keep accuracy and utility at the forefront of their discussions. Once again, the WHO should provide leadership and guidance in constructing these much-needed data formats.

  2. Policy: Transparency at the science-policy interface through expert curation

Present assessment: The viral growth of papers invites comparison with the outbreak itself. We would not suggest flattening this curve, as research and analysis of an outbreak that touches every corner of the earth are important. However, for scientists and policymakers alike, there is a need to sort the meritorious from the mere musings. Rather than more avenues for preprint publications, there is a need for curated libraries of science.

For maximum utility across fields, the glut of papers and preprints cannot simply be archived. High-quality items must be preserved and promoted with their systematic high-level expert summaries in a central hub. Those with preliminary, exploratory, or statistically underpowered findings must be handled carefully, and excluded where appropriate. Summary and archival of critical papers will serve policymakers and non-experts by bringing more arcane research to light. Rapid and centralized curation will help to stem the spread of misinformation and misunderstanding.

Infrastructure need: An endeavor of this ilk would require mobilizing wide-ranging expertise to assess the relevance of publications or preprints. One such strategy has been devised by a team at Johns Hopkins University who are offering metareview through their 2019 Novel Coronavirus Research Compendium (NCRC), which aims to provide “accurate, relevant information for global public health action by clinicians, public health practitioners, and policy makers.”51 Drawing upon PubMed and the preprint servers medRxiv, bioRxiv, and the Social Science Research Network (SSRN), as well as a suggestion box, they search for papers that best advance the state of COVID-19 knowledge to highlight and summarize.

On June 29, 2020, an initiative from MIT and UC Berkeley with a diverse editorial board of global experts launched a new journal, Rapid Reviews: COVID-19, “to combat misinformation in COVID-19.”52 “Promising” manuscripts identified in preprint archives by UC Berkeley’s COVIDScholar artificial intelligence tools will be subjected to expert review in a matter of days, and published with accompanying commentary. This model has the potential to revolutionize peer review as we know it and may become a model for all fields of research.

Such initiatives must be encouraged and globally funded, to ensure their independence and representativeness of high-quality research from across the world. For increased preparedness for the next pandemic, vetted compendia of the available literature should also be created for other viruses (e.g. Ebola, Zika), and other successive outbreaks. These metareviews should be housed in a central hub with easy access and promoted to policymakers.

Leverage expertise: Peer review relies on volunteer time from specialists in the field of the manuscript under review. Reviewers are contacted by editors on a case by case basis. For metareview and compendia, a more permanent group of volunteers or paid staff would be beneficial, activated as needed during outbreaks. These reviewers could be tapped and vetted by permanent staff, and offered compensation for their time, to ensure that even early- or mid-career academics could contribute. If reviewers had conflicts of interest, they would recuse themselves as usual.

Paradigm shift: Like peer review, curating and summarizing literature for a WHO-mediated compendium must become part of the job of the expert. This laborious, but significant, change at the science-policy interface will help increase the utility of open access knowledge. This work needs to be recognized by university promotion and tenure committees to ensure that experts tapped to do so can spend the necessary time and resources on this activity.

Conclusions

It is one thing to contemplate the needs of the world, and another entirely to know the world as it is, especially in a pandemic. COVID-19 has exposed the strengths and weaknesses of the many national and global systems of data sharing.53 When the doubling time of new papers is as fast as that of new cases of a virulent infection, it is not enough to have open access: guides and wayfinders are needed. As such, we propose solutions for better scientific data collection and dissemination. However, data availability only tells part of the story: resources must be effectively mobilized in order for policymakers to design the best responses.54 Many governments responded to the pandemic threat with protectionist action: by April 1, 2020, 68 jurisdictions had adopted export controls on everything from personal protective equipment to testing reagents and soap.55 We must work together to craft policy based on evidence rather than protectionist fear if we are to win against COVID-19. While we have proposed solutions for data sharing in outbreaks and pandemics, a similar reflective analysis must be made by trade and policy experts. Efforts for data sharing and global policy design must be examined to improve the flow of information and resources at the science-policy interface (Figure 2).

Figure 2

Figure 2. Graphic representation of the science-policy interface. Top and bottom text denote the flow of information as it cycles through the scientific and diplomatic processes. In green, data inform the creation of new knowledge, which generates new data and improves old data. In blue, cooperation and transparency in decision-making, combined with a barrier-free environment, improve conditions to make better policy. These colors meet at the science-policy interface, where facets of science and policy come together to inform each other, supporting an environment for evidence-based policy. In the COVID-19 pandemic, the flow of information across this interface broke down. In this manuscript, we describe structural improvements to the points illustrated by equilibrium arrows A and B. Formalized structures at these points will improve science and policy, while providing an opportunity to ameliorate information flow at the science-policy interface. We call on the World Health Organization to act as convener to ensure that effective, broad, and stakeholder-focused resources are created. At equilibrium arrows A, we propose the construction of a tiered-access data repository to promote the flow of information to scientists and policy decision-makers. Tiered access will provide scientists with granular, identifiable data while also giving policymakers important high-level data. Some part of these data should be made open access to promote transparency, while adhering to privacy by design. At equilibrium arrows B, we propose the creation of metareview panels responsible for both assessing published literature and preprints and summarizing the meritorious studies for non-science audiences.

Acknowledgements

We would like to acknowledge and express gratitude for the thoughtful comments and suggestions from the external reviewers and the Science & Diplomacy editorial board.

This work was supported by the A. James Clark Distinguished Professor of Molecular Medicine Endowment to Professor Eric Vilain, MD, PhD, A. James Clark Professor, Children’s National Hospital. 

Note: This article was updated on September 3, 2020 to account for a comment from the GISAID Secretariat. As of March 2020, the Chinese entity that links to GISAID is China National GeneBank Database (CNGBdb).

 

Endnotes

  1. GISAID, “Genomic Epidemiology of hCoV-19,” www.gisaid.org/epiflu-applications/next-hcov-19-app/
  2. For Iran, reports from the Islamic Republic News Agency were used in lieu of information from a public health agency.
  3. Johns Hopkins University and Medicine, “COVID-19 Map,” Coronavirus Resource Center (2020), https://coronavirus.jhu.edu/map.html
  4. R. H. H. Groenwold, O. H. Klungel, D. G. Altman, Y. Van Der Graaf, A. W. Hoes, and K. G. M. Moons, “Adjustment for continuous confounders: An example of how to prevent residual confounding,” Canadian Medical Association Journal vol. 185, no. 5 (2013): 401–406.
  5. Further analysis of these demographic considerations can be found in K. M. Kocher, A. Délot-Vilain, D. Spencer, J. LoTempio, and E. C. Délot, “Paucity and disparity of publicly available sex-disaggregated data for the COVID-19 epidemic hamper evidence-based decision-making,” medRxiv, www.medrxiv.org/content/10.1101/2020.04.29.20083709v1
  6. R. Cook-Deegan and A. L. McGuire, “Moving beyond Bermuda: Sharing data to build a medical information commons,” Genome Research vol. 27, no. 6 (2017): 897–901.
  7. “NIH Genomic Data Sharing Policy,” https://grants.nih.gov/grants/guide/notice-files/not-od-14-124.html
  8. P. Bogner, I. Capua, N. J. Cox, and D. J. Lipman, “A global initiative on sharing avian flu data,” Nature vol. 442, no. 7106 (2006): 981.
  9. S. Elbe and G. Buckland-Merrett, “Data, disease and diplomacy: GISAID’s innovative contribution to global health,” Global Challenges vol. 1, no. 1 (2017): 33–46.
  10. Centers for Disease Control and Prevention, “2009 H1N1 Pandemic Timeline,” www.cdc.gov/flu/pandemic-resources/2009-pandemic-timeline.html
  11. Elbe and Buckland-Merrett, “Data, disease and diplomacy.”
  12. L. Schnirring, “Pandemic reveals strengths of new flu database,” Center for Infectious Disease Research and Policy, www.cidrap.umn.edu/news-perspective/2009/06/pandemic-reveals-strengths-n...
  13. GISAID, “Genomic Epidemiology of hCoV-19,” www.gisaid.org/epiflu-applications/next-hcov-19-app/
  14. National Center for Biotechnology Information, “GenBank Overview,” www.ncbi.nlm.nih.gov/genbank/; EMBL-EBI, “European Nucleotide Archive,” https://www.ebi.ac.uk/ena; “China National GeneBank DataBase (CNGBdb),” https://db.cngb/gisaid/
  15. V. Moorthy, A. M. H. Restrepo, M. P. Preziosi, and S. Swaminathan, “Data sharing for novel coronavirus (COVID-19),” Bulletin of the World Health Organization vol. 98, no. 3 (2020): 150.
  16. Ibid.
  17. “Data management - H2020 Online Manual.” https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cros... “Dissemination and Sharing of Research Results | NSF - National Science Foundation.” https://www.nsf.gov/bfa/dias/policy/dmp.jsp; “NIH Sharing Policies and Related Guidance on NIH-Funded Research Resources | grants.nih.gov.” https://grants.nih.gov/policy/sharing.htm.
  18. C. V. Cosgriff, D. K. Ebner, and L. A. Celi, “Data sharing in the era of COVID-19,” The Lancet Digital Health, vol. 2, no. 5 (2020): e224.
  19. C. Piller, “Who’s to blame? These three scientists are at the heart of the Surgisphere COVID-19 scandal,” Science Magazine, June 8, 2020, www.sciencemag.org/news/2020/06/whos-blame-these-three-scientists-are-he...
  20. E. Xiao, “Red Flags Raised Over Chinese Research Published in Global Journals,” Wall Street Journal, July 5, 2020, www.wsj.com/articles/chinese-research-papers-raise-doubts-fueling-global...
  21. Retraction Watch, “Retracted coronavirus (COVID-19) papers,” https://retractionwatch.com/retracted-coronavirus-covid-19-papers/
  22. World Health Organization, in collaboration with the Global Research Collaboration for Infectious Disease Preparedness and Response, “A coordinated Global Research Roadmap,” Global Research Forum on COVID-19 (February 11-12, 2020), www.who.int/blueprint/priority-diseases/key-action/Roadmap-version-FINAL...
  23. Cosgriff, Ebner, and Celi, “Data sharing in the era of COVID-19.”
  24. “COVID-19 Data Portal,” www.covid19dataportal.org/about
  25. “Statement on data sharing in public health emergencies,” February 1, 2016, https://wellcome.ac.uk/press-release/statement-data-sharing-public-healt... “Sharing research findings and data relevant to the Ebola outbreak in the Democratic Republic of Congo,” May 22, 2018, https://wellcome.ac.uk/press-release/sharing-research-findings-and-data-... “Coronavirus (COVID-19): sharing research data,” January 31, 2020, https://wellcome.ac.uk/coronavirus-covid-19/open-data
  26. M. D. Wilkinson et al., “Comment: The FAIR Guiding Principles for scientific data management and stewardship,” Scientific Data vol. 3, no. 1 (2016): 1–9.
  27. “Budapest Open Access Initiative,” https://www.budapestopenaccessinitiative.org/read
  28. “Bethesda Statement on Open Access Publishing,” http://legacy.earlham.edu/~peters/fos/bethesda.htm
  29. “Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities,” https://openaccess.mpg.de/Berlin-Declaration
  30. P. Ginsparg, “It was twenty years ago today ...,” August 14, 2011, http://arxiv.org/abs/1108.2700
  31. WHO, “A coordinated Global Research Roadmap.”
  32. C. Tannenbaum and D. Day, “Age and sex in drug development and testing for adults,” Pharmacological Research vol. 121 (2017): 83–93; S. Feldman, W. Ammar, K. Lo, E. Trepman, M. Van Zuylen, and O. Etzioni, “Quantifying Sex Bias in Clinical Studies at Scale with Automated Data Extraction,” Journal of the American Medical Association Network Open vol. 2, no. 7 (2019): e196700; S. L. Klein and A. Pekosz, “Sex-based Biology and the Rational Design of Influenza Vaccination Strategies,” The Journal of Infectious Diseases vol. 209, suppl. 3 (2014): S114–S119.
  33. H. Li et al., “SARS-CoV-2 and viral sepsis: observations and hypotheses,” The Lancet vol. 395, no. 10235 (2020): 1517–1520; R. Verity et al., “Estimates of the severity of coronavirus disease 2019: a model-based analysis,” The Lancet Infectious Diseases vol. 20, no. 6 (2020): 669–677.
  34. E. W. Colglazier, “Response to the COVID-19 Pandemic: Catastrophic Failures of the Science-Policy Interface,” Science & Diplomacy vol. 9, no. 1 (March 2020) https://sciencediplomacy.org/editorial/2020/response-covid-19-pandemic-c...
  35. WHO, “A coordinated Global Research Roadmap.”
  36. M. D. Mailman et al., “The NCBI dbGaP database of genotypes and phenotypes,” Nature Genetics vol. 39, no. 10 (2007): 1181–1186.
  37. “Overview of eRA Commons,” https://era.nih.gov/help-tutorials/era-commons/overview.htm?q=commons/in...
  38. Cook-Deegan and McGuire, “Moving beyond Bermuda”; “GenBank Overview”; EMBL-EBI, “European Nucleotide Archive.”
  39. American Civil Liberties Union, “Maintaining Civil Liberties Protections in Response to the H1N1 Flu,” www.aclu.org/other/maintaining-civil-liberties-protections-response-h1n1...
  40. S. McLennan, L. A. Celi, and A. Buyx, “COVID-19: Putting the General Data Protection Regulation to the Test,” Journal of Medical Internet Research Public Health and Surveillance vol. 6, no. 2 (2020): e19279.
  41. The 32nd International Conference of Data Protection and Privacy Commissioners, “Resolution on Privacy by Design,” 2010.
  42. Mailman et al., “The NCBI dbGaP database.”
  43. A. Hern, “France urges Apple and Google to ease privacy rules on contact tracing,” The Guardian, April 21, 2020, www.theguardian.com/world/2020/apr/21/france-apple-google-privacy-contac... T. Bradshaw, “2bn phones cannot use Google and Apple contact-tracing tech,” Financial Times, April 19, 2020, www.ft.com/content/271c7739-af14-4e77-a2a1-0842cf61a90f; Apple and Google, “Privacy-Preserving Contact Tracing,” www.apple.com/covid19/contacttracing
  44. Norwegian Refugee Council, “Just three ventilators to cope with Covid-19 in Central African Republic,” March 31, 2020, www.nrc.no/news/2020/march/just-three-ventilators-to-cope-with-covid-19-... R. Maclean and S. Marks, “10 African Countries Have No Ventilators. That’s Only Part of the Problem,” New York Times, April 18, 2020, www.nytimes.com/2020/04/18/world/africa/africa-coronavirus-ventilators.html
  45. J. D. Kelly et al., “Neurological, Cognitive, and Psychological Findings Among Survivors of Ebola Virus Disease From the 1995 Ebola Outbreak in Kikwit, Democratic Republic of Congo: A Cross-sectional Study,” Clinical Infectious Diseases vol. 68, no. 8 (2019): 1388–1393; N. A. Hoff et al., “Evolution of a Disease Surveillance System: An Increase in Reporting of Human Monkeypox Disease in the Democratic Republic of the Congo, 2001–2013,” International Journal of Tropical Disease and Health vol. 68, no. 8 (2017): 1388–1393.
  46. E. C. Délot et al., “Genetics of Disorders of Sex Development: The DSD-TRN Experience,” Endocrinology and Metabolism Clinics of North America vol. 46, no. 2 (2017): 519–537.
  47. Dipartimento della Protezione Civile, “COVID-19 Italia - Monitoraggio situazione,” https://github.com/pcm-dpc/COVID-19.
  48. Kelly et al., “Neurological, Cognitive, and Psychological Findings”; Hoff et al., “Evolution of a Disease Surveillance System”; Délot et al., “Genetics of Disorders of Sex Development.”
  49. P. Speiser et al., “SAT-LB050 Congenital Adrenal Hyperplasia Newborn Screening Protocols Differ Widely in the US: A Survey by the Differences/Disorders of Sex Development Translational Research Network (DSD-TRN),” Journal of the Endocrine Society vol. 3, suppl. 1 (2019); E. Stulberg et al., “An assessment of US microbiome research,” Nature Microbiology vol. 1, no. 1 (2016): 1–7.
  50. E. C. Délot et al., “Genetics of Disorders of Sex Development”; E. Stulberg et al., “An assessment of US microbiome research.”
  51. “2019 Novel Coronavirus Research Compendium (NCRC),” https://ncrc.jhsph.edu/
  52. The MIT Press, “The MIT Press and UC Berkeley Launch Rapid Reviews: COVID-19,” MIT News, June 29, 2020, https://mitpress.mit.edu/blog/mit-press-and-uc-berkeley-launch-rapid-rev...
  53. Colglazier, “Response to the COVID-19 Pandemic.”
  54. R. Azevêdo and T. A. Ghebreyesus, “Joint Statement,” World Trade Organization News, April 20, 2020. www.wto.org/english/news_e/news20_e/igo_14apr20_e.htm
  55. A. Beattie, “Can this mad, global scramble for protective gear be avoided in future?,” Financial Times, April 20, 2020, www.ft.com/content/8a473d3a-3a2f-46c1-b1da-f555d50a8c92; Global Trade Alert, “Latest State Acts,” www.globaltradealert.org/latest/state-acts/page_513

     

     

    Annex

    Table 1

    Superscript  Country         Link
    a            United States   https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
    b            United States   https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
    c            United States   https://www.cdc.gov/mmwr/volumes/69/wr/mm6912e2.htm?s_cid=mm6912e2_w
    d            United States   https://www.cdc.gov/nchs/nvss/vsrr/covid19/index.htm
    e            Spain           https://covid19.isciii.es/
    f            Spain           https://www.isciii.es/QueHacemos/Servicios/VigilanciaSaludPublicaRENAVE/...
    g            Italy           https://www.epicentro.iss.it/en/coronavirus/bollettino/Infografica_15apr...
    h            Italy           https://www.epicentro.iss.it/coronavirus/bollettino/Bollettino-sorveglia...
    i            France          https://www.data.gouv.fr/fr/datasets/donnees-des-urgences-hospitalieres-...
    j            France          https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-l...
    k            France          https://geodes.santepubliquefrance.fr/#c=indicator&view=map2
    l            Germany         https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen....
    m            Germany         https://experience.arcgis.com/experience/478220a4c454480e823b17327b2bf1d4
    n            Germany         https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsb...
    o            United Kingdom  https://coronavirus.data.gov.uk/
    p            Turkey          https://covid19.tubitak.gov.tr/turkiyede-durum
    q            Iran            https://en.irna.ir/news/83753345/Coronavirus-death-toll-reaches-4-869-in...
    r            China           http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm#NHCApril01
    s            China           http://www.nhc.gov.cn/xcs/yqtb/202004/9ffacd69bc67476eb83a2776b8d8c70c.s...
    t            Russia          https://covid19.rosminzdrav.ru/
    u            Brazil          https://covid.saude.gov.br/
    v            Belgium         https://covid-19.sciensano.be/sites/default/files/Covid19/Derni%c3%a8re%...
    w            Belgium         https://epistat.wiv-isp.be/Covid/
    x            Belgium         https://datastudio.google.com/embed/u/0/reporting/c14a5cfc-cab7-4812-848...
    y            Canada          https://www.canada.ca/content/dam/phac-aspc/documents/services/diseases/...
    z            Canada          https://www.canada.ca/en/public-health/services/diseases/2019-novel-coro...
    alpha        Netherlands     https://www.rivm.nl/documenten/epidemiologische-situatie-covid-19-in-ned...
    beta         Netherlands     https://app.powerbi.com/view?r=eyJrIjoiMmM0NGQyMTctYWM3Ni00MmI3LTkwY2QtZ...
    gamma        Switzerland     https://covid-19-schweiz.bagapps.ch/de-3.html
    delta        Switzerland     https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pa...
    epsilon      Switzerland     https://covid-19-schweiz.bagapps.ch/fr-2.html

    These are the countries with the highest overall case counts on April 23, 2020, as collated by the Johns Hopkins University Center for Systems Science and Engineering. Each national public health website was queried on April 1, April 13, April 17, and April 23. As all websites reported the number of cases, a column was not created to document this metric in the table. In column 9, Severity, ICU stands for intensive care unit, the most common delineation of severe or critical cases. The Italian health ministry used a scale from asymptomatic to critical instead of hospitalization. Citations within the table document the URL accessed to find the information.

    Table 2

    Superscript  Country         Link
    a            United States   https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
    b            United States   https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
    c            United States   https://www.cdc.gov/mmwr/volumes/69/wr/mm6912e2.htm?s_cid=mm6912e2_w
    d            United States   https://www.cdc.gov/nchs/nvss/vsrr/covid19/index.htm
    e            Brazil          https://covid.saude.gov.br/
    f            Russia          https://covid19.rosminzdrav.ru/
    g            India           https://www.icmr.gov.in/
    h            India           https://www.mohfw.gov.in/
    i            United Kingdom  https://coronavirus.data.gov.uk/
    j            Peru            https://www.gob.pe/coronavirus
    k            Peru            https://covid19.minsa.gob.pe/sala_situacional.asp
    l            Chile           https://www.minsal.cl/nuevo-coronavirus-2019-ncov/casos-confirmados-en-c...
    m            Spain           https://covid19.isciii.es/
    n            Spain           https://www.isciii.es/QueHacemos/Servicios/VigilanciaSaludPublicaRENAVE/...%20COVID19/Informe%20n%C2%BA%2021.%20Situaci%C3%B3n%20de%20COVID-19%20en%20Espa%C3%B1a%20a%206%20de%20abril%20de%202020.pdf
    o            Italy           https://www.epicentro.iss.it/en/coronavirus/bollettino/Infografica_15aprile%20ENG.pdf
    p            Italy           https://www.epicentro.iss.it/coronavirus/bollettino/Bollettino-sorveglia...
    q            Iran            https://en.irna.ir/news/83753345/Coronavirus-death-toll-reaches-4-869-in...
    r            Mexico          https://coronavirus.gob.mx/contacto/
    s            Pakistan        http://covid.gov.pk/
    t            France          https://www.data.gouv.fr/fr/datasets/donnees-des-urgences-hospitalieres-...
    u            France          https://www.data.gouv.fr/fr/datasets/donnees-hospitalieres-relatives-a-l...
    v            France          https://geodes.santepubliquefrance.fr/#c=indicator&view=map2
    w            Turkey          https://covid19.tubitak.gov.tr/turkiyede-durum
    x            Germany         https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.htm
    y            Germany         https://experience.arcgis.com/experience/478220a4c454480e823b17327b2bf1d4
    z            Germany         https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsb...

    Nations were selected by their overall case count as collated by the Johns Hopkins University Center for Systems Science and Engineering on July 1, 2020. Each national public health website was queried on July 1 and July 2 in the process of assembling this table. As all sites reported counts of cases, a column was not created to document this metric in the table. In column 9, Severity, ICU stands for intensive care unit, the most common delineation of severe or critical cases. The Italian health ministry used a scale from asymptomatic to critical instead of hospitalization. Citations within the table document the URL accessed to find the information. Note: all hyperlinks in both tables were verified and active at the time of submission, but may have expired since.