
The World Wide Genome: Genetic Privacy in the Age of Big Data
Genetic testing offers opportunities to anticipate risks and improve human health
DNA sequencing has evolved from a costly technique into a far more affordable one in biomedical research. The fall in sequencing costs has even outpaced Moore’s Law (the observation that the cost of computational power roughly halves every two years): the price of sequencing a human genome fell from $US15 million in 2005 to less than $US700 by 2020.1 Genetic testing has consequently become more affordable for researchers and, through various companies, accessible to citizens, spurring biomedical advances in identifying health risk factors and diagnosing diseases.
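The comparison with Moore’s Law can be checked with a few lines of arithmetic. The sketch below uses only the two prices quoted above and the two-year halving period from the parenthetical definition; the function names are illustrative, and the result is a rough back-of-the-envelope estimate rather than a reanalysis of the NHGRI data.

```python
import math

# Figures quoted in the text (approximate, in US dollars).
COST_2005 = 15_000_000
COST_2020 = 700
YEARS = 2020 - 2005


def moores_law_cost(start_cost: float, years: float, halving_years: float = 2.0) -> float:
    """Cost after `years` if the cost halved every `halving_years` years."""
    return start_cost / 2 ** (years / halving_years)


def implied_halving_time(start_cost: float, end_cost: float, years: float) -> float:
    """Halving time (in years) that would produce the observed decline."""
    return years / math.log2(start_cost / end_cost)


# What a genome would cost in 2020 had sequencing only kept pace with Moore's Law.
projected = moores_law_cost(COST_2005, YEARS)
# How often the cost actually had to halve to reach the quoted 2020 price.
halving = implied_halving_time(COST_2005, COST_2020, YEARS)

print(f"Moore's Law projection for 2020: ${projected:,.0f}")
print(f"Quoted 2020 cost:                ${COST_2020:,}")
print(f"Implied halving time:            {halving:.2f} years")
```

Under these assumptions, a genome would still cost tens of thousands of dollars in 2020 if sequencing had merely tracked Moore’s Law, and the quoted prices imply a halving time of roughly one year instead of two.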
A prominent application of genetic testing is identifying BRCA gene variants that increase the risk of developing breast cancer up to fivefold.2 Similar risk variants exist for other cancer types and for many heritable diseases. The power to predict genetic risk from genetic profiles is increasing, thanks to a combination of large databases and new machine learning methods for bioinformatics. Large consortia like the UK Biobank collect vast amounts of human genome sequencing data; the UK Biobank’s data has been used by more than 2000 research teams, driving scientific discoveries in public health research.3 It is not far-fetched to expect more scientifically innovative and beneficial results from these studies soon. Modern technologies and computational power are bringing the dream of truly personalized precision medicine closer to reality.
The low cost and widespread benefits make the rapid adoption of genetic testing inevitable. Large databases and sufficient computing power to share and analyze results are readily available, enabling the rapid development of precision medicine and modern personalized healthcare. Further, the SARS-CoV-2 pandemic has normalized providing genetic material for molecular diagnostics. Even though testing for viral RNA differs from testing human DNA, new familiarity with terms like “RT-PCR” and “strain sequencing” will likely continue to drive the uptake of genetic testing. Even before the SARS-CoV-2 pandemic, more than 26 million consumers had added their DNA to four leading commercial databases.4
Your genetic data is not really yours
However, insufficient regulation of the collection and handling of genomic data raises ethical and privacy concerns. Genetic data poses unique challenges; for instance, unlike the information on an individual’s driver’s license, a person’s genetic makeup also reveals information about family members. This means that it is not always possible to control and protect one’s own genomic data. For example, law enforcement agencies sometimes submit DNA results to online genealogy sites like GEDmatch, then use close DNA matches to construct family trees and identify a crime suspect even though the suspect has never taken a DNA test. While investigators have successfully prosecuted individuals found in this way, the practice of genetic genealogy research is imperfect and raises ethical and legal concerns.5 Citing citizens’ rights and protections against unreasonable searches, civil liberties advocates call for warrants to be required to collect and use genetic data.6
Furthermore, direct-to-consumer companies like 23andMe and Ancestry have created a billion-dollar market7 by offering consumers opportunities to investigate their ancestry and receive personalized health reports. Unlike biobanks, these companies do not make their databases openly accessible to the scientific community, but they do publish their methods and findings in the scientific literature. The level of privacy is mostly self-regulated and varies among these platforms, with some opting to share data with third parties.
Insurance companies might want to access such genetic data to underwrite policies. Laws that protect citizens from genetic discrimination, and genetic privacy laws more broadly, vary drastically worldwide. For example, some countries have minimal legislation, allowing life insurance companies to underwrite policies based on genetic information, while other countries impose outright bans. Figure 1 illustrates the asymmetric regulatory landscape governing the use of predictive genetic testing by private insurers. Possible widespread access by governments and private companies to the most personal information of all, our genome, is concerning if its use, including uses not yet envisioned, is left to unilateral, self-regulated decisions.
Figure 1: Map illustrating the asymmetric regulatory landscape governing the use of predictive genetic testing by private insurers. The figure was created using R software8 from data sources.9 Credit: Ibon Santiago and Tobias Hoffmann.
Biases and underrepresentation
Genetic databases in biobanks lack diversity. About 78% of genetic association studies have been performed in biobanks located in the Global North using samples from individuals of European descent.10 This bias has implications for predicting disease risk in ethnically underrepresented groups. To maximize genetic discovery and reduce health disparities,11 researchers should increase the proportion of sequenced genomes from underrepresented populations in genomic databases. Capacity for data collection, storage, and analysis, as well as legal protections, varies across countries and hinders this inclusion. These challenges can be tackled by promoting technology transfer, standardization, and the creation of an independent global biobank as an international resource. The successful tracking of the evolution of SARS-CoV-2 and the unprecedented scale of viral strain sequencing indicate that such international collaboration is within reach.
Emerging technologies and science diplomacy as solutions
In light of concerns including privacy, lack of diversity in genomic databases, and the abuse of data, we call for science diplomacy at the international level, to a) introduce new privacy-protecting technologies and b) establish a global regulatory framework to prevent the abuse of genomic data and ensure the representation of the diversity of the population.
Emerging technologies such as blockchain might offer solutions to store genetic data safely while preserving privacy.12 A blockchain is a shared digital ledger that keeps a record of data exchanges. It runs on a decentralized network that can allow the frictionless distribution of data. Its key features are transparency, the absence of a centralized third party, and the use of cryptographic keys to maintain the anonymity of users. Such an approach can give researchers access to vast amounts of genomic data while allowing users to retain control of their data and monitor its usage. Companies such as Nebula13 are already using blockchain to encrypt genomic data and could set the standard for other genomic repositories, allowing individuals to own and license their genetic data. However, emerging technologies might also pose new security risks. Ubiquitous connectivity allows greater access to big data and increases the vulnerability of new and old technologies alike.
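The record-keeping idea behind a blockchain can be illustrated with a minimal hash-chained ledger. The sketch below is a single-process toy, not the system used by Nebula or any other company: it has no network, no consensus mechanism, and no encryption of genomic data, and all names (`Block`, `append`, `is_valid`, `pseudonym`, the record fields, and the lab and insurer identifiers) are hypothetical. It shows only two of the features named above: each entry is linked to the previous one by a cryptographic hash, so past entries cannot be silently altered, and users appear under a pseudonym derived from a key rather than under their name.

```python
import hashlib
import json
from dataclasses import dataclass, replace

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    index: int
    prev_hash: str
    record: dict  # hypothetical access record, e.g. who was granted access
    hash: str


def block_hash(index: int, prev_hash: str, record: dict) -> str:
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "record": record}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> list:
    """Return a new chain with `record` added as a block linked to the last one."""
    prev_hash = chain[-1].hash if chain else GENESIS_HASH
    index = len(chain)
    return chain + [Block(index, prev_hash, record, block_hash(index, prev_hash, record))]


def is_valid(chain: list) -> bool:
    """Check that every block's hash matches its contents and its predecessor."""
    expected_prev = GENESIS_HASH
    for i, block in enumerate(chain):
        if block.index != i or block.prev_hash != expected_prev:
            return False
        if block.hash != block_hash(block.index, block.prev_hash, block.record):
            return False
        expected_prev = block.hash
    return True


def pseudonym(public_key: bytes) -> str:
    """Derive a short pseudonymous identifier from a (hypothetical) public key."""
    return hashlib.sha256(public_key).hexdigest()[:16]


owner = pseudonym(b"example-public-key")
chain = append([], {"owner": owner, "action": "grant", "to": "lab-42"})
chain = append(chain, {"owner": owner, "action": "access", "by": "lab-42"})

# Rewriting an old record without recomputing the hashes breaks the chain.
tampered = list(chain)
tampered[0] = replace(chain[0], record={**chain[0].record, "to": "insurer-7"})

print(is_valid(chain))
print(is_valid(tampered))
```

In this toy, the untouched chain passes validation and the tampered copy fails it, which is the sense in which a user could monitor how their data is used. A real deployment would additionally need distributed consensus, key management, and protection of the genomic data itself, which is where the security risks mentioned above arise.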
With biobanks located in different countries and genetics research crossing borders, there needs to be international protection to prevent discrimination and safeguard privacy and human rights. Efforts to align the vastly different laws on commercial genetic testing inside the European Union are ongoing, starting with the European Union’s General Data Protection Regulation (GDPR), which took effect in 2018.14 The GDPR is one of the primary models for data privacy; however, EU member states are going their own ways in implementing it in national law.15 Another example is the 2008 Genetic Information Nondiscrimination Act (GINA) in the US, which protects citizens against discrimination in employment and health insurance16 but not in life or disability insurance. Both pieces of legislation are starting points but need to include more privacy provisions and uniform protections against genetic discrimination.
An international effort to create a global regulatory framework covering the use of personal genomic data is necessary. Such global joint action might also impact the diversity of genomic data positively. Building upon the 2003 International Declaration on Human Genetic Data led by UNESCO,17 a global approach led by science diplomacy can address this problem by encouraging the participation of underrepresented communities. Diverse biobanks can contribute to a better understanding of disease risk and improve targeted therapies. With the help of privacy-preserving technologies, society can fully benefit from genomic research while protecting individual genetic rights.
Endnotes
1. Kris A. Wetterstrand, “DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP),” National Human Genome Research Institute, 2020, www.genome.gov/sequencingcostsdata.
2. Karoline B. Kuchenbaecker, John L. Hopper, Daniel R. Barnes, Kelly-Anne Phillips, Thea M. Mooij, Marie-José Roos-Blom, Sarah Jervis, Flora E. Van Leeuwen, Roger L. Milne, and Nadine Andrieu, “Risks of Breast, Ovarian, and Contralateral Breast Cancer for BRCA1 and BRCA2 Mutation Carriers,” Journal of the American Medical Association 317, no. 23 (2017): 2402–2416; Anthony Antoniou, Paul D. P. Pharoah, Steven Narod, Harvey A. Risch, Jorunn E. Eyfjord, John L. Hopper, Niklas Loman, Håkan Olsson, O. Johannsson, and Åke Borg, “Average Risks of Breast and Ovarian Cancer Associated with BRCA1 or BRCA2 Mutations Detected in Case Series Unselected for Family History: A Combined Analysis of 22 Studies,” The American Journal of Human Genetics 72, no. 5 (2003): 1117–30.
3. Clare Bycroft, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T. Elliott, Kevin Sharp, Allan Motyer, Damjan Vukcevic, Olivier Delaneau, and Jared O’Connell, “The UK Biobank Resource with Deep Phenotyping and Genomic Data,” Nature 562, no. 7726 (2018): 203–09.
4. Antonio Regalado, “More than 26 Million People have Taken an At-Home Ancestry Test,” MIT Technology Review 11, no. 2 (2019).
5. Gina Kolata and Heather Murphy, “The Golden State Killer Is Tracked Through a Thicket of DNA, and Experts Shudder,” New York Times, April 27, 2018, www.nytimes.com/2018/04/27/health/dna-privacy-golden-state-killer-genealogy.html.
6. Heather Murphy, “Sooner or Later Your Cousin’s DNA is Going to Solve a Murder,” New York Times, April 25, 2019, www.nytimes.com/2019/04/25/us/golden-state-killer-dna.html; Alexia Ramirez, “Police Need a Warrant to Collect DNA We Inevitably Leave Behind,” American Civil Liberties Union, March 10, 2020, www.aclu.org/news/privacy-technology/police-need-a-warrant-to-collect-dna-we-inevitably-leave-behind.
7. Kristen V. Brown, “23andMe Goes Public as $3.5 Billion Company with Branson Aid,” Bloomberg, 2021, www.bloomberg.com/news/articles/2021-02-04/23andme-to-go-public-as-3-5-billion-company-via-branson-merger.
8. R Core Team, “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, 2019, www.R-project.org.
9. Jean-Christophe Bélisle-Pipon, Effy Vayena, Robert C. Green, and I. Glenn Cohen, “Genetic Testing, Insurance Discrimination and Medical Research: What the United States Can Learn from Peer Countries,” Nature Medicine 25, no. 8 (2019): 1198–1204; Christoph Nabholz and Florian Rechfeld, Seeing the Future? How Genetic Testing Will Impact Life Insurance (Swiss Re Centre for Global Dialogue, 2017), www.swissre.com/dam/jcr:2bccf1e2-eaa5-4ca2-a416-f6dedcebe9dc/Genetics_Seeing_the_future.pdf; Yann Joly, Charles Dupras, Miriam Pinkesz, Stacey A. Tovino, and Mark A. Rothstein, “Looking Beyond GINA: Policy Approaches to Address Genetic Discrimination,” Annual Review of Genomics and Human Genetics 21 (2020): 491–507.
10. Giorgio Sirugo, Scott M. Williams, and Sarah A. Tishkoff, “The Missing Diversity in Human Genetic Studies,” Cell 177, no. 1 (2019): 26–31.
11. Genevieve L. Wojcik et al., “Genetic Analyses of Diverse Populations Improves Discovery for Complex Traits,” Nature 570, no. 7762 (2019): 514–18.
12. Dennis Grishin, Kamal Obbad, Preston Estep, Kevin Quinn, Sarah Wait Zaranek, Alexander Wait Zaranek, Ward Vandewege, Tom Clegg, Nico César, and Mirza Cifric, “Accelerating Genomic Data Generation and Facilitating Genomic Data Access Using Decentralization, Privacy-Preserving Technologies and Equitable Compensation,” Blockchain in Healthcare Today 1 (2018).
13. Dennis Grishin, Kamal Obbad, and George M. Church, “Data Privacy in the Age of Personal Genomics,” Nature Biotechnology 37, no. 10 (2019): 1115–17.
14. EU General Data Protection Regulation (GDPR): Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016.
15. Fruzsina Molnár‐Gábor and Jan O. Korbel, “Genomic Data Sharing in Europe is Stumbling—Could a Code of Conduct Prevent its Fall?” EMBO Molecular Medicine 12, no. 3 (2020): e11421.
16. United States Congress, Genetic Information Nondiscrimination Act of 2008, www.eeoc.gov/statutes/genetic-information-nondiscrimination-act-2008.
17. UN Educational, Scientific and Cultural Organisation (UNESCO), International Declaration on Human Genetic Data, 16 October 2003, www.refworld.org/docid/4042241f4.html.