Why It’s Time To Uncover The Surprises Hiding In Our DNA

To prepare for future pandemics and find new cures to disease, we must determine how the human genome varies from one person to the next.

Johana Kroft for Noema Magazine
David B. Goldstein is a professor of genetics at Columbia University and the CEO of Actio Biosciences. He is the author of “The End of Genetics: Designing Humanity’s DNA” (2022).

Imagine you could walk into a crowded restaurant with a Star Trek-style tricorder and instantly decode the DNA of everyone present. You would certainly learn things about some people’s health and likely reproductive outcomes that they themselves did not know.

You could expect to find a person or two, for example, with a 20-fold greater risk of late-onset Alzheimer’s disease than the population average. And you would find a handful of people with mutations that doctors must disclose to patients if they are ever seen “incidentally” during routine clinical sequencing: the sort of mutation, for example, behind Angelina Jolie’s decision to undergo prophylactic surgery to reduce her chances of cancer. You might very well find two patrons at the bar who carry mutations that, if combined, would cause a severe genetic disease in their children. Whether permitted by the Prime Directive or not, you could easily imagine that many of those so scanned would like to know the result.

Part of this flight of fancy is not too far from our current reality. Modern genome sequencing machines work almost that rapidly. But we have no way to interpret much of what the resulting genome sequences would show. We are nowhere near being able to appropriately interpret all the variation to be found in even one restaurant, let alone all the human genetic variation on Earth. We all carry variants that could star in a scary story about looming disease. Some of these variants we know have real associations with disease risk, like alleles that strongly increase the chance of late-onset Alzheimer’s disease or sudden death due to arrhythmias. Other variants, however, look nasty but are in fact “false positives” in the context of disease causation. Part of the challenge of contemporary genomics is determining which genetic differences really are going to make someone sick. The COVID-19 pandemic has got me thinking that we need a new human genome project to overcome some of this uncertainty.

The historical Human Genome Project was completed two decades ago to great fanfare, but I would suggest it is misnamed. It was largely dedicated to determining what geneticists call a reference genome. Very approximately, this amounts to what is usually present in the human population at each of the roughly three billion sites in the human genome. As such, a more appropriate name might have been the human genome reference project. (Of course, the name is set and not changing.)

But it is important to put this largely technical effort into its appropriate context and move beyond the somewhat hyperbolic claims of its significance. Determining the reference genome is a critical starting point for a great many things we wish to do and know, but it is not an ending point.

It may finally be time to determine how the human genome varies from one person to the next and how that variation influences us all, an effort that truly deserves the label of the Human Genome Project, but that I will call here the Genome Variation Project. There have been nascent efforts in this direction, from various national biobanks to the recent All of Us Research Program in the United States, which is generating omic data on at least a million volunteers. But all existing initiatives are limited in scale, speed and ambition about what can be interpreted and returned.

I would not advocate for such a human genome variation project lightly, especially given the many ways the results of such a project could do harm: the privacy concerns it poses, the potential risks for discrimination and abuse. But the need might outweigh the risks. There are many ways to illustrate the potential benefits to what we might appropriately call global health resilience, but COVID-19 gives a particularly timely illustration.

During the early months of the pandemic, scientists throughout the world were racing to collect blood samples from patients, generate sequence data and search for differences that influence susceptibility to severe disease. A look back at the HIV/AIDS pandemic explains why. HIV enters cells by binding to a co-receptor encoded by the CCR5 gene; about 15% of northern Europeans, however, carry a deletion in this gene, and individuals who carry two copies of this deletion are almost completely protected against HIV.

Luis López (Mallet) for Noema Magazine

A drug that inhibits this interaction was later shown to be effective therapeutically in HIV-positive patients. Scientists, of course, wanted to determine ASAP whether something like this would provide us guidance in the treatment of COVID-19, and an international team of researchers has been searching far and wide for people who are genetically resistant to the virus.
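The CCR5 figure quoted above gives a rough sense of how rare full protection would be. The sketch below is only an illustration under stated assumptions: it reads "about 15% carry a deletion" as meaning 15% of people have at least one copy, and it assumes Hardy-Weinberg proportions (random mating, no selection), neither of which is established here.

```python
import math

# From the text: about 15% of northern Europeans carry the CCR5 deletion.
# Assumption: "carry" means at least one copy (one or two).
carrier_fraction = 0.15

# Assumption: Hardy-Weinberg proportions. With deletion allele frequency q,
# the fraction of people with no copies is (1 - q)^2, so:
q = 1 - math.sqrt(1 - carrier_fraction)

two_copies = q ** 2          # homozygous, the almost completely protected group
one_copy = 2 * q * (1 - q)   # heterozygous carriers

print(f"Deletion allele frequency: {q:.3f}")
print(f"People with two copies:    {two_copies:.2%}")
print(f"People with one copy:      {one_copy:.2%}")
```

Under these assumptions, well under 1% of the population would carry two copies, which suggests why finding genetically resistant people takes very large samples.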

Now, more than two years into the pandemic, scientists are still generating data and running analyses, with definitive answers yet to be determined. But creating a database of genomic variation would allow such analyses to be performed virtually in real time during future pandemics. Knowing both the genomic make-up of patients and their clinical presentations could also lead to great advances in precision medicine, my own area of work, which seeks to target treatments to the most important causes of disease in individual patients.

Such an effort is easily within reach today in the U.S., not only technically but economically. By my back-of-the-napkin estimation, determining the sequence of all Americans might cost somewhere between $3 billion and $75 billion, depending on the extent of sequencing, plus a further $30 billion-or-so expense to set up the infrastructure necessary to house, interpret and make available such data. Compared to the $1.9 trillion the government invested in the American Rescue Plan, this would be a much smaller-scale undertaking.
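The back-of-the-napkin figures above can be turned into per-person and relative terms. The sketch below is only illustrative: the dollar amounts are the ones quoted in the text, while the population figure is an outside assumption (roughly 332 million U.S. residents) and the variable names are hypothetical.

```python
# Assumption (not from the text): approximate U.S. population.
US_POPULATION = 332_000_000

# Figures quoted in the text, in dollars.
SEQUENCING_LOW = 3e9      # low end of the sequencing estimate
SEQUENCING_HIGH = 75e9    # high end of the sequencing estimate
INFRASTRUCTURE = 30e9     # storage, interpretation and access
RESCUE_PLAN = 1.9e12      # American Rescue Plan, for comparison


def summarize(sequencing_cost):
    """Return per-person sequencing cost, total cost and share of the Rescue Plan."""
    total = sequencing_cost + INFRASTRUCTURE
    return {
        "per_person_sequencing": sequencing_cost / US_POPULATION,
        "total": total,
        "share_of_rescue_plan": total / RESCUE_PLAN,
    }


low = summarize(SEQUENCING_LOW)
high = summarize(SEQUENCING_HIGH)

for label, s in (("low", low), ("high", high)):
    print(
        f"{label}: ${s['per_person_sequencing']:.0f} per person for sequencing, "
        f"${s['total'] / 1e9:.0f}B total, "
        f"{s['share_of_rescue_plan']:.1%} of the American Rescue Plan"
    )
```

With the assumed population, the two ends of the range work out to roughly $9 and $226 per person for sequencing, and the all-in totals of $33 billion and $105 billion come to about 2% and 6% of the Rescue Plan figure.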

These data would permit most new drugs to be tested in a fashion stratified by the genetic make-up of patients. It is striking that, even today, two decades after the completion of the human genome reference project, this is rarely done outside cancer research — even though we know that diseases like chronic kidney disease can be differentiated on the basis of distinct underlying genetic causes of disease. 

And what of reproductive genomics? Advances in stem cell biology and genome editing will eventually permit widespread engineering of the genomes of future generations of children, and this is likely to happen well before we know the consequences of those alterations. One way to increase our knowledge of the likely consequences is to study what has already happened naturally in the genomes of the approximately 8 billion people alive today. This could be done by sequencing the entire genomes of people (through so-called whole genome sequencing) or only the parts of their genomes that actually encode proteins (whole exome sequencing). The result would be a near-complete annotation of the consequences of most single-site changes in the human genome. This would also provide exactly the information prospective parents would need in order to determine whether they have mutations in the same gene that would cause diseases in their children preventable by pre-implantation diagnostics.

The availability of the data would have a direct and dramatic impact on research, clinical care and many other areas of life. For example, if a new pandemic were to arrive, we could quickly find out whether the human genome provides guidance about ways to fight it. Patients treated for the new disease could simply provide a code to researchers indicating where their genomic information is stored, and their data could immediately be included in global genetic studies. Similarly, large data sets could be established, pairing health records with genomic data, so that companies testing new medicines could select patients based on both clinical presentation and the underlying genetic cause of disease. 

We could reasonably expect that accounting for disease heterogeneity in this way would have an important impact on success rates for clinical trials, which account for the largest share of global research and development expenses in the pharmaceutical industry. With estimated global expenses for clinical research in the field exceeding $200 billion annually, even a modest improvement in efficiency would have a dramatic impact on total costs. And there would be many potential benefits to individual participants, too, from guidance about whether couples have mutations in the same genes that might cause recessive diseases like Tay-Sachs, to information about risk factors for different common diseases and predicted responses to treatments.

So what is there not to like? Unfortunately, a new Human Genome Variation Project would bring up numerous profound concerns. 

1) Triggering anxiety and unnecessary clinical interventions. Having analyzed many thousands of genomes, many from apparently healthy patients, I have seen firsthand how any genome could be used to scare someone out of their wits.    

Right now, we are fairly good at finding the genetic defects responsible for a genetic disease that has already presented in the doctor’s office. But we are not good at identifying mutations that will make a person sick in the future before they have been determined clinically to actually have any disease. For example, research at Columbia University showed that individuals who are not known to have kidney disease have an improbably high burden of mutations that would appear to cause familial forms of kidney disease. Challenges presented by these sorts of variants will complicate developing a science of preventive medicine based on genomic information. It also means that the health care system will react to genetic variants that are not in fact worth reacting to, at considerable cost and with attendant unnecessary anxiety. Importantly, however, large-scale genetic data is exactly what we need to get better at this kind of interpretation.  

2) Equity and privacy concerns. Given the complexity of interpreting genomes, those with the resources to attract the best expert guidance will clearly benefit most from widespread availability of genomic information. Also, it is difficult to believe that it will be possible to keep genomic information secure, meaning that we can expect information to get out and be used against people. Legislation in the U.S. currently protects against some, but not all, uses of genetic information, and it is possible that insurers or others could ultimately treat people differently based on their predicted genetic destiny. Alzheimer’s disease, for example, is costly for health care providers, and it is easy to imagine insurers seeking to charge a higher premium for individuals with risk variants if they were allowed — a step toward a potential genomic dystopia that should appeal to no one. It might be challenging for the legal system to create sufficient safeguards against this.

3) The potential creation of a global genetic underclass. Inevitably, many currently healthy individuals will have mutations with very harmful effects that cannot be prevented, raising concerns about the potential for discrimination. There is no question that some people’s genomes will “look” better than others. Some genomes will have more mutations that cause recessive disease and more variants associated with hereditary cancers and other conditions, and some will have fewer. While some of these differences may seem innocuous enough, some genetic differences could end up influencing career prospects and many other aspects of life. For example, scoring systems are already available that correlate genome-wide patterns of common variation with performance on standardized tests, and many genetic differences are expected to shorten life spans. With genomic data increasingly available, eventually in many smartphone apps, the potential for stigmatization based on genomic profiles is substantial.

To these concerns we might add an unfortunately contemporary sociocultural one: Many people may be unwilling to believe in consensus judgements about what genetic differences do and don’t matter, and they would be left adrift, uncertain whose advice to follow. The current generation of academic “experts” does not command the same respect in society as earlier ones, in part because they seem increasingly unwilling to admit to the substantial uncertainty that surrounds many fundamental questions that bear on policy decisions. Instead, many in the credentialed elite have emphasized the importance of “following the science,” a phrase so devoid of comprehension of how science actually works as to be laughable.  The academic elite should adopt a little humility and ask why so many non-elites are so damn skeptical. Admitting how often we get things wrong might go a long way to restoring confidence.

In addition, the interaction of substantial government money with academia has been good for big science projects and horrible for the culture of contemporary science. These two things go together, but the unfortunate reality is that many parts of academia have learned how to play the political game along with the best of them, with devastating consequences for both open inquiry and public trust. Genomics, with its heavy reliance on government money, is as buffeted by these insidious currents as any other area of science. In the world of big government grants for science, academicians have learned to build narratives implying that outcomes of experiments are predictable and that money spent is always productive.

These misgivings, however, do not entirely dissuade me from believing in the need for and importance of a Human Genome Variation Project. Instead, they convince me that we urgently need to address them in order to make such an effort ethically feasible.

These considerations also persuade me of the need to implement a Human Genome Variation Project in a way fundamentally different from how we have undertaken other large partnerships. For one thing, I can see a strong argument for the effort to be grounded in the private sector instead of the public sector, given the extreme politicization of science and the need to proceed despite profound uncertainty, which is not a specialty of government initiatives today.

Perhaps the best model would be regulated utilities. Governments could, for example, establish clear guidelines that companies providing genomic infrastructure would need to follow. This would include restrictions on agreed-upon usage of and access to the data, and it would also include guidance on the quality of genomic interpretations returned to individuals. In such a framework, a consortium of companies could then generate sequence data and provide interpretations to voluntary participants. It could also allow participants to make their data available to hospitals, companies evaluating new therapies and researchers investigating infectious diseases and other conditions. Because of the return of genomic information to individuals, and the need to provide dynamic reinterpretation as society learns more about how genetics influences health and disease, companies could likely charge a subscription fee in order to provide evergreen genomics analyses to customers. We would need to be mindful, however, that requiring payment could contribute substantially to health-care and reproductive inequalities and would warrant careful mitigation strategies.

If done well, this model could be both profitable for the companies that provide the infrastructure and beneficial to individual participants and humanity as a whole. It is worth noting that a number of U.S. companies today have more than enough cash reserves to sequence genomic data for all residents of the United States. For example, the 13 top S&P companies, including tech behemoths like Apple, Alphabet and Microsoft, collectively command over $1 trillion in cash holdings.

The need for a Human Genome Variation Project is clear. We have the technology and the means. It’s time to start developing solutions to the ethical dilemmas that stand in the way.