Notes on Engineering Health, March 2022: Lampposts and Genomics

Jonathan Friedlander, PhD
Geoffrey W. Smith


March 31, 2022

It is nighttime and a man is on his knees looking for his keys under a lamppost. A passerby stops and asks him if he thinks that’s where he lost them. The man on his knees answers: No, but that’s the only place where the light is.

This famous joke is often used to describe the massive influence of technology on the way we see and interrogate the world. Although Watson and Crick famously solved the structure of DNA in 1953 from groundbreaking crystallography work by Rosalind Franklin and Maurice Wilkins, the ability to “read” or sequence DNA was not scalable until the invention of the Sanger sequencing method in 1977. Sanger’s innovation and the ones that followed slowly paved the way to today’s large-scale genome sequencing, with the current state of technology at each step largely determining the problems the scientific community was then able to focus on.

The Sanger method allowed the sequencing of up to about 900 base pairs at a time by copying a DNA fragment many times over with chain-terminating nucleotides, which in later automated versions were tagged with fluorescent dyes. Longer sequences could then be assembled from the overlapping fragments. The Sanger method was both revolutionary and extremely labor-intensive, which is why the Human Genome Project, completed in April 2003, took 13 years (1990 to 2003) and cost nearly $3 billion.

The successor to the Sanger method was Next Generation Sequencing (NGS), which, instead of sequencing a single DNA fragment at a time, runs a massively parallel process that sequences millions of fragments simultaneously per run. The cost and time advantages of the NGS method (the market for which Illumina dominates with about 80% market share) have led to innumerable scientific discoveries, shedding new light on evolutionary processes and on a great number of diseases caused by single-nucleotide variants, copy number variants, insertions, or deletions.

The fact that NGS, like the Sanger method, is based on short reads (<300 base pairs in the case of NGS) that have to be assembled has, however, focused the energy of the scientific community on only a certain subset of genomic events, and has likely led to an underrepresentation of other events such as structural variants. Accurately assembling ~300 base-pair fragments by overlapping them in order to build large sections or even whole chromosomes is extremely difficult, especially for regions poor in information (GC-rich and highly repetitive regions). Imagine trying to solve a large jigsaw puzzle of a clear blue sky with tiny pieces, all of them a similar shade of blue. Indeed, reads less than 300 bases long, such as those typically produced by Illumina NGS machines, are too short to detect more than 70% of human genome structural variation (that is, variation affecting sequences longer than 50 base pairs), with intermediate-size structural variation (less than 2 kb) especially under-represented. Being able to identify, analyze, and eventually diagnose these structural variants will unlock whole areas of biology that have been kept in the dark and that preliminary studies show to be important for health and disease. But this will only be possible with a different street lamp to cast light on this search space.
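The jigsaw problem can be made concrete with a toy example. The sketch below is only an illustration under strong assumptions: the sequences are invented, reads are idealized (error-free, one read per position), and coverage depth, which real pipelines can sometimes use to estimate copy number, is ignored. The names read_set and make_genome are hypothetical helpers, not part of any sequencing toolkit. It compares two alleles that differ only in the number of copies of a tandem repeat and asks whether the set of distinct reads can tell them apart.

```python
def read_set(genome: str, read_len: int) -> set:
    """Return every distinct substring of length read_len (idealized reads)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


def make_genome(repeat_copies: int) -> str:
    """Toy allele: unique flank + tandem 'CAG' repeat + unique flank (made up)."""
    return "ACGTTGCA" + "CAG" * repeat_copies + "TTGACCGA"


short_allele = make_genome(10)  # 30-base repeat region
long_allele = make_genome(14)   # 42-base repeat region (a toy structural change)

for read_len in (6, 12, 40):
    same = read_set(short_allele, read_len) == read_set(long_allele, read_len)
    verdict = "indistinguishable" if same else "distinguishable"
    print(f"read length {read_len}: alleles are {verdict}")
```

In this toy setting, reads of 6 or 12 bases produce exactly the same set of pieces for both alleles, because no read is long enough to span the repeat and touch both unique flanks. A 40-base read does span the shorter repeat, so the two alleles finally yield different reads. The same logic, at a much larger scale, is why long reads help with repetitive regions.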

The search for this new “street lamp”, a cheap and scalable long-read sequencing technology, has been a decade-long effort. Companies such as Pacific Biosciences (on Illumina’s radar for an acquisition in 2018) and Oxford Nanopore can generate continuous sequences ranging from 10,000 bases to several million bases in length directly from native DNA. However, some challenges to the wide adoption of these technologies remain. A longer time to sequence, a need for more DNA per sample, and a prohibitive comparative cost make these approaches currently unfit for broad expansion beyond research and into the clinic. Solutions bridging the scalability of short-read sequencing and the output of long-read sequencing are emerging, and incumbent companies are placing their bets on the best ways forward (a great review of long-read solutions can be found in Nature).

The short history of DNA sequencing is a wonderful example of the interplay of science and technology, with the current state of technology largely dictating what our science can ask and understand. In thinking about what we know, it is important to remember the limits of what we can see.


First Five
First Five is our list of essential media for the month which spans a range of content including scientific papers, books, podcasts, and videos. For our full list of interesting media in health, science, and technology, updated regularly, follow us on Twitter or Instagram.

1/ We Are Family
A massive new study has taken a major step towards mapping the entirety of genetic relationships among humans: a single genealogy that traces the ancestry of all of us.

2/ When I’m 64
A team of evolutionary biologists and biomedical researchers lays out evolutionary and biomedical evidence showing that humans, who evolved to live many decades after they stopped reproducing, also evolved to be relatively active in their later years. The researchers say that physical activity later in life shifts energy away from processes that can compromise health and toward mechanisms in the body that extend it. This guards against chronic illnesses such as cardiovascular disease, type 2 diabetes, and even some cancers.

3/ On The Dark Side
Ambient nighttime light exposure is implicated as a risk factor for adverse health outcomes, including cardiometabolic disease. A new study published in PNAS shows that exposure to moderate ambient lighting during nighttime sleep, compared to sleeping in a dimly lit room, can harm your cardiovascular function during sleep and increase your insulin resistance the following morning. This could have implications for those living in modern societies where indoor and outdoor nighttime light exposure is increasingly widespread and where concerns regarding cardiometabolic health are also on the rise.

4/ Don't You (Forget About Me)
Far from being some kind of decay, a new review argues that forgetting is actually an essential part of the learning process. The neuroscientists present a conceptual framework in which forgetting is considered to be an essential form of neuroplasticity that enhances the utility of memory to an organism.

5/ I’m Every Woman
Nonalcoholic fatty liver disease (NAFLD) is one of the leading causes of death worldwide. However, why premenopausal women are more resistant to NAFLD than men is currently unknown. A new study published in Nature shows that a hormone controls the production of a liver protein that protects female mice from NAFLD and steatohepatitis. Another recent paper identifies a potential way to battle the health effects of obesity and type 2 diabetes in women, after discovering an important factor that could determine how their bodies use and store fat.

Digitalis Commons
Public-Interest Technologies for Better Health

Digitalis Commons is a non-profit that partners with groups and individuals striving to address complex health problems by building public-interest technology solutions that are frontier-advancing, open-access, and scalable.

Paul Farmer’s recent passing is a tragic loss for the global public health community, and for the many individuals for whom he worked so tirelessly. This New Yorker profile and the book Mountains Beyond Mountains by the same author are well worth reading to find inspiration in the massive impact a single truly dedicated individual can make.
