May 31, 2020
From the 1950s, when the double-helix structure of DNA was unveiled, to the CRISPR editors of the past few years, progress toward engineering biology consistently and with high fidelity has been steady. Our ability to treat biology as an engineering discipline may even be reaching the point where it can be applied to problems such as the current coronavirus pandemic.
Our abilities to read DNA and to write it have progressed in parallel. Reading: uncovering its structure (1953), discovering restriction enzymes (1968), developing Sanger sequencing (1977), sequencing the first human chromosome (1999), and developing next-generation sequencing (2010s). Writing: creating recombinant DNA (1972), creating the first transgenic animal (1981), developing the polymerase chain reaction (1983), cloning the first mammal (1996), and approving the first gene-targeted drug therapy (2001).
The last decade started with the creation of the first synthetic lifeform by Craig Venter and his team. The genome of this microorganism was neither evolved nor born but entirely engineered. This milestone and the ones that followed launched the field of synthetic systems. Engineered life forms help deepen our understanding of biology: what is necessary and what isn’t? At first these techniques lacked precision; later, as precision improved, they lacked usability, because working at scale was still not possible. The advent of the CRISPR gene-editing technique (2012), from the team of Jennifer Doudna at Berkeley (and others), cleared many of these hurdles. The new, lower bar to modifying genes precisely and at scale opened two revolutionary tracks. The first, and for most the more exciting, was the possibility of rewriting the genetic information in patients’ cells directly. This fresh take on gene therapy is still in its infancy but shows tremendous therapeutic promise.
The second track is to generalize Venter’s approach to synthetic systems, not only to create new life forms but also to generate new disease models, new metabolic pathways, and new drug targets. These synthetic systems can be designed to test virtually any biological hypothesis. One recent effort aims to build a fully functional yeast-based synthetic genomics platform to genetically reconstruct diverse RNA viruses, including members of the Coronaviridae family. With this tool, researchers can functionally characterize SARS-CoV-2 evolution rapidly and in real time, target the right epitopes, and inch closer to a resolution of this health crisis and the ones to come.
– Jonathan Friedlander, PhD & Geoffrey W. Smith
In 2019, Digitalis Commons began exploring the mining of public data sets to identify patterns of value in biotech and related domains. Starting with the OpenFuego open source project that came out of the Nieman Journalism Lab, the Commons team moved on to create an entirely new application that pulls public data from Twitter APIs and scores the URLs mentioned there to identify resources of interest to a given community.
That work had been running at synthesis.bio, an automatically curated list of the most interesting web pages shared among a community of Twitter users with a strong interest in biotech. We've recently expanded the Synthesis project to support multiple channels on different topics, including Covid-19. That work is now at synthesis.digitaliscommons.org.
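How Synthesis scores URLs is not described here. As a rough sketch of the general idea, the following Python assumes the simplest plausible rule: a URL's score is the number of distinct users who shared it, after a light normalization that drops `utm_*` tracking parameters. The names `normalize` and `score_urls`, the sample data, and the scoring rule itself are illustrative assumptions, not the project's actual code.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase the host and drop utm_* tracking parameters and fragments."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def score_urls(mentions):
    """Rank URLs by the number of distinct users who shared them.

    `mentions` is an iterable of (user, url) pairs. This scoring rule is an
    assumption made for illustration only.
    """
    sharers = defaultdict(set)
    for user, url in mentions:
        sharers[normalize(url)].add(user)
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(
        ((url, len(users)) for url, users in sharers.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical sample data: two users share the same paper in different forms.
mentions = [
    ("alice", "https://example.org/paper?utm_source=twitter"),
    ("bob", "https://Example.org/paper"),
    ("alice", "https://example.org/paper"),
    ("carol", "https://example.org/news"),
]
ranking = score_urls(mentions)
```

Counting distinct users rather than raw mentions keeps one prolific account from dominating the list. A production system would likely also weight by recency or by the sharer's standing in the community.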
The project is maturing rapidly, and Digitalis Commons plans to release it as an open source project later this year, after adding further capabilities. The project is written in Python and makes use of the rich open source ecosystem for data projects, including Pandas, the ubiquitous toolset for manipulating columnar data in memory at high speed. The project runs in Docker containers deployed on Amazon's serverless cloud computing infrastructure and operates at remarkably low cost. It illustrates the extraordinary potential for creating and operating next-generation data analytics platforms in the cloud: cheap, fast, and good.
The Synthesis technology stack includes:
• Python services listening to Twitter feeds
• Docker containers running the Python code
• Amazon's serverless infrastructure running the Docker containers
• S3 buckets collecting the data in JSON files
• GatsbyJS generating a static site
• React running dynamic web pages
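To give a flavor of the in-memory columnar work Pandas is suited to, here is a hedged sketch. It loads newline-delimited JSON records (standing in for files in an S3 bucket), ranks URLs within each channel by distinct sharers, and emits records that a static-site generator could consume. The record fields (`channel`, `user`, `url`), the function `top_urls`, and the ranking rule are assumptions made for illustration; the project's real schema is not given.

```python
import io
import json

import pandas as pd

# Hypothetical newline-delimited JSON, standing in for a file read from S3.
RAW = io.StringIO(
    '{"channel": "biotech", "user": "a", "url": "https://example.org/x"}\n'
    '{"channel": "biotech", "user": "b", "url": "https://example.org/x"}\n'
    '{"channel": "biotech", "user": "a", "url": "https://example.org/x"}\n'
    '{"channel": "biotech", "user": "a", "url": "https://example.org/y"}\n'
    '{"channel": "covid19", "user": "c", "url": "https://example.org/z"}\n'
)


def top_urls(records: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Return the top-n URLs per channel, ranked by distinct sharers."""
    counts = (
        records.groupby(["channel", "url"], as_index=False)
        .agg(sharers=("user", "nunique"))
        .sort_values(["channel", "sharers", "url"], ascending=[True, False, True])
    )
    # head(n) within each group keeps the sorted order established above.
    return counts.groupby("channel").head(n).reset_index(drop=True)


records = pd.read_json(RAW, lines=True)
ranked = top_urls(records)

# Plain Python dicts, ready to be written out as JSON for a static site build.
site_data = json.loads(ranked.to_json(orient="records"))
```

Keeping each stage as a plain file-in, file-out step is one reason a pipeline like this can run cheaply in short-lived containers.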
The Digitalis Commons Dart Grant program provides quick, targeted grants up to $3,000 to develop public goods for better health.
Digitalis Commons is a non-profit that partners with groups and individuals striving to address complex health problems by building solutions that are frontier-advancing, open-access, and scalable.