Engineering Biology: Systems, Tools, and Technology

Jacob Oppenheim, PhD


February 7, 2023

Technology in a biopharma company tends to grow by accretion rather than design. Tools and systems are brought in-house as functions come online: a LIMS arrives with the establishment of a lab, a compound registry with the first small-molecule experiments, a cheminformatics tool when it’s time to start digging into SAR. Growth reflects staffing and capabilities. Much as you don’t hire a medicinal chemist until it’s time to design small molecules, you don’t bring in the systems they would use until the function is present. At its heart, this is a lean theory of growth: acquire only what you need, piece by piece, as you move from research into development and beyond. Yet not designing a systems architecture upfront all too often ends up costing organizations far more than it saves.

Technologically, this approach is destined to fail. Each system must be installed, refreshed with data, and maintained. Without a coherent overall design, all of these costs are magnified, frequently in unexpected ways, when data models do not interact well or key identifiers cannot be shared across systems. Worse yet, there are commonly used software tools in biopharma that do not run on Macs, or demand to be the central database for your organization, or cannot be used on the cloud and must run on premises, requiring a server and dedicated IT support.

Culturally, this paradigm is corrosive. When systems are placed downstream of staffing, functional area leads view the systems they use as their own and optimize accordingly. Silos are the inevitable result. While this can lead to short-term increases in productivity as teams pick the software they are most comfortable with, it imposes much larger long-term costs. IT, Informatics, and Data teams are constantly playing catch-up, trying to knit disparate systems across the company into some coherence. Not uncommonly, it leads to fission, with separate IT teams reporting to functional areas and not developing core technological capabilities. From a pure maintenance perspective alone, this creates dramatically higher resource and licensing costs.

Until recently, it was hard to avoid some version of this scenario. There was no easy way to have a central data store on the cloud to connect all your systems. The cost of compute and storage meant that separate data transformation pipelines would be needed for every single connection between systems. Unless you had a stellar architect early on to deliberately choose the systems and design their interconnections, a tangled web of systems, servers, pipelines, and dependencies was hard to avoid. Investing in architecture was an exercise in damage mitigation rather than value creation.

Much as with data tools and systems, as I’ve covered previously, this has changed with the advent of cheap cloud storage and compute. With a central cloud data store, systems that generate or record data can drop their results in a common data warehouse. Systems that use data for defined analytical tasks can draw from a common set of data for functional experts to use. Results can be pushed back to the data warehouse, enabling seamless, tech-enabled learning loops.

To give an example: a set of compounds, defined in advance, is synthesized in the lab and logged in the compound registry. Standard assays are performed in the lab and logged in LIMS from laboratory machines. Chemists examine the results, model the structure-activity relationships (SAR), then design a new set of compounds based on them. All of these processes and the scientists running them use a single source of fresh data, available to all. Experts can focus on the science and on designing the next key experiment rather than waiting for IT to load the data they need into their system and scrying into its lineage.
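The loop above can be sketched in a few lines. This is only an illustration under loud assumptions: an in-memory SQLite database stands in for the cloud data warehouse, and the table names, column names, function names, compound IDs, and assay values are all hypothetical rather than taken from any real registry or LIMS product. The point it shows is that once the registry and LIMS write to the same store under a shared compound identifier, the SAR view is a single query rather than a bespoke pipeline.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory stand-in for the central data warehouse (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compounds (
            compound_id TEXT PRIMARY KEY,  -- shared key used by every system
            smiles      TEXT NOT NULL
        );
        CREATE TABLE assay_results (
            compound_id TEXT NOT NULL,     -- same identifier as in compounds
            assay       TEXT NOT NULL,
            ic50_nm     REAL NOT NULL
        );
        """
    )
    return conn


def register_compounds(conn, compounds):
    """What a compound registry would do: log (compound_id, smiles) pairs."""
    conn.executemany("INSERT INTO compounds VALUES (?, ?)", compounds)
    conn.commit()


def log_assay_results(conn, results):
    """What a LIMS would do: log (compound_id, assay, ic50_nm) rows."""
    conn.executemany("INSERT INTO assay_results VALUES (?, ?, ?)", results)
    conn.commit()


def sar_table(conn, assay):
    """What a chemist's SAR tool would read: structures joined to results,
    most potent (lowest IC50) first."""
    return conn.execute(
        """
        SELECT c.compound_id, c.smiles, r.ic50_nm
        FROM compounds AS c
        JOIN assay_results AS r ON r.compound_id = c.compound_id
        WHERE r.assay = ?
        ORDER BY r.ic50_nm
        """,
        (assay,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    # Made-up compounds and values, for illustration only.
    register_compounds(conn, [("CMP-001", "CCO"), ("CMP-002", "c1ccccc1")])
    log_assay_results(
        conn,
        [("CMP-001", "assay_a", 250.0), ("CMP-002", "assay_a", 40.0)],
    )
    for row in sar_table(conn, "assay_a"):
        print(row)
```

Nothing in the sketch depends on which vendor supplies the registry or the LIMS; each system only needs to honor the shared identifier and write to the common store.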

How then should we pick and license technology? It begins with a common architectural thesis: a central data warehouse and all systems on the cloud. Nothing on-prem. Immediately, this narrows the funnel of potential tools. Next, the choice of systems needs to be guided by user requirements. What the scientist needs is the best product that fulfills their requirements and the technological needs; its underlying infrastructure and brand are not their concern. From there, Informatics and IT can identify a set of potential solutions that can be evaluated by all stakeholders. It is critical to avoid weighing technological imperatives against user familiarity. Just as users have their hard requirements, so do technology teams. The nice-to-haves can be triaged and balanced.
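One way to picture this selection process is as a filter followed by a ranking. The sketch below is an assumption-laden illustration, not a prescribed procedure: the `Tool` class, the `shortlist` function, the feature labels, the weights, and the tool names are all hypothetical. Hard requirements from users and from technology teams are pooled and act as a gate that is never traded away; only the nice-to-haves, familiarity among them, are weighed against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    features: frozenset  # capabilities the candidate tool offers


def shortlist(tools, hard_requirements, nice_to_haves):
    """Keep only tools meeting every hard requirement, then rank the
    survivors by the summed weight of the nice-to-haves they offer."""
    hard = frozenset(hard_requirements)
    eligible = [t for t in tools if hard <= t.features]

    def score(tool):
        return sum(w for feat, w in nice_to_haves.items() if feat in tool.features)

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(eligible, key=lambda t: (-score(t), t.name))


if __name__ == "__main__":
    # Hypothetical requirements: users and technology teams each have theirs.
    user_hard = {"sar_plots"}
    tech_hard = {"cloud_hosted", "warehouse_export"}
    nice = {"single_sign_on": 2, "familiar_ui": 1}

    candidates = [
        Tool("Tool A", frozenset({"cloud_hosted", "warehouse_export",
                                  "sar_plots", "single_sign_on"})),
        # Familiar to users but on-prem only, so it fails a hard requirement.
        Tool("Tool B", frozenset({"sar_plots", "familiar_ui"})),
        Tool("Tool C", frozenset({"cloud_hosted", "warehouse_export",
                                  "sar_plots", "single_sign_on", "familiar_ui"})),
    ]
    for tool in shortlist(candidates, user_hard | tech_hard, nice):
        print(tool.name)
```

In this toy example the familiar but on-prem tool never reaches the ranking step, which is the distinction the paragraph draws: familiarity can break ties among acceptable options, but it cannot offset a missing hard requirement.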

This approach requires cultural change, moving users from dictating tools to elucidating requirements, but it is fundamental to success with modern technology. A focus on requirements and capabilities allows you to build a systems strategy that is purposeful and connected end to end. Common elements can be shared and reused: a central data warehouse, long-term storage and archiving, single sign-on, and security. Every tool benefits and improves while costs are reduced through scale and shared patterns. A reduced maintenance and compliance burden means a more unified, focused, and capable IT and Informatics org.

Our goal as technologists should be a world where users focus on capabilities and the ability of tools to fulfill them in a delightful manner, allowing us to swap components, systems, and pipelines as needed to continuously improve. Then, we can properly enjoy the fruits of modern technology.
