Notes on Engineering Health, September 2019: Intellectual Debt

The Digitalis Team

September 30, 2019

“Technical debt” is a cost of doing business well known to software developers. The term implies that choosing a short-term (if pragmatic) implementation today means a payment will come due later: the team knows full well that the system will need to be refactored to allow for a more complete or scalable solution in the future.

Jonathan Zittrain, co-founder of the Berkman Klein Center for Internet & Society at Harvard University, recently wrote (shorter version here; longer version here) about a new kind of debt that the technical world is beginning to rack up, so-called “intellectual debt”:

"An approach to discovery—answers first, explanations later—accrues what I call intellectual debt. It’s possible to discover what works without knowing why it works, and then to put that insight to use immediately, assuming that the underlying mechanism will be figured out later. In some cases, we pay off this intellectual debt quickly. But, in others, we let it compound, relying, for decades, on knowledge that’s not fully known."

Zittrain points out that the ever-increasing use of machine learning to make “theory-free” predictions (that is, predictions whose rationale we don’t understand, whether or not the prediction is correct) is a source of rapidly mounting intellectual debt. As with technical debt, we are making short-term decisions to outsource judgement and control to systems that “just work.” Which is fine, until they don’t.

Zittrain identifies three ways in particular in which our intellectual debts could come due. The first is the problem of adversarial examples that manipulate deep learning systems. His first example is the alteration of a few pixels in a picture of a cat: a human eye still sees just a cat, but a sophisticated neural net sees, with 99.99% surety, a photograph of guacamole. In a less trivial example of this technique, a system designed to classify skin lesions as benign or malignant was tricked into making inaccurate medical judgements. In this scenario, malicious or even just inadvertent spoofing could lead to significant health risks.
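To make the adversarial-example idea concrete, here is a minimal sketch using a toy linear classifier rather than the deep networks Zittrain describes. Everything in it is an assumption made for illustration: the weights, the bias, the 1,000 "pixel" features, and the benign/malignant labels are hypothetical and do not come from any real medical system. The perturbation moves each feature a small, fixed amount in the direction of the sign of its weight, which is the intuition behind gradient-sign attacks; for a linear model the gradient of the score with respect to the input is simply the weight vector.

```python
# Toy illustration only: a hand-built linear "classifier", not a real
# medical model. All weights, features, and labels are hypothetical.

def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def score(weights, bias, features):
    """Linear score: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(weights, bias, features):
    """Label the input 'malignant' when the score is positive."""
    return "malignant" if score(weights, bias, features) > 0 else "benign"

def perturb(weights, features, epsilon):
    """Nudge every feature by at most epsilon in the direction that raises the score."""
    return [x + epsilon * sign(w) for w, x in zip(weights, features)]

NUM_FEATURES = 1000
weights = [0.01 if i % 2 == 0 else -0.01 for i in range(NUM_FEATURES)]
bias = -0.1
original = [0.5] * NUM_FEATURES
altered = perturb(weights, original, epsilon=0.02)

# Largest change applied to any single feature.
largest_change = max(abs(a - o) for a, o in zip(altered, original))

print(classify(weights, bias, original))
print(classify(weights, bias, altered))
print(round(largest_change, 3))
```

Under these assumptions the label flips from benign to malignant even though no single feature moves by more than about 0.02. Many tiny, coordinated changes add up to a large change in the score, which is why such alterations can be invisible to a person yet decisive for the model.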

The second problem called out by Zittrain is the compounding of intellectual debt caused by “the coming pervasiveness of machine learning models.” Here the issue is that data produced by one machine learning system is increasingly being used to train other machine learning systems. The potential for an unrecognized flaw in the initial system to propagate exponentially through all its connected systems leads to a problem known in engineering as cascading failure.

The final challenge Zittrain calls out is that the tools of machine learning are equally applicable in the private sector and in academia. Historically, it has been the role of pure academics to pay off our intellectual debts by, as Zittrain notes, “backfilling the theory,” while industry has generally been happy to just apply the right answer (think of marketing a drug without a known mechanism of action versus doing the basic science to sort out how the drug actually works). While this answers-only approach may work in the short term, the likely shift of support away from basic research risks not replenishing the seed corn necessary to make the next set of fundamental breakthroughs as our current understanding of various phenomena reaches its limits. Zittrain cites a provocative essay from the field of protein folding in exploring this concern.

Zittrain ends the longer version of his essay by pointing out:

"Most important, we should not deceive ourselves into thinking that answers alone are all that matters: indeed, without theory, they may not be meaningful answers at all. As associational and predictive engines spread and inhale ever more data, the risk of spurious correlations itself skyrockets. Consider one brilliant amateur’s running list of very tight associations found, not because of any genuine association, but because with enough data, meaningless, evanescent patterns will emerge. The list includes almost perfect correlations between the divorce rate in Maine and the per capita consumption of margarine, and between U.S. spending on science, space, and technology and suicides by hanging, strangulation, and suffocation."
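The statistical point in the passage above can be illustrated with a small simulation. The sketch below is not drawn from Zittrain's essay; the function names, the series length of 10, and the candidate counts are all assumptions chosen for illustration. It generates one random "target" series and compares it against a growing pool of unrelated random series, reporting the strongest correlation found purely by chance.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_spurious_correlation(num_series, length, seed=0):
    """Largest |correlation| between one random series and num_series others.

    Every series is independent noise, so any strong correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(length)]
    best = 0.0
    for _ in range(num_series):
        candidate = [rng.gauss(0, 1) for _ in range(length)]
        best = max(best, abs(pearson(target, candidate)))
    return best

# Same seed, so the larger search includes the smaller one's candidates.
few = strongest_spurious_correlation(num_series=10, length=10)
many = strongest_spurious_correlation(num_series=10_000, length=10)

print(f"best of 10 unrelated series:     {few:.2f}")
print(f"best of 10,000 unrelated series: {many:.2f}")
```

Because both runs share a seed, the larger search can only match or beat the smaller one, and with thousands of candidates and only ten data points per series a very tight match is all but certain to turn up. None of these series has any genuine relationship to the target, which is the sense in which more data makes spurious correlations more likely rather than less.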

Remaining vigilant to these issues, while continuing to harness the power of statistics and computation, is a requirement for making sure that our intellectual debts do not overwhelm our pressing need to continue to invest in the improvement of our health and healthcare.

Digitalis Commons
Digitalis Commons is a non-profit that partners with groups and individuals striving to address complex health problems by building solutions that are frontier-advancing, open-access, and scalable. Some of our current projects include:

Mobile Health App Dev Workshop
On September 12, 2019, Digitalis Commons, in conjunction with Sage Bionetworks, convened a group of experts at the New York Genome Center to explore the topic of conducting health research on mobile devices and wearables. Topics explored included obtaining reliable digital measurements, design, engagement and enrollment, and technical approaches for app development, all through the lens of the ethical, legal, and social implications of this work. Video highlights of the event will be forthcoming on our website.

Dart Grants
Digitalis Commons offers quick, targeted $3,000 grants to individuals and groups working to develop public goods for better health.

To learn more about Dart Grants, visit digitaliscommons.org/dart-grants/.