Notes on Engineering Health, January 2020: Mendelian Randomization

Jonathan Friedlander, PhD
Geoffrey W. Smith


January 31, 2020

Does moderate consumption of alcohol protect you from cardiovascular diseases? Does smoking actually kill you? Organizing clinical trials with randomized groups of drinkers, smokers, and teetotalers to answer these questions is neither practical nor ethical. The search for causal inference when randomized clinical trials are problematic (and even when they are not) has long been a goal of epidemiological research. Taking advantage of the rise of genome-wide association studies (GWAS) and the wealth of data they have produced, epidemiologists have created a clever hack known as Mendelian randomization (“MR”) to untangle causation from correlation.

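Originally developed in the late 1980s, MR has seen its popularity increase steadily over the past twenty years. MR uses naturally occurring genetic differences (variants) as proxies for an environmental exposure. Because these variants are assigned essentially at random at conception, they are largely independent of confounding variables such as social or behavioral factors, so comparing outcomes between people with and without the variants removes much of that confounding from the analysis. A couple of well-studied cases illustrate the concept.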
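To make the idea concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption chosen for illustration: the allele frequency, the effect sizes, and the hidden confounder are made up, and the Wald ratio (the variant-outcome slope divided by the variant-exposure slope) is one common single-variant MR estimator, not necessarily the one used in the studies discussed here. The simulated exposure is mildly harmful, but a hidden confounder makes it look protective in a naive comparison.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate(n=50_000, true_effect=0.2, seed=0):
    """Simulate a cohort (all parameters are illustrative assumptions)."""
    rng = random.Random(seed)
    variant, exposure, outcome = [], [], []
    for _ in range(n):
        # Allele count (0, 1 or 2), assigned at random with frequency 0.3.
        g = sum(rng.random() < 0.3 for _ in range(2))
        # Unmeasured confounder, e.g. a lifestyle factor: it raises the
        # exposure and independently lowers the outcome.
        u = rng.gauss(0, 1)
        x = 0.5 * g + u + rng.gauss(0, 1)
        y = true_effect * x - u + rng.gauss(0, 1)
        variant.append(g)
        exposure.append(x)
        outcome.append(y)
    return variant, exposure, outcome


g, x, y = simulate()

# Naive observational estimate: distorted by the hidden confounder.
naive = slope(x, y)

# Wald ratio: effect of the variant on the outcome, scaled by its
# effect on the exposure.
wald = slope(g, y) / slope(g, x)

print(f"true effect:    0.20")
print(f"naive estimate: {naive:.2f}")
print(f"MR estimate:    {wald:.2f}")
```

In this toy setup the naive regression slope comes out negative (the exposure looks protective), while the variant-based estimate lands close to the true harmful effect, because the randomly assigned variant is unrelated to the confounder.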

It was observed and published in the late 1990s (see here and here for examples) that moderate alcohol intake was associated with reduced cardiovascular risk in comparison with abstinence or with heavier drinking. In 2019, The Lancet published a study of more than 500,000 adults that used two common genetic variants altering alcohol metabolism as strong predictors of drinking patterns. Rather than relying solely on questionnaires about alcohol consumption, whose accuracy is known to be suspect, the authors used this genetic approach to estimate the causal effect of alcohol intake on the incidence of ischemic stroke, intracerebral hemorrhage, and acute myocardial infarction. The results were unequivocal:

"The apparently protective effects of moderate alcohol intake against stroke are largely non-causal. Alcohol consumption uniformly increases blood pressure and stroke risk, and appears in this one study to have little net effect on the risk of myocardial infarction."

Another striking example of the power of MR comes from a study that examined whether taking selenium supplements was protective against prostate cancer, as many studies had suggested (see here and here for examples). This hypothesis had already been tested in a highly publicized $100 million clinical trial called SELECT, which yielded a negative result; the findings were subsequently re-tested using MR. Examining genotype data for tens of thousands of men, the researchers found eleven genetic variants that were associated with naturally higher levels of selenium in the blood. From birth, these people had lived as if they were taking selenium supplements. The scientists then compared the incidence of prostate cancer in people with these variants to that in a control group without them. The results were consistent with SELECT’s conclusions and were obtained at a fraction of the time and cost:

"Our Mendelian randomization analyses do not support a role for selenium supplementation in prostate cancer prevention and suggest that supplementation could have adverse effects on risks of advanced prostate cancer and type 2 diabetes."

While MR can be both powerful and efficient in seeking out causal relationships, the technique must be used judiciously. MR studies must ensure that the underlying genetic data are of sufficient quality and that the underlying mechanistic assumptions are sufficiently robust. Indeed, one key assumption in using this technique properly is that the genetic proxy affects the outcome only through the exposure being studied, and not through some other, unknown mechanism of action.
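The short Python sketch below illustrates why that assumption matters. It is a toy simulation under made-up parameters (allele frequency, effect sizes, and the size of the variant's direct effect are all assumptions), using the Wald ratio as one common single-variant estimator. The exposure has no causal effect on the outcome at all; the only thing that changes between the two runs is whether the variant also influences the outcome through a separate pathway.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def wald_estimate(direct_effect, n=50_000, seed=1):
    """Wald-ratio estimate when the TRUE causal effect of exposure is zero.

    direct_effect is a hypothetical effect of the variant on the outcome
    that bypasses the exposure (it violates the key assumption when nonzero).
    """
    rng = random.Random(seed)
    variant, exposure, outcome = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                            # unmeasured confounder
        x = 0.5 * g + u + rng.gauss(0, 1)              # exposure
        y = direct_effect * g + u + rng.gauss(0, 1)    # outcome: no x term
        variant.append(g)
        exposure.append(x)
        outcome.append(y)
    return slope(variant, outcome) / slope(variant, exposure)


valid = wald_estimate(direct_effect=0.0)   # assumption holds
biased = wald_estimate(direct_effect=0.2)  # assumption violated

print(f"assumption holds:    {valid:.2f}")
print(f"assumption violated: {biased:.2f}")
```

When the assumption holds, the estimate sits near the true value of zero. When the variant has its own route to the outcome, the estimate is pushed well away from zero even though the exposure does nothing, and nothing in the data alone flags the problem.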

While MR will not replace clinical trials altogether, this strategy, along with other statistical tools, can help bring down the time and cost of uncovering causal relationships. Want to test a hypothesis yourself using MR? An MR API created as a collaboration between the University of Bristol, the MRC, and Cancer Research UK is openly available.


Digitalis Commons (beta release)
Digitalis Commons has launched a beta version of a new data service designed to help busy problem solvers in biotech / biopharma / medtech / healthcare / data science keep track of what is happening in their industry’s public sphere by intelligently aggregating news. It does so by collecting links that are currently being shared on public forums by influential health thinkers.

We based the idea on the Open Fuego software originally developed by Harvard’s Nieman Lab.

Coming soon: we are cleaning up our code and will publish our updates in a Digitalis Commons GitHub Repo.

Dart Grants
Apply for a quick, targeted $3,000 grant to develop a public good for better health. Application details at
