January 30, 2023
While this article wasn't created by AI, the image above was — by DALL·E 2 from OpenAI.
“If you work as a radiologist you’re like the coyote that’s already over the edge of the cliff but hasn’t looked down. We should stop training radiologists now as it is just completely obvious within five years deep learning is going to do better,” predicted Geoffrey Hinton, an eminent computer scientist and a pioneer of deep learning, at an AI conference in 2016. Was he right?
The subject du jour is undeniably the emergence and rapid progress of generative AI models such as ChatGPT. While a lot has already been written on the topic, both by humans and robots, the increasing role of AI in daily life, our investments, and healthcare at large compelled us to review some ideas about the technology, from the current successes with narrow AI to, perhaps eventually, Artificial General Intelligence (AGI) able to tackle any human task.
A useful distinction in the development of AI is between narrow AI and AGI. Narrow AI, also known as weak AI, is a form of artificial intelligence technology designed to perform specific tasks. Some well-known applications of narrow AI in healthcare are:
– Medical image analysis: machine learning algorithms can be trained to analyze medical images, such as X-rays and CT scans, to identify specific features, such as tumors or other abnormalities, with a high degree of accuracy.
– Electronic Medical Records (EMR) analysis: Natural language processing (NLP) algorithms can be used to extract information from EMRs, such as patient demographics, diagnoses, and medications, which can then be used to improve patient care.
– Drug discovery: AI can be used to analyze large amounts of data from chemical compounds, genetic sequences, and clinical trials to identify potential new drugs and predict their potential side effects.
Until recently, deep learning in healthcare was almost exclusively unimodal and applied primarily to the use cases listed above. These approaches use deep neural networks trained by supervised learning on large annotated datasets to solve one task at a time. The value of these focused models is becoming undeniable in almost all aspects of a patient’s journey through the healthcare ecosystem. A recent National Bureau of Economic Research report, written in partnership with McKinsey and Harvard University, estimated an astounding $200 billion to $360 billion in cost savings (representing 5 to 10% of US healthcare spending) if these types of narrow AI were more widely implemented.
More recent advances in narrow AI have expanded the list of actionable use cases and have debuted to widespread excitement, in particular ChatGPT, which specializes in language generation, and DALL·E, which is an image generator. While still categorized as narrow AI, generative models such as these, including the large language models (LLMs) behind ChatGPT, are understood by some to be a significant step towards a version of true AGI. These models operate with billions of parameters and are trained on broad data sets, generally using self-supervision, so that they can be adapted to a wide range of downstream tasks. The main difference between these new generators and the older narrow AI models described above is their apparent ability to generate new output (to their users’ awe, delight, and sometimes terror) rather than just classifications of existing information.
These models, often grouped under the broader term foundation models, are increasingly multimodal, meaning they can operate with different types of data as input. They can store knowledge gained while solving one problem and then apply it to different but related problems (so-called transfer learning). The Stanford Institute for Human-Centered Artificial Intelligence published a lengthy report on foundation models, and Eric Topol has explained their significance in healthcare in general and medicine in particular. These models have already passed medical board exams, with accuracy improving tremendously (to close to human levels), all without being explicitly trained for any task in particular.
Foundation models for medicine will reach their full potential when they are able to integrate the full range of medical data, including electronic health records, images, lab values, biologic layers such as the genome and gut microbiome, and social determinants of health. With that breadth they could help treat patients, discover new biology, reduce side effects from medication, accurately predict the risk of disease, and support many more applications.
However, the need for reliable, high-quality healthcare data and the high cost of computing associated with the current scaled LLMs present barriers to the rapid expansion of these models in biomedicine. Those same barriers will hopefully also ensure that the models are properly trained and managed so as to avoid inappropriate biases and inaccurate results, particularly given the “black box” nature of so much of what happens inside them.
Picking up these cautionary themes, the limitations of these models are quite fairly called out by people like Gary Marcus, and it is important to be very mindful of them as these technologies move more and more rapidly into everyday use. The fluency of these models, the authoritative tone of their responses, and, at times, the sheer beauty of the outputs tend to overshadow their flaws. These models are often unreliable, and even when they are correct, they are so without actual understanding. They work by predicting the next word in a sentence, given the context of the words that came before it, and not by connecting words to meaning. In this sense, these machines are just producing bullshit, emitting whatever word is statistically most likely to come next in a sequence. Consequently, they will be as biased as their training sets (at this point essentially the entire corpus of the Internet), and because these algorithms operate as black boxes, no one is able to identify exactly where errors lie. A looming danger for these generative models is that their output swings toward untruth or moral wrongness. One can fear a vicious cycle as an alternative to the rosier picture painted above.
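To make the next-word idea concrete, here is a minimal sketch in Python. It is not how ChatGPT is built: real LLMs use large neural networks conditioned on long contexts, whereas this toy simply counts which word most often follows another (a so-called bigram model). The corpus and the function names are invented for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this illustration.
corpus = (
    "the scan shows a tumor . "
    "the scan shows a fracture . "
    "the scan shows a tumor ."
)

model = train_bigram_model(corpus)
prediction = predict_next_word(model, "a")
print(prediction)
```

The toy predicts "tumor" after "a" only because that pairing is the most frequent one in its training text. Nothing in the code represents what a tumor is, which is the point of the criticism above: the output mirrors the statistics, and therefore the biases, of whatever text the model was fed.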
While there seems to be a continuum between the narrow AI we know today and the AGI we hope (and fear?) for tomorrow, actually crossing this divide may require a whole new technology architecture. Expanding the already impressive LLMs (soon to be built on trillions of parameters and trained with ever larger datasets) may not, on its own, solve the problem of true machine comprehension, which might instead require a different architecture, one geared to making sense of the opaque weighting processes inherent in the current designs. The time frame to achieve such advances remains very unclear.
So, for the present, was Geoffrey Hinton correct? While radiologists have not yet been, and are unlikely soon to be, replaced entirely by algorithms, it is fair to say at least that radiologists plus algorithms are better than radiologists alone. And the narrow AI tools continue to improve at an astonishing rate.
Jonathan Friedlander, PhD can be contacted here, & Geoffrey W. Smith can be contacted here.
First Five is our curated list of articles, studies, and publications for the month. For our full list of interesting media in health, science, and technology, updated regularly, follow us on Twitter or Instagram.
1/ Bugs and Brains
In study after study, the importance of the microbiome in virtually every aspect of our health is confirmed. This study published in Science focuses on brain atrophy common in neurodegenerative diseases. It shows that manipulation of the gut microbiota resulted in a strong reduction of brain inflammation.
2/ When Do Humans Have Kids?
A new study published in Science estimated the average age at which women and men became parents over the last 250,000 years. Using a new method based on comparing DNA mutation rates between parents and offspring, evolutionary biologists showed that women's average age at conception increased from 23.2 years to 26.4 years in the past 5,000 years (fathers' remained constant at around 30 years). They also found large differences in generation times among populations, reaching back to a time when all humans occupied Africa.
3/ Better Than A Genetic Test? Metabolism.
A large analysis of the metabolic profiles of healthy American babies yielded some surprising discoveries. The study, published in Molecular Genetics and Metabolism, showed that differences in metabolic signatures complemented known techniques such as DNA polymorphism and self-defined ethnicity groupings. Because it blends genetic background and environmental factors, this approach might help us better understand population movements and disease risks.
4/ “Stop Eating” Signal
A new study published in The FASEB Journal might have uncovered why some people can’t just have one potato chip but must keep eating them. Researchers found that there was no difference in weight gain between mice without CRTC1, a specific transcription cofactor, in their MC4R-expressing neurons and control mice when they were reared on a standard diet. However, when given a high-fat diet, the CRTC1-deficient mice overate, becoming significantly obese and developing diabetes.
5/ Cool(ing) Windows
We’re departing from a pure healthcare-related topic for a study published in ACS Energy Letters on sustainability. By using advanced computing tools and AI, researchers designed a clear window coating that could lower the temperature inside buildings by blocking certain wavelengths entering the building while letting others exit. This technology used at scale could save significant cooling energy.
Public-Interest Technologies for Better Health
Digitalis Commons is a non-profit that partners with groups and individuals striving to address complex health problems by building public-interest technology (https://public-interest-tech.com) solutions that are frontier-advancing, open-access, and scalable.
If you are interested in getting a good primer on the current state of genomics and the continued progression of biology toward being an engineering discipline, this Introduction to Genomics for Engineers is a great place to start.