Machine learning models can provide new insights into research data
Machine learning holds great potential for cancer diagnosis, prognosis, and prediction of response to therapy, but these promising deep-learning techniques have not yet become standard practice because of a lack of interpretability and transferability in translational medicine, according to Trey Ideker, PhD, University of California, San Diego.
Ideker discussed approaches to overcome these barriers on Monday, April 10, during the session Interpreting and Building Trust in Artificial Intelligence Models, which can be viewed on the virtual platform by registered meeting participants through July 13, 2022.
He outlined possible applications of visible machine learning and few-shot learning. The former includes interpretable machine learning approaches such as DCell, a deep neural network simulating cell structure and function that Ideker was instrumental in introducing in 2018.
“The idea of this research is not just to build a predictive system, but to build a system where you can open the box and inside, that system has been explicitly guided by knowledge of the molecular and cellular biology of cancer,” Ideker said.
Few-shot learning is an emerging type of transfer learning, based on the concept that knowledge acquired in one problem domain can be applied to solve related problems in other domains, even when only a handful of examples are available in the new domain.
“The core idea is to train a model not for maximum accuracy in a single dataset, but for transferability from one context to another,” Ideker said.
This approach has proven effective in linguistics. Someone who knows one Romance language, for instance, tends to pick up a second Romance language more quickly than someone with no prior knowledge of Romance languages.
In cancer research, what a drug response model learns in one tissue type can be transferred to other tissue types.
“Even more interesting is transfer from the cell-line screens into more clinically relevant contexts like patient-derived tumor cells,” Ideker said.
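As a rough illustration of the transfer idea, the toy sketch below pretrains a linear model on a data-rich "source" context and then adapts it to a related "target" context using only three examples. It is not the method Ideker described: it uses plain pretraining followed by fine-tuning rather than training explicitly for transferability, and the data, weights (`SOURCE_W`, `TARGET_W`), and helper names are all invented for the example.

```python
import random

def predict(w, x):
    """Linear prediction: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    """Mean squared error of weights w on a list of (features, label) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def gradient_steps(w, data, lr, steps):
    """Run a fixed number of gradient-descent steps on half the mean squared error."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def make_data(true_w, n, rng):
    """Noise-free synthetic data generated from a known linear rule."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        data.append((x, predict(true_w, x)))
    return data

rng = random.Random(0)
SOURCE_W = [2.0, 1.0]   # assumed rule in the data-rich source context
TARGET_W = [2.0, 1.5]   # related but different rule in the target context

source_train = make_data(SOURCE_W, 200, rng)
target_few = make_data(TARGET_W, 3, rng)      # the "few shots"
target_test = make_data(TARGET_W, 200, rng)

# Pretrain on the source context, then adapt with a handful of target examples.
pretrained = gradient_steps([0.0, 0.0], source_train, lr=0.1, steps=500)
finetuned = gradient_steps(pretrained, target_few, lr=0.05, steps=5)

# Baseline: the same few examples and steps, but starting from scratch.
scratch = gradient_steps([0.0, 0.0], target_few, lr=0.05, steps=5)

finetuned_mse = mse(finetuned, target_test)
scratch_mse = mse(scratch, target_test)
print(f"fine-tuned from source: {finetuned_mse:.4f}")
print(f"trained from scratch:   {scratch_mse:.4f}")
```

In this toy setting the model that starts from source knowledge ends up with lower error on the target context than the one trained from scratch on the same three examples, because the two contexts share most of their structure.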
Eliezer M. Van Allen, MD, Harvard Medical School and Dana-Farber Cancer Institute, explained how he put the few-shot learning theory into practice in what he called the “convergence of biology and machine learning.”
Van Allen was involved in a Stand Up to Cancer project that aggregated about 1,000 tumor-normal whole exomes from men with prostate cancer, both primary and metastatic. The team performed a significance analysis across the entire cohort, hypothesizing that more data on the less frequently mutated genes in this group could lead to gene discovery.
“We have all these different genes and we couldn’t quite make sense of which ones really mattered, whether there are relationships between them, and how we can try to put this whole puzzle together in a more thoughtful way,” Van Allen explained.
The research team drew on techniques employed by Ideker in DCell and by a Massachusetts Institute of Technology-based lab working on antibiotic discovery.
“In both cases, they were, in essence, describing biologically informed networks that were using prior knowledge about what we know about biology and genetics, and interactions with signaling pathways, and using it to train models to ask questions in a hypothesis-driven way,” said Van Allen, whose team applied the concepts to cancer genomics. The insights derived from the machine learning helped to develop new hypotheses that they hope to bring into the clinic.
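One common way to build prior knowledge into a network is to allow connections only where biology says they exist. The sketch below shows that idea for a single layer, under the assumption that each "pathway" node may only receive input from its member genes. The gene and pathway names, the weights, and the function names are all hypothetical; this is not the architecture of DCell or of Van Allen's model.

```python
import math

# Hypothetical gene-to-pathway membership (illustrative names, not real annotations).
GENES = ["gene_a", "gene_b", "gene_c", "gene_d"]
PATHWAYS = {
    "pathway_1": ["gene_a", "gene_b"],
    "pathway_2": ["gene_c", "gene_d"],
}

def build_mask(genes, pathways):
    """Return a 0/1 row per pathway: 1.0 where the gene is a member, else 0.0."""
    return {
        name: [1.0 if gene in members else 0.0 for gene in genes]
        for name, members in pathways.items()
    }

def pathway_activations(expression, weights, mask):
    """Masked linear layer followed by tanh; masked-out genes contribute nothing."""
    activations = {}
    for name, row in mask.items():
        z = sum(w * m * x for w, m, x in zip(weights[name], row, expression))
        activations[name] = math.tanh(z)
    return activations

MASK = build_mask(GENES, PATHWAYS)
# Arbitrary weights for illustration; in practice these would be learned.
WEIGHTS = {name: [0.5] * len(GENES) for name in PATHWAYS}

baseline = pathway_activations([1.0, 0.2, 0.3, 0.8], WEIGHTS, MASK)
perturbed = pathway_activations([1.0, 0.2, 0.9, 0.8], WEIGHTS, MASK)  # only gene_c changed

for name in PATHWAYS:
    print(name, round(baseline[name], 3), round(perturbed[name], 3))
```

Because every hidden node corresponds to a named pathway and sees only its member genes, a change in `gene_c` moves `pathway_2` and leaves `pathway_1` untouched, which is what lets a reader trace a prediction back through the biology.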
Su-In Lee, PhD, University of Washington, discussed how explainable AI (XAI) can be used to predict clinical outcomes and illustrate how a model prediction or inference was made.
“XAI can help us make new biological discoveries from data and may inform clinical decisions, and even open new research directions in biomedicine,” she said.
Lee’s recent work has focused on uncovering expression signatures of synergistic drug response using XAI.
Increasingly, cancers are treated with combination therapies, but choosing the optimal combination for an individual patient out of the tens of thousands of possible pairs of U.S. Food and Drug Administration-approved drugs is challenging.
“Even patients who have the same type of cancer may respond differently to the exact same drugs due to their particular genomic characteristics,” Lee said.
Datasets with the characteristics of different patient tumors, such as gene expression levels, and the characteristics of different combinations of drugs, such as their biological targets, can be used to train a machine learning model to predict how synergistic untested combinations of drugs will be for different patients.
However, XAI must evolve and improve to truly be able to solve real-world problems in computational cancer pharmacology, Lee cautioned.
“In genomic datasets, naturally features tend to be correlated with each other, which confuses models and feature attribution,” she said. “That means we need to be really careful in top-reading our results, even by using state-of-the-art explanation methods.”
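The toy example below illustrates, in the most extreme case, the problem Lee described. It assumes two features that are exact copies of each other, with the outcome generated from the first one only. The data, the linear model, and the weight-based attribution are all invented for illustration and are not Lee's methods.

```python
import random

rng = random.Random(0)

# The outcome depends only on feature 0; feature 1 is an exact copy of it
# (a stand-in for two perfectly co-expressed genes).
samples = []
for _ in range(100):
    x0 = rng.uniform(-1, 1)
    samples.append(([x0, x0], 1.0 * x0))

def predict(w, x):
    """Linear prediction: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Fit a linear model by gradient descent, starting from zero weights.
w = [0.0, 0.0]
for _ in range(500):
    grad = [0.0, 0.0]
    for x, y in samples:
        err = predict(w, x) - y
        for j in range(2):
            grad[j] += err * x[j] / len(samples)
    w = [wj - 0.1 * gj for wj, gj in zip(w, grad)]

fit_error = sum((predict(w, x) - y) ** 2 for x, y in samples) / len(samples)
print("learned weights:", [round(wj, 3) for wj in w])
print("fit error:", fit_error)
```

The fitted model predicts the outcome almost perfectly, yet it splits the weight evenly between the two features, so any attribution based on those weights credits the copy as much as the feature that actually generated the outcome. Real genomic features are rarely identical, but strong correlation produces the same kind of ambiguity.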