FIRST LOOK: Leveraging Machine Learning for Personalized Cancer Treatments

It is not obvious how decades of hard-won biological knowledge can be incorporated into machine learning algorithms that predict how a drug will affect a patient. We study this problem from theoretical, computational, and practical angles.

Click here to watch Dr. Craft’s First Look presentation.

Our computational studies provide strong evidence that incorporating system knowledge, rather than relying on purely data-driven machine learning, substantially improves predictive accuracy. We describe the open-source machine learning datasets we have created for studying methods of incorporating prior knowledge and for assessing the value of doing so. Boolean networks, which are graphs of interconnected logical (AND, OR) switches, represent complex dynamical systems and are used by systems biologists to model cellular signaling. We generate large random Boolean networks and compare machine learning methods that use knowledge of the underlying network connectivity with those that do not. We also generate datasets from simulation models of biological processes, some taken from the literature (e.g., a flowering time prediction problem) and others created by our group (e.g., cellular response to DNA damage from radiation). In all cases, we demonstrate that incorporating prior knowledge significantly enhances predictive capacity.

Turning to real datasets, we present ideas and results for applying machine learning to a cell line radiation sensitivity experiment. Prior knowledge here takes the form of expert gene selection, automated PubMed searches, Watson for Drug Discovery, and/or hierarchical biological information available at geneontology.org. We argue that incorporating such knowledge is critical given the “large p, small n” regime we are in: p, the number of (genetic) parameters, is in the hundreds of thousands, while n, the number of samples, is usually in the hundreds.

Personalized cancer medicine is in its infancy due to the complexity of human cancers. This talk will reflect on what makes this problem so difficult and will give evidence that multidisciplinary efforts to include existing biological knowledge are vital to developing a high-quality clinical prediction tool.
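To make the Boolean network idea concrete, here is a minimal Python sketch of how a random network of AND/OR switches might be generated and used to produce a labeled dataset. It is an illustration only, not the generator used in this research: the function names, the synchronous update rule, the network size, the number of inputs per node, and the choice of one node's final state as the label are all assumptions made for this example.

```python
import random

def make_random_network(n_nodes, k_inputs, rng):
    """Give each node k_inputs randomly chosen input nodes and a random AND/OR gate.

    Hypothetical helper: the wiring returned here plays the role of the
    "prior knowledge" that a knowledge-aware learner could be given.
    """
    network = []
    for _ in range(n_nodes):
        inputs = rng.sample(range(n_nodes), k_inputs)
        gate = rng.choice(["AND", "OR"])
        network.append((inputs, gate))
    return network

def step(network, state):
    """Synchronously update every node from the current state (an assumed update rule)."""
    new_state = []
    for inputs, gate in network:
        values = [state[i] for i in inputs]
        new_state.append(all(values) if gate == "AND" else any(values))
    return new_state

def simulate(network, state, n_steps):
    """Apply the update rule n_steps times and return the resulting state."""
    for _ in range(n_steps):
        state = step(network, state)
    return state

def make_dataset(network, n_samples, n_steps, output_node, rng):
    """Features are random initial states; the label is one node's state after n_steps."""
    n_nodes = len(network)
    X, y = [], []
    for _ in range(n_samples):
        initial = [rng.random() < 0.5 for _ in range(n_nodes)]
        final = simulate(network, initial, n_steps)
        X.append([int(v) for v in initial])
        y.append(int(final[output_node]))
    return X, y

# Illustrative sizes only; the networks described in the talk are much larger.
rng = random.Random(0)
network = make_random_network(n_nodes=20, k_inputs=2, rng=rng)
X, y = make_dataset(network, n_samples=100, n_steps=5, output_node=0, rng=rng)
```

A purely data-driven method would see only `X` and `y`, while a knowledge-informed method could also be given the wiring stored in `network`. Comparing the two on held-out samples is one way to measure the value of that prior knowledge.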

For more information about Dr. Craft’s research, please contact Partners HealthCare Innovation by clicking here.

Example technique to build in prior knowledge via detailed simulations. ML=machine learning.

Cellular response to radiation model, used to generate data for machine learning algorithm testing.

Results demonstrating the superiority of machine learning that incorporates prior knowledge (here called SimKern, for simulation-based kernel learning). NN = nearest neighbor, SVM = support vector machine, RF = random forest, RBF = radial basis function. Accuracy is classification accuracy; R² is the coefficient of determination.
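For readers unfamiliar with the two metrics named in the caption above, the following Python sketch shows their standard textbook definitions. The function names are hypothetical and the code is not taken from the SimKern work; it only spells out what "classification accuracy" and "coefficient of determination" usually mean.

```python
def classification_accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true class labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares).

    Undefined when all true values are identical (total sum of squares is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under this definition, a model that always predicts the mean of the true values scores an R² of 0 and a perfect model scores 1, so higher values indicate better regression performance.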