FIRST LOOK: Leveraging AI and biological networks to functionally interpret vast genetic datasets

With the unprecedented ability to sequence genomes and to genotype millions of patients, functional interpretation of genetic data is now one of the major bottlenecks in science and healthcare. This is a key constraint on our efforts toward biological understanding and patient-specific therapeutic intervention in human diseases. In parallel, high-throughput technologies in transcriptomics and proteomics have provided insights into the wildly complex ‘social networks of genes’ by enabling us to measure the functional correlations, or physical interactions, between tens of thousands of genes in different tissues or cell types in a single experiment. Interpreting genetic datasets by finding unexpected functional connections in such networks has emerged as a powerful computational approach for pointing to new biology with therapeutic and diagnostic value.

Click here to watch Dr. Lage’s First Look presentation.

Towards these aims, we devised a computational framework to extract and quality-control data from >40,000 scientific publications into a scored human protein–protein interaction network (InWeb_InBioMap). The network consists of >585,000 interactions between >17,500 human proteins and has better functional biological relevance than comparable resources [Li et al., Nature Methods 2017]. We designed an artificial intelligence algorithm (Quack) that, through a rigorous training procedure, learns to identify unexpected pathway relationships in genetic data using InWeb or any other biological network defined by the user. Using this approach, we analyzed genetic data from 4,742 cancer genomes and identified unexpected pathway relationships that predicted 62 new cancer driver candidates that had previously been missed. By developing a massively parallel experimental framework to determine in vivo tumorigenic potential in mice, we found that our candidate genes induce tumors at rates comparable to those of known oncogenes. To further confirm our predictions, and to establish their relevance in patients with an unknown cause of cancer, we reanalyzed nine tumor-inducing candidates in 242 patients with oncogene-negative lung adenocarcinomas. We found that two (AKT2 and TFDP2) are significantly amplified in this patient group [Horn et al., Nature Methods 2018]. To make our approaches widely accessible, we developed a unified web platform, GeNets (http://apps.broadinstitute.org/genets), where users can upload proprietary biological networks and genetic datasets, train Quack models, and execute, store, and share network analyses of genetic datasets [Li et al., accepted, Nature Methods]. Overall, we illustrate the general value of a number of technologies developed in the lab to interpret large genetic datasets, a key bottleneck in healthcare. We furthermore illustrate specifically how our approaches can lead to new insights of therapeutic and diagnostic value in patients with lung adenocarcinomas.
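
To make the general idea of network-based interpretation concrete, the following is a minimal sketch, not the Quack algorithm or the GeNets implementation, neither of which is specified here. It assumes a toy scored interaction network and asks whether a set of genes is more densely connected than random gene sets of the same size. The gene names, scores, the `min_score` threshold, and both function names are hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations

# Toy scored interaction network: unordered protein pair -> confidence score.
# All gene names and scores are invented for illustration.
TOY_NETWORK = {
    frozenset(("G1", "G2")): 0.9,
    frozenset(("G2", "G3")): 0.8,
    frozenset(("G1", "G3")): 0.7,
    frozenset(("G4", "G5")): 0.6,
    frozenset(("G6", "G7")): 0.2,
    frozenset(("G3", "G8")): 0.5,
}


def count_internal_edges(genes, network, min_score=0.5):
    """Count interactions scoring >= min_score with both endpoints in `genes`."""
    return sum(
        1
        for a, b in combinations(sorted(set(genes)), 2)
        if network.get(frozenset((a, b)), 0.0) >= min_score
    )


def connectivity_p_value(genes, network, n_permutations=1000,
                         min_score=0.5, seed=0):
    """Empirical p-value: fraction of random, equally sized gene sets that
    are at least as connected as the observed set."""
    rng = random.Random(seed)
    universe = sorted({g for edge in network for g in edge})
    size = len(set(genes))
    observed = count_internal_edges(genes, network, min_score)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(universe, size)
        if count_internal_edges(sample, network, min_score) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


observed, p = connectivity_p_value(["G1", "G2", "G3"], TOY_NETWORK)
print(observed, p)
```

In this sketch the random gene sets are drawn uniformly from the network. A more careful analysis would typically match random sets on properties such as node degree, since highly studied proteins tend to have more reported interactions.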

For more information about Dr. Lage’s research, please contact Partners HealthCare Innovation by clicking here.

Figure 1. InWeb_InBioMap has better functional biological relevance than other comparable resources.

Figure 2. Quack allows the identification of pathway patterns in genetic data.

Figure 3. Identifying new cancer driver genes in 4,742 cancer genomes.