CRISPR – Machine Learning

August 29, 2017

By: Stephanie Allen

There’s been a great deal of buzz lately around gene editing, on topics that range from the next generation of precision medicine to major advances in computational biology. The gene editing applications under development are fascinating and positioned to make a broad impact on science and healthcare. These new technologies come with unique challenges, and moving past the obstacles has turned out to be just as exciting as the initial problems.

At the center of all the buzz is CRISPR/Cas9, a system borrowed from bacteria and now considered the poster child of gene editing (Salsman and Dellaire 2017). Earlier tools such as zinc-finger nucleases and TALENs could also modify specific locations in a genome, but none were as simple to program as CRISPR/Cas9, which has become the system of choice for modifying DNA. The applications are broad and include medicine, agriculture, and the control of disease-carrying insects (Barrangou and Doudna 2016). Some of the most exciting prospects are in medicine, where potential solutions are being developed for health problems that previously seemed hopeless. Antibiotic-resistant bacteria are one such problem with very few potential solutions, and studies suggest that gene editing might be one of them. Academic groups from Rockefeller University and North Carolina State University demonstrated that sequence-specific antibiotics built on CRISPR/Cas9 can effectively eliminate multi-drug-resistant bacteria. This type of antibiotic is more selective than traditional broad-spectrum antibiotics because it kills only the pathogenic bacteria (Beisel, Gomaa et al. 2014, Bikard, Euler et al. 2014).

Rare monogenic disease is another healthcare domain with unmet needs and few solutions on the horizon. The co-inventors of CRISPR/Cas9, Dr. Jennifer Doudna, a professor of Chemistry and Molecular and Cellular Biology at UC Berkeley, and Dr. Emmanuelle Charpentier, director of the Max Planck Institute for Infection Biology, have focused their research efforts on correcting disease-causing single-gene mutations. Both are founders of companies that are developing therapeutics for monogenic diseases. Dr. Charpentier is a co-founder of CRISPR Therapeutics, headquartered in Basel, Switzerland, with satellite operations in Cambridge, Massachusetts, and London. CRISPR Therapeutics has several drugs in its pipeline, including treatments for hemophilia and Duchenne muscular dystrophy.

Dr. Doudna is a co-founder of Intellia Therapeutics, located in Cambridge, Massachusetts. Like CRISPR Therapeutics, Intellia is working on treatments that permanently edit disease-associated genes in humans using the CRISPR/Cas9 system. Intellia has several rare disease treatments in its pipeline, including a drug for transthyretin amyloidosis (ATTR). The frequency of this disorder varies with genetic lineage. West Africans have the highest proportion affected, at 5% of the population. Also in the pipeline is a treatment for alpha-1 antitrypsin deficiency (AATD), which affects 1 in every 2500 Americans.

In addition to these in-vivo applications, Intellia is working on an ex-vivo approach to therapy that involves editing cells outside of the body. These programs are being developed through its eXtellia division. This therapeutic approach involves removing cells from patients, editing the cells using CRISPR/Cas9, expanding the therapeutically modified cells, and then returning the cells to patients. The ex-vivo focus is immuno-oncology and underserved autoimmune disorders.

Another area of focus for CRISPR/Cas9 therapeutics is personalized medicine to treat cancer, which, once developed, could save millions of lives. Cancer diagnostics are more precise now that we have better genetic sequencing and less intrusive biopsy methods. This has contributed to greater clarity on the specific genes that are causing cancers. Better diagnostics, in combination with the technology to edit specific sites in the genome, will pave the way for personalized medicine in cancer treatments. A recent interview by McKinsey & Company captured this sentiment perfectly (1). Dr. Nessan Bermingham, who is founder and CEO of Intellia Therapeutics, said:

“One can envision a time in the not-too-distant future when a patient presents with a genetic disease. Her genome is sequenced, and a genome-editing drug is custom made, targeting her specific mutation. The patient is subsequently treated and potentially cured, in a cost-effective manner. The CRISPR/Cas9 technology has the potential to drive a medical revolution in the near future.”

The applications for CRISPR/Cas9 are broad, and each has its unique challenges. The most common among them is off-target effects. In the early years of the CRISPR/Cas9 technology, off-target effects were a major issue. Many researchers found that the guide RNAs that are essential to the system were problematic. The guide RNAs would not always go to the location in the genome that was originally intended; in other words, they would be off-target. This led to problems such as unintended mutations, because genome editing would occur at an off-target site. Alternatively, sections of the chromosome would translocate to an off-target location (Cho, Kim et al. 2014). At times the off-target sites would be edited at higher frequencies than the intended on-target site (Fu, Foden et al. 2013).
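
To make the idea of an off-target site concrete, here is a minimal Python sketch. It assumes a 20-nucleotide guide, an NGG PAM (the motif recognized by the commonly used Cas9 from Streptococcus pyogenes), and a plain mismatch count as the measure of similarity. The function names, the toy sequences, and the mismatch threshold are illustrative assumptions and do not come from any of the tools cited in this article. Real off-target predictors also weigh where in the guide a mismatch falls and scan both DNA strands.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(guide: str, genome: str, max_mismatches: int = 3):
    """Scan the forward strand for guide-like sites followed by an NGG PAM.

    Returns (position, site, mismatches) tuples for every site within
    max_mismatches of the guide. A site with 0 mismatches is a perfect
    match; any other hit is a potential off-target site.
    """
    n = len(guide)
    hits = []
    # Stop early enough that a 3-base PAM still fits after the site.
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":  # NGG: any base followed by two guanines
            continue
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Toy data (made up): one perfect site and one site with two mismatches.
guide = "GACGTTACCGGATCAAGCTT"
near_match = "GACATTACCGGATCAAGCCT"
genome = "TT" + guide + "AGG" + "CCAT" + near_match + "TGG" + "AA"

for position, site, mismatches in find_candidate_sites(guide, genome):
    print(position, site, mismatches)
```

Any hit reported with one or more mismatches is a place where the guide could plausibly bind and cut even though it was not the intended target.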

The off-target issues spurred a new area of research that included analyzing every feature of the guide RNAs and the effects of those features on the target genome (Hsu, Scott et al. 2013, Chari, Mali et al. 2015). Other studies analyzed aspects of the target site on the genome (Doench, Fusi et al. 2016). The manual methods for doing these studies require trial and error, long hours in the lab, and a large research budget. Dr. Nicolo Fusi from Microsoft Research is a leading researcher in the field of gene editing and summed it up nicely when he said:

“Very few people have the expertise or the resources to do this kind of work.” (2)

The cutting-edge CRISPR/Cas9 tools that were developed to overcome off-target effects arrived when top researchers in machine learning and gene editing started working together. Researchers now have tools that perform the analysis for them and help them optimize their use of the CRISPR/Cas9 system. Scientists consider two important aspects of CRISPR/Cas9 when trying to avoid off-target effects: 1) how to optimize the features of the guide RNA, and 2) what the optimal target sequence on the genome is. Algorithms have been developed to help researchers answer these questions.

DeskGen, a company that provides software to automate scientific research, developed a tool that helps researchers answer the first of these questions: how to optimize the features of a guide RNA. The company recently conducted a study on improving guide RNA optimization and gleaned valuable insights about applying machine learning to this question (3). They found that filtering guide RNA designs according to positive and negative predictors was one way to improve the CRISPR/Cas9 system. They also determined that a ridge regression model was better suited for this task than a neural network because the neural network gave less insight into how it arrived at its predictions.
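
As a rough illustration of filtering guide designs by positive and negative predictors, here is a small Python sketch. The article does not list the predictors DeskGen used, so the two rules below are stand-ins chosen for illustration: moderate GC content as a positive predictor, and a run of four or more T's as a negative one (such runs can terminate transcription when the guide is expressed from a Pol III promoter). The thresholds and function names are assumptions.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def passes_filters(guide: str, gc_min: float = 0.40, gc_max: float = 0.70) -> bool:
    """Keep a guide only if it meets the positive rule and avoids the negative one.

    Both rules are illustrative assumptions, not DeskGen's actual predictors:
      positive: GC content inside [gc_min, gc_max]
      negative: a run of four or more T's
    """
    guide = guide.upper()
    has_positive = gc_min <= gc_fraction(guide) <= gc_max
    has_negative = "TTTT" in guide
    return has_positive and not has_negative


# Made-up candidate guides for demonstration.
candidates = [
    "GACGTTACCGGATCAAGCTT",  # GC 10/20, no T run
    "GACGTTTTCGGATCAAGCTT",  # contains TTTT
    "ATATTAATAGTATCAAACTA",  # GC 3/20, too low
]
kept = [g for g in candidates if passes_filters(g)]
print(kept)
```

A filter like this only discards designs that look bad on simple rules. Ranking the survivors is where a learned model comes in.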

Researchers use algorithms to optimize features of guide RNAs to help them choose the best guide RNAs. They don’t necessarily need to know why certain guide RNAs are better than others. However, they do prefer knowing those details. The ridge regression model better explained which features of a guide RNA made it the best choice for a given experiment. The DeskGen researchers also found that the algorithm accurately predicted activity within a species but became less accurate when applied to different species. This suggests that building algorithms suited to a given species will be the way to go. Although the study yielded many insights, the researchers felt it was limited by the amount of CRISPR data available to train and test new algorithms. DeskGen has established itself as an innovative company that designs tools for optimizing the CRISPR/Cas9 system for gene-editing research. You can learn more about their technology in this interesting article published by BMC Bioinformatics (Hough, Kancleris et al. 2017) or check out their gene editing tools here (4).
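
The appeal of ridge regression here is that each feature gets a single weight that can be read directly. The Python sketch below shows that idea using the standard closed-form ridge solution in plain Python. It is not DeskGen's model: the feature names and activity scores are synthetic, invented for this example, and the baseline column stands in for an intercept.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge weights: (X^T X + lam * I)^-1 X^T y, with lam > 0."""
    n_features = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(n_features)]
           for i in range(n_features)]
    for i in range(n_features):
        xtx[i][i] += lam  # the ridge penalty shrinks weights toward zero
    xty = [sum(row[i] * target for row, target in zip(X, y))
           for i in range(n_features)]
    return solve_linear_system(xtx, xty)


# Synthetic data: hypothetical binary guide features and made-up activity scores.
feature_names = ["baseline", "G_next_to_PAM", "high_GC", "TTTT_run"]
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
]
y = [0.8, 0.9, 0.1, 0.5, 0.4, 0.4]

weights = ridge_fit(X, y, lam=0.1)
for name, w in sorted(zip(feature_names, weights), key=lambda p: -abs(p[1])):
    print(f"{name:>14}: {w:+.3f}")
```

The printed list is the explanation: a large positive weight marks a feature associated with higher predicted activity, and a large negative weight marks one associated with lower activity. A neural network offers no equally direct readout.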

Another group, consisting of Microsoft Research and academics from the Broad Institute of MIT and Harvard, performed an extensive study to evaluate what influences on-target and off-target activity in the CRISPR/Cas9 system. Part of the study involved testing the rules that are built into the algorithms that identify which guide sequences are optimal. They also looked at large sets of possible guide RNA gene targets and analyzed both guide and gene characteristics in various combinations. They found that combining improved on-target activity predictions with their off-target avoidance metric let them optimize guide selection on both fronts (Doench, Fusi et al. 2016). They developed new and improved rules and incorporated them into a web portal that was launched last year. The accompanying tool, Azimuth, uses a machine learning algorithm to predict outcomes for guide-gene pairs, which enables experimental design for gene editing in cases where no experimental evidence is yet available (5).
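
One simple way to picture combining an on-target prediction with an off-target metric is to rank guides by activity minus a risk penalty. The Python sketch below does that. It is not Azimuth's API or the scoring rule from Doench, Fusi et al.: the class, the weighting scheme, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GuideScore:
    sequence: str
    on_target: float   # predicted activity at the intended site, 0 to 1
    off_target: float  # predicted off-target risk, 0 (none) to 1 (high)


def combined_score(g: GuideScore, off_target_weight: float = 1.0) -> float:
    """Reward predicted activity and penalize predicted off-target risk."""
    return g.on_target - off_target_weight * g.off_target


def rank_guides(guides, off_target_weight: float = 1.0):
    """Return guides ordered from most to least attractive."""
    return sorted(guides,
                  key=lambda g: combined_score(g, off_target_weight),
                  reverse=True)


# Made-up scores: guideA is the most active but also the riskiest.
guides = [
    GuideScore("guideA", on_target=0.9, off_target=0.6),
    GuideScore("guideB", on_target=0.7, off_target=0.1),
    GuideScore("guideC", on_target=0.4, off_target=0.05),
]

for g in rank_guides(guides):
    print(g.sequence, round(combined_score(g), 2))
```

Setting off_target_weight to 0 ranks on activity alone, which puts the riskiest guide first in this toy data. Raising the weight trades some activity for specificity.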

The hype around CRISPR/Cas9 technology may seem relatively new. In reality, CRISPR research has been around for a decade, and it is in these past few years that gene editing with CRISPR/Cas9 has really hit its stride (Barrangou and Horvath 2017).


Barrangou, R. and J. A. Doudna (2016). “Applications of CRISPR technologies in research and beyond.” Nature biotechnology 34(9): 933-941.

Barrangou, R. and P. Horvath (2017). “A decade of discovery: CRISPR functions and applications.” Nature microbiology 2: 17092.

Beisel, C. L., A. A. Gomaa and R. Barrangou (2014). “A CRISPR design for next-generation antimicrobials.” Genome biology 15(11): 516.

Bikard, D., C. W. Euler, W. Jiang, P. M. Nussenzweig, G. W. Goldberg, X. Duportet, V. A. Fischetti and L. A. Marraffini (2014). “Exploiting CRISPR-Cas nucleases to produce sequence-specific antimicrobials.” Nature biotechnology 32(11): 1146-1150.

Chari, R., P. Mali, M. Moosburner and G. M. Church (2015). “Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach.” Nature methods 12(9): 823-826.

Cho, S. W., S. Kim, Y. Kim, J. Kweon, H. S. Kim, S. Bae and J.-S. Kim (2014). “Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases.” Genome research 24(1): 132-141.

Doench, J. G., N. Fusi, M. Sullender, M. Hegde, E. W. Vaimberg, K. F. Donovan, I. Smith, Z. Tothova, C. Wilen, R. Orchard, H. W. Virgin, J. Listgarten and D. E. Root (2016). “Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9.” Nature biotechnology 34(2): 184-191.

Fu, Y., J. A. Foden, C. Khayter, M. L. Maeder, D. Reyon, J. K. Joung and J. D. Sander (2013). “High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells.” Nature biotechnology 31(9): 822-826.

Hough, S. H., K. Kancleris, L. Brody, N. Humphryes-Kirilov, J. Wolanski, K. Dunaway, A. Ajetunmobi and V. Dillard (2017). “Guide Picker is a comprehensive design tool for visualizing and selecting guides for CRISPR experiments.” BMC bioinformatics 18(1): 167.

Hsu, P. D., D. A. Scott, J. A. Weinstein, F. A. Ran, S. Konermann, V. Agarwala, Y. Li, E. J. Fine, X. Wu, O. Shalem, T. J. Cradick, L. A. Marraffini, G. Bao and F. Zhang (2013). “DNA targeting specificity of RNA-guided Cas9 nucleases.” Nature biotechnology 31(9): 827-832.

Salsman, J. and G. Dellaire (2017). “Precision genome editing in the CRISPR era.” Biochemistry and cell biology = Biochimie et biologie cellulaire 95(2): 187-201.