Evo, an AI model developed to generate genetic code, promises to transform bioengineering by advancing drug design, enabling microbial reprogramming, and providing insights into genetic diseases.
Brian Hie, who leads the Laboratory of Evolutionary Design at Stanford, explores the intersection of artificial intelligence and biology. Recently, he posed a thought-provoking question: If a tool like ChatGPT can generate original sentences by analyzing patterns in vast collections of text, what would occur if we substituted written words with genetic code?
The answer to that seemingly simple question has resulted in Evo, a generative AI model designed to write genetic code. Hie and his colleagues at the Arc Institute and the University of California, Berkeley, presented Evo in a paper published in Science. Hie explains that researchers could use Evo to better understand microbial and viral genomes, create new proteins (such as drugs) that have never existed, and reprogram microbes for impressive tasks, ranging from enhancing photosynthesis for carbon sequestration and increasing crop yields to eliminating microplastics from the oceans (1).
Streamlining Bioengineering with Evo
“Instead of having to use brute force testing or mining promising sequences from nature, all of which are quite unpredictable, we now have an AI model for generating systems of interest, allowing researchers to focus only on the most promising possibilities,” said Hie, assistant professor of chemical engineering. “Evo puts the genomes of whole lifeforms within reach and accelerates the bioengineering design process.”
Evo could even lead to a deeper understanding of evolution itself, new insights into genetic diseases, and new treatments, all achieved on a computer rather than in a lab.
Nature’s Blueprint: DNA as Inspiration
The inspiration comes from nature itself. The instructions of all life are encoded in DNA. A better understanding of the complex interplay of DNA, RNA, and proteins, and of how they have evolved over time, will lead to deeper knowledge and the ability to reprogram microbes into useful technologies.
But the task is not as easy as it seems. Even simple microbes have complex genomes with millions of base pairs. Evo makes two key advances over similar existing tools: it expands the length of sequence a model can process at once, known as the “context window,” from roughly 8,000 base pairs to more than 131,000 base pairs, and it improves the resolution to the scale of individual nucleotides, the building blocks of DNA.
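To make the two ideas concrete, here is a minimal Python sketch of what single-nucleotide resolution and a fixed context window mean in practice. It is an illustration only and not Evo's actual tokenizer or API: the names `NUCLEOTIDE_IDS`, `tokenize`, and `split_into_windows` are hypothetical, and the window size simply reuses the 131,000 figure quoted above.

```python
# Illustrative sketch only; this is NOT Evo's real tokenizer or interface.
# All names below are hypothetical and chosen for this example.

CONTEXT_WINDOW = 131_000  # assumption: the "more than 131,000" figure from the article

# Single-nucleotide resolution: every base becomes its own token.
NUCLEOTIDE_IDS = {"A": 0, "C": 1, "G": 2, "T": 3}


def tokenize(sequence: str) -> list[int]:
    """Map each nucleotide in a DNA string to one integer token."""
    return [NUCLEOTIDE_IDS[base] for base in sequence.upper()]


def split_into_windows(tokens: list[int], window: int = CONTEXT_WINDOW) -> list[list[int]]:
    """Split tokens into consecutive chunks, each no longer than the window."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


if __name__ == "__main__":
    tokens = tokenize("acgtTGCA")
    print(tokens)                                  # one token per base
    print(split_into_windows(tokens, window=3))    # chunks of at most 3 tokens
```

With one token per base, a genome of a few million base pairs cannot fit in a single window of this size, which is why the jump from roughly 8,000 to more than 131,000 base pairs matters.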
Ethical Safeguards: Preventing Misuse of Evo for Bioweapons
Evo was trained on 80,000 microbial genomes and 2.7 million prokaryotic and phage genomes, covering 300 billion nucleotides, as well as on smaller DNA loops known as plasmids. To preempt the use of Evo for the development of bioweapons, however, the team had to exclude the genomes of viruses known to infect humans and certain other organisms.
Evo is able to learn how small changes in nucleotide sequences affect the evolutionary fitness of whole organisms and generate DNA sequences of more than 1 million base pairs – more than seven times the context window of 131,000 base pairs, Hie added. By comparison, the smallest “minimal” bacterial genomes are about 580,000 base pairs in length, the researchers note.
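The size comparisons in that paragraph can be checked with a few lines of arithmetic. The sketch below treats "more than 1 million" as exactly 1,000,000 base pairs, so both ratios are lower bounds; the variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumption: "more than 1 million" is taken as exactly 1,000,000 (a lower bound).

context_window_bp = 131_000    # context window quoted in the article
generated_bp = 1_000_000       # lower bound on generated sequence length
minimal_genome_bp = 580_000    # smallest "minimal" bacterial genomes, per the article

ratio_to_window = generated_bp / context_window_bp
ratio_to_minimal = generated_bp / minimal_genome_bp

print(f"Generated length vs. context window: {ratio_to_window:.1f}x")
print(f"Generated length vs. minimal genome: {ratio_to_minimal:.1f}x")
```

Under these assumptions, the generated sequences are a little over seven and a half times the context window and roughly 1.7 times the length of the smallest minimal bacterial genomes.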
Evo’s First Proof of Concept
As a proof of concept of Evo’s design capabilities, Hie and colleagues prompted Evo to generate novel synthetic CRISPR-Cas molecular complexes and systems. CRISPR-Cas systems are like tiny molecular machines that use proteins and RNA in tandem to edit DNA. In response to that prompt, Evo created a fully functional, previously unknown CRISPR system that was validated after testing 11 possible designs. Evo’s CRISPR exploration is the first example of simultaneous protein-RNA codesign using a language model, Hie noted.
Next Steps for Evo: Advancing to Complex Genomes
Next up, Hie is working on expanding Evo’s ability to process larger genomic sequences, achieving greater control over its outputs, and broadening his research beyond the microbial world to human and other genomes.
“Evo opens up a lot of very interesting research at the intersection of machine learning and biology,” Hie said. “It creates opportunities for discoveries that were previously unimaginable and accelerates our ability to engineer life itself.”
Evo is open source and publicly available for interested researchers to download.
Reference:
1. Sequence modeling and design from molecular to genome scale with Evo (https://www.science.org/doi/10.1126/science.ado9336)
Source: EurekAlert