AlphaFold: How AI is Used for Scientific Discovery

A study published online in the journal Nature on 15 January 2020 showed how researchers used cutting-edge AI techniques to predict the 3D structure of a protein from its genetic sequence alone.

Before we delve deeper, let us first understand why protein folding has remained an unsolved problem for decades.

Protein Folding: Identifying the Problem

The shape of a protein is closely tied to its function, so the ability to predict a protein’s structure helps explain what it does and how it works. Answers to some of the world’s greatest challenges, such as finding enzymes that can break down industrial waste and developing treatments for diseases, lie in these proteins.

Christian Anfinsen, winner of the 1972 Nobel Prize in Chemistry, stated in his acceptance speech that the amino acid sequence of a protein should fully determine its structure. This hypothesis triggered a quest that has lasted five decades. In theory, a protein chain could fold in an astronomical number of ways before settling into its final 3D structure. Cyrus Levinthal, an American molecular biologist, estimated about 10^300 possible conformations for a typical protein, so enumerating them all by brute force would take longer than the age of the known universe. Yet in nature, proteins fold spontaneously, some within milliseconds. This contradiction is known as Levinthal’s paradox.
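To get a feel for the scale of Levinthal’s estimate, the arithmetic can be sketched in a few lines of Python. The 10^300 figure comes from the text above. The sampling rate of 10^13 conformations per second and the age of the universe (roughly 13.8 billion years) are illustrative assumptions, not measured values for any particular protein.

```python
import math

# Figure quoted in the text: roughly 10^300 possible conformations.
LOG10_CONFORMATIONS = 300

# Assumption for illustration only: the chain samples 10^13 conformations per second.
LOG10_SAMPLES_PER_SECOND = 13

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate value, used only for comparison


def log10_years_to_enumerate(log10_conformations, log10_rate):
    """Return log10 of the years needed to visit every conformation once."""
    # Working in log10 keeps the numbers small and avoids float overflow.
    log10_seconds = log10_conformations - log10_rate
    return log10_seconds - math.log10(SECONDS_PER_YEAR)


log10_years = log10_years_to_enumerate(LOG10_CONFORMATIONS, LOG10_SAMPLES_PER_SECOND)
log10_universe_ages = log10_years - math.log10(AGE_OF_UNIVERSE_YEARS)

print(f"Exhaustive search would take on the order of 10^{log10_years:.1f} years")
print(f"That is on the order of 10^{log10_universe_ages:.1f} times the age of the universe")
```

Even with a far more generous sampling rate, the exponent barely moves, which is why a brute-force search cannot explain folding that finishes in milliseconds.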

AlphaFold to the Rescue

Applying AI to the 50-year-old protein folding challenge has proved a success. DeepMind, a British AI company, developed AlphaFold, a neural network-based system that predicts the 3D structure of a protein from its amino acid sequence with accuracy comparable to experimental methods. The challenge is often referred to as the ‘protein folding problem’, and DeepMind’s breakthrough could unlock new possibilities in biological research for years to come.

In the 14th Critical Assessment of protein Structure Prediction (CASP14), AlphaFold achieved unparalleled accuracy in predicting protein structures. The assessment is run blind: participants predict the structures of proteins that have only recently been determined experimentally, or whose experimental results are still pending. Often called the building blocks of life, proteins are chains built from 20 types of amino acids arranged in many different sequences and combinations. Because a protein’s biological function is closely tied to its 3D structure, determining that final structure is key to understanding what the protein does.

Knowing the final shape of a protein is therefore important for understanding how it functions and how it interacts with other biomolecules. Can it be modified, and can it be controlled?

CASP assesses accuracy with the “Global Distance Test (GDT)” metric, which ranges from 0 to 100. The new AlphaFold system delivered unprecedented results, with a median score of 92.4 GDT across all targets. Its average error is approximately 1.6 Angstroms, which is comparable to the width of an atom.
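How a score on the 0-100 scale arises can be illustrated with a simplified sketch. GDT is commonly computed as the average, over distance cutoffs of 1, 2, 4 and 8 Angstroms, of the percentage of residues whose predicted position lies within that cutoff of the experimental position. The version below assumes the two structures are already superimposed and skips the search over superpositions that the full metric performs, so it should be read as an approximation rather than CASP’s actual scoring code. The function names and the toy coordinates are hypothetical.

```python
import math

# Distance cutoffs (in Angstroms) commonly used for the GDT total score.
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def residue_distances(predicted, experimental):
    """Per-residue distance between matching 3D positions, in Angstroms."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")
    return [math.dist(p, e) for p, e in zip(predicted, experimental)]


def simplified_gdt(predicted, experimental):
    """Average percentage of residues within each cutoff (no superposition search)."""
    distances = residue_distances(predicted, experimental)
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS_ANGSTROM
    ]
    return 100.0 * sum(fractions) / len(fractions)


def rmsd(predicted, experimental):
    """Root-mean-square deviation over all residues, in Angstroms."""
    distances = residue_distances(predicted, experimental)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Toy data (hypothetical): four residue positions, with the prediction
# displaced by 0.5, 1.5, 3.0 and 9.0 Angstroms respectively.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(simplified_gdt(predicted, experimental))  # 56.25
print(round(rmsd(predicted, experimental), 2))
```

A single badly placed residue dominates the RMSD but costs the GDT-style score only its share of each cutoff, which is one reason a threshold-based metric is useful alongside an average error.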

The Way Forward

In 2018, the first version of AlphaFold entered CASP13 and achieved the highest accuracy among the participants. The team at DeepMind trained the model to predict target shapes from scratch, without using previously solved proteins as templates.

For CASP14 in 2020, the team developed a new deep learning architecture: an attention-based neural network trained end-to-end. The system was trained on publicly available data consisting of nearly 170,000 experimentally determined protein structures, together with large databases of protein sequences whose structures are unknown. This system was different: it could translate real-world information into data and produce an output that can be translated back to the real world. AlphaFold must still contend with biases in its training data, though the team has begun using other methods to address them.
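For readers unfamiliar with the term, the core operation behind an “attention-based” model can be sketched as generic scaled dot-product attention: each position scores every other position, turns the scores into weights with a softmax, and takes a weighted average of the values. This is not DeepMind’s code and does not reproduce AlphaFold’s architecture, which the text above does not describe in detail. The function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    dim = len(keys[0])
    outputs = []
    for query in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy example (hypothetical): three positions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(features, features, features)
for row in attended:
    print([round(x, 3) for x in row])
```

In a sequence model, this lets every residue position draw information from every other position in a single step, rather than only from its neighbours.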

“This is an incredible AI-powered breakthrough in protein folding, which will help us better understand one of life’s most fundamental building blocks. This huge leap forward from DeepMind has immediate practical implications, enabling researchers to tackle new and difficult problems, from future pandemic response to environmental sustainability,” says Sundar Pichai, CEO of Google and Alphabet.