AlphaFold2 and other deep learning-based approaches have been a breakthrough in protein structure prediction. Inspired by this, scientists all over the world have been extending the idea to even more challenging areas of structure prediction, such as multimeric protein complexes and RNA structures. One such challenge is that of orphan proteins, which have no close homologs. An important step within AlphaFold2 is the use of coevolution signals derived from a multiple sequence alignment, which identifies homologous (similar but not identical) sequences for the protein of interest. Because orphan proteins lack such homologs, this step is not feasible for them, and current approaches fail to predict accurate structures for such proteins and their complexes.
Researchers from Nankai University and Shandong University in China, led by Jianyi Yang, have developed trRosettaX-Single, a single-sequence protein structure prediction method that performs better on orphan proteins than AlphaFold2 and RoseTTAFold. “A pretrained language model based on supervised learning (s-ESM-1b) is first employed in trRosettaX-Single to encode the sequence as an embedding vector. This vector is then fed into a multiscale residual network to predict inter-residue 2D geometry, including distance and orientations. Finally, energy minimization is adopted to generate 3D structure models from the predicted 2D geometry,” explains Yang. Several new training strategies, including the multiscale residual network, sequence mask prediction and knowledge distillation, also contribute to the success of trRosettaX-Single, he adds.
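The final step Yang describes, turning predicted 2D geometry into a 3D model by energy minimization, can be illustrated with a toy sketch. This is not the trRosettaX-Single implementation: it uses distances only (no orientations), a simple squared-error restraint energy, and plain gradient descent with step halving. The function names (`toy_distance_map`, `restraint_energy`, `minimize`) and the helix-like stand-in for the network's output are hypothetical and chosen purely for illustration.

```python
import math
import random


def toy_distance_map(n):
    """Stand-in for a network-predicted distance map (assumption: a toy helix)."""
    pts = [(2.3 * math.cos(1.7 * i), 2.3 * math.sin(1.7 * i), 1.5 * i)
           for i in range(n)]
    return [[math.dist(a, b) for b in pts] for a in pts]


def random_coords(n, seed=0):
    """Random starting coordinates, one 3D point per residue."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def restraint_energy(coords, target):
    """Sum of squared deviations between model and target distances."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def gradient(coords, target):
    """Analytic gradient of restraint_energy with respect to each coordinate."""
    n = len(coords)
    grads = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            d = max(math.sqrt(sum(x * x for x in diff)), 1e-9)
            scale = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                grads[i][k] += scale * diff[k]
                grads[j][k] -= scale * diff[k]
    return grads


def minimize(start, target, steps=300, lr=0.05):
    """Gradient descent that accepts a step only if the energy decreases."""
    coords = [list(p) for p in start]
    energy = restraint_energy(coords, target)
    for _ in range(steps):
        grads = gradient(coords, target)
        trial = [[c - lr * g for c, g in zip(ci, gi)]
                 for ci, gi in zip(coords, grads)]
        trial_energy = restraint_energy(trial, target)
        if trial_energy < energy:
            coords, energy = trial, trial_energy
        else:
            lr *= 0.5  # step was too large: shrink it and retry
    return coords, energy


target = toy_distance_map(8)
start = random_coords(8, seed=0)
model, final_energy = minimize(start, target)
print(restraint_energy(start, target), final_energy)
```

Because a step is accepted only when it lowers the energy, the final energy can never exceed the starting one. Note that distances alone cannot distinguish a structure from its mirror image, which is one reason methods in this family also predict orientations.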