7 Encoder-Decoder Models for Protein Secondary Structure Prediction

Ashish Kumar Sharma and Rajeev Srivastava

Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India

Abstract

Proteins are linear chains of amino acids joined by peptide bonds, each of which links the amino group of one amino acid to the carboxyl group of another. A protein's secondary structure arises from the biophysical and biochemical properties of its residues, much as a sentence in a natural language is shaped by grammatical rules. Following this analogy, the proposed model treats secondary structure prediction as a machine translation task: an encoder-decoder model built on long short-term memory (LSTM) networks translates a protein's primary sequence into its secondary structure. The model was trained and tested on two publicly available datasets, CullPDB and data1199, and achieves Q3 accuracies of 84.87% and 87.39%, respectively. Compared with other methods that predict secondary structure from a single sequence alone, the proposed encoder-decoder model performs better.

Keywords: Protein structure prediction, amino acid sequence, proteomics, one-hot encoding, encoder-decoder, long short-term memory
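Two ideas named in the abstract and keywords, one-hot encoding of the primary sequence and the Q3 accuracy score, can be illustrated with a short sketch. It is not the authors' implementation. The 20-letter alphabet, the all-zero row for unknown residues, the three-state labels H, E and C, and the function names `one_hot_encode` and `q3_accuracy` are assumptions made for illustration. Q3 is taken here in its usual sense: the percentage of residues whose three-state label is predicted correctly.

```python
from typing import List

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a primary sequence as one 20-dimensional row per residue.

    Assumption: residues outside the 20-letter alphabet (for example 'X')
    are encoded as an all-zero row.
    """
    encoded = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        encoded.append(row)
    return encoded


def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the labels H (helix), E (strand)
    and C (coil), aligned residue by residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    features = one_hot_encode("ACX")
    print(len(features), len(features[0]))
    print(round(q3_accuracy("HHHECC", "HHHEEC"), 2))
```

In an encoder-decoder setting, rows like those produced by `one_hot_encode` would be the encoder input, and a score like `q3_accuracy` would compare the decoded label string with the observed structure.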

7.1 Introduction

Protein is important ...

