Abstract

Promoter Prediction in Bacterial DNA Sequences Using Expectation Maximization and Support Vector Machine Learning Approach

Ahmad Maleki*, Vahid Vaezinia and Ayda Fekri

A promoter is a region of DNA located upstream of a gene that plays a key role in regulating that gene. Promoter prediction helps to locate genes and to analyze gene expression, and is therefore of great importance in bioinformatics. In bioinformatics research, a number of machine learning approaches are applied to discover meaningful new knowledge from biological databases. In this study, two learning approaches, expectation maximization clustering and a support vector machine classifier, are combined (EMSVM) to perform promoter detection. In the first stage, the expectation maximization (EM) algorithm identifies groups of samples that behave similarly, separating, for example, the activity of promoters from that of non-promoters. In the second stage, the support vector machine (SVM) classifies all the data into the correct class. We applied this method to datasets corresponding to σ24, σ32, σ38, and σ70 promoters, and its effectiveness was demonstrated on a range of different promoter regions. Furthermore, it was compared with other classification algorithms to assess the performance of the proposed algorithm. Test results show that EMSVM performs better than the other methods.
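The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the EM output is passed to the SVM, so the sketch assumes that each sample's posterior cluster membership is appended to its features before classification. The one-dimensional "score" per sequence, the toy values, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def responsibilities(x, params):
    """E-step for one sample: posterior probability of each mixture component."""
    weighted = [pi * gaussian_pdf(x, mu, var) for pi, mu, var in params]
    total = sum(weighted) or 1e-300  # guard against underflow to zero
    return [v / total for v in weighted]

def fit_em(xs, iters=30):
    """Stage 1 (unsupervised): fit a two-component 1-D Gaussian mixture with EM."""
    # Each component is (weight, mean, variance); means start at the data extremes.
    params = [(0.5, min(xs), 1.0), (0.5, max(xs), 1.0)]
    for _ in range(iters):
        resp = [responsibilities(x, params) for x in xs]  # E-step
        new_params = []
        for k in range(2):  # M-step: re-estimate weight, mean, variance
            nk = sum(r[k] for r in resp)
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            new_params.append((nk / len(xs), mu, max(var, 1e-6)))
        params = new_params
    return params

def features(x, params):
    """Assumed coupling: raw score plus posterior of the second EM component."""
    return [x, responsibilities(x, params)[1]]

def fit_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Stage 2 (supervised): linear soft-margin SVM via hinge-loss subgradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin: hinge term is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:  # only the L2 regulariser contributes
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(x, params, w, b):
    """Return +1 (promoter) or -1 (non-promoter) for a raw score x."""
    score = sum(wj * xj for wj, xj in zip(w, features(x, params))) + b
    return 1 if score >= 0 else -1

# Toy, made-up scores (NOT data from the study): one number per sequence.
promoter_scores = [1.5, 1.8, 2.0, 2.2, 2.5, 1.9]
background_scores = [-1.5, -1.8, -2.0, -2.2, -2.5, -2.1]
scores = promoter_scores + background_scores
labels = [1] * len(promoter_scores) + [-1] * len(background_scores)

params = fit_em(scores)  # EM sees no labels
w, b = fit_linear_svm([features(x, params) for x in scores], labels)
predictions = [predict(x, params, w, b) for x in scores]
print(predictions)
```

In a real setting the scalar score would be replaced by a feature vector derived from the sequence, and library implementations of mixture models and SVMs would normally replace these hand-written routines; the sketch only shows how an unsupervised EM stage can precede a supervised SVM stage.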