Indian Society of Genetics & Plant Breeding


Performance evaluation of neural network, support vector machine and random forest for prediction of donor splice sites in rice


Prediction of splice sites plays an important role in predicting
gene structure. Because rice is one of the major cereal crops,
its continuous improvement can be aided by the prediction of
unknown genes associated with complex traits. Machine
learning techniques, namely Artificial Neural Network (ANN)
and Support Vector Machine (SVM), have been successfully
used for the prediction of splice sites, but to the best of our
knowledge their performance has not yet been compared.
Further, Random Forest (RF), another machine
learning method, has been successfully used and reported
to outperform ANN and SVM in areas other than splice site
prediction. In this study we have developed an approach to
encode the splice site sequence data of rice into a numeric
form that is subsequently used as input to ANN, SVM and
RF for prediction of donor splice sites. The performances
were then evaluated and compared using the receiver
operating characteristic (ROC) curve and the estimated area
under the ROC curve (AUC), averaged over 5-fold cross
validation. The results reveal that the AUC of RF is higher than
that of ANN and SVM, which implies that RF can be preferred over
SVM and ANN for the prediction of donor splice sites.
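
The two ingredients named above, numeric encoding of sequence windows and AUC averaged over cross-validation folds, can be illustrated with a minimal sketch. The abstract does not describe the authors' encoding scheme, so the one-hot encoding below is only an assumed stand-in, and the function names (`one_hot_encode`, `auc`, `mean_auc_over_folds`) are hypothetical. The AUC is computed with the standard rank (Mann-Whitney) formulation; the classifier scores passed in are presumed to come from whichever model (ANN, SVM or RF) is being evaluated.

```python
BASES = "ACGT"


def one_hot_encode(seq):
    """Encode a DNA window as a flat 0/1 vector, four entries per base.

    Assumption: this is a generic one-hot scheme, not the encoding
    developed in the paper. Bases outside A/C/G/T become all zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a true donor site, 0 for a false site.
    scores: classifier outputs, higher meaning "more likely a true site".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def mean_auc_over_folds(folds):
    """Average the AUC over (labels, scores) pairs, one pair per fold."""
    values = [auc(y, s) for y, s in folds]
    return sum(values) / len(values)
```

With 5-fold cross validation, `folds` would hold five `(labels, scores)` pairs, one per held-out fold, and the classifier with the larger mean AUC would be the preferred one under this criterion.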

Keywords: Gene structure, splice site, machine learning, rice


Year: 2016
Volume: 76
Issue: 2
Article DOI: 10.5958/0975-6906.2016.00027.4
Print ISSN: 0019-5200
Online ISSN: 0975-6906


Tanmaya Kumar Sahu
A. R. Rao
