Predicting Cellular Protein localization Sites on Ecoli’s Minimal Dataset using a Comparison of Machine Learning Techniques
Date
2018-08
Author
Yonasi, Safari
Nakasi, Rose
Singh, Yashik
Abstract
Several machine learning classification techniques have been applied to predicting protein localization sites in E. coli. However, little to no research has predicted protein localization sites on E. coli's minimal dataset using the most informative features obtained through different feature selection techniques. This study investigated several machine learning classification and feature selection techniques as applied to E. coli's minimal dataset. The classifiers were used to predict localization sites for E. coli's minimal subset from the informative features identified by the feature selection techniques. The work proceeded in four stages: data collection, cleaning, and preprocessing; feature selection, in which the most informative features are selected; classification, in which the localization of proteins is predicted; and evaluation, in which the classifiers' performance is assessed using a number of measures, including cross-validation accuracy and the area under the ROC curve (AUROC), in order to recommend the best classifier. Among the classifiers used, Extra Trees and Gradient Boosting performed best, followed by Random Forest, as judged by precision, recall, and F-measure scores. AdaBoost performed worst, at 83%.
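The precision, recall, and F-measure scores mentioned in the abstract can be illustrated with a minimal sketch for a multi-class problem. This is not the authors' code: the function names `per_class_scores` and `macro_average`, the toy labels, and the choice of macro averaging are all assumptions made for illustration, and the numbers below are not results from the study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted this label, but it was wrong
            fn[truth] += 1   # missed the true label
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Undefined ratios (no predictions or no true samples) are set to 0.0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical toy labels, chosen only to exercise the functions.
y_true = ["cp", "cp", "cp", "im", "im", "pp"]
y_pred = ["cp", "cp", "im", "im", "im", "cp"]

scores = per_class_scores(y_true, y_pred)
macro = macro_average(scores)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print("macro:", tuple(round(v, 3) for v in macro))
```

Macro averaging weights every class equally, so a rarely occurring localization site that is never predicted correctly pulls the average down as much as a common one would. Whether the study used macro, micro, or weighted averaging is not stated in the abstract.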