7 Ensemble random projection for large-scale predictions

Typically, GO-based methods extract thousands of GO terms to formulate GO vectors. At this dimensionality, the curse of dimensionality severely restricts the predictive power of GO-based multi-label classification systems. In addition, high-dimensional feature vectors may contain redundant or irrelevant information, causing the classification systems to suffer from overfitting. To address this problem, this chapter presents a dimensionality reduction method that applies random projection (RP) to construct an ensemble of multi-label classifiers. After feature extraction, the GO vectors are projected onto lower-dimensional spaces by random projection matrices whose elements conform to a distribution with ...
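The projection-and-ensemble idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the chapter's actual method: the excerpt is cut off before it names the distribution of the matrix elements, so the zero-mean Gaussian used here is an assumed, commonly used choice. The ridge-regression scorer, the score averaging, the 0.5 threshold, and all function names (`make_projections`, `fit_ensemble`, `predict_ensemble`) are likewise hypothetical stand-ins for whatever multi-label classifier and fusion rule the chapter uses.

```python
import numpy as np


def make_projections(n_features, n_components, n_members, seed=0):
    """Draw one random projection matrix per ensemble member.

    ASSUMPTION: entries are i.i.d. Gaussian with mean 0 and variance
    1/n_components, which preserves squared norms in expectation.
    The source text does not specify the distribution.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(n_components)
    return [
        rng.normal(0.0, scale, size=(n_features, n_components))
        for _ in range(n_members)
    ]


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; a stand-in multi-label scorer."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def fit_ensemble(X, Y, projections, alpha=1.0):
    """Train one scorer per projected (lower-dimensional) feature space."""
    return [fit_ridge(X @ R, Y, alpha) for R in projections]


def predict_ensemble(X, projections, weights, threshold=0.5):
    """Average the members' scores, then threshold each label independently."""
    scores = np.mean(
        [(X @ R) @ W for R, W in zip(projections, weights)], axis=0
    )
    return (scores >= threshold).astype(int)
```

For example, with a hypothetical matrix `X` of shape `(n_proteins, n_go_terms)` and a binary label matrix `Y` of shape `(n_proteins, n_locations)`, one would call `projections = make_projections(X.shape[1], 50, 10)`, then `weights = fit_ensemble(X, Y, projections)`, and finally `predict_ensemble(X_new, projections, weights)`. Each member sees a different random low-dimensional view of the same GO vectors, so averaging their scores reduces the dependence on any single projection.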
