BMI Students

Saturday, May 20, 2006

Notes on classifiers

I have been testing a bunch of classifiers for a project I am doing. The objective is to classify an intergenic region as ACE1 (or any motif) or not-ACE1, based on features of the intergenic regions. I did this for a number of sets of features, and the results were very consistent. I have done enough tests that I feel comfortable relating some general conclusions....

Random Forests and SVMs always won, with random forests usually commanding a slight lead. SVMs with a polynomial kernel did a bit worse. MaxEnt usually came fourth, and seemed to do better on discrete data (hence the NLP slant of this method). Finally, k-nearest neighbors always lost. Random Forests were slower than SVMs, apart from that I think they are preferable.

Random forests are just collections of voting decision trees, each trained on bootstrapped data and variables. Someone must have done the same thing for collections of voting SVMs. If I find it I'll add it to this post. Seems like it must win overall.


Post a Comment

<< Home