In this talk, I will share some of the insights we gained from a large-scale comparison of binary classification algorithms. In our study, we evaluated some of the most widely used learning algorithms using a variety of metrics that emphasize different performance aspects: accuracy at a fixed threshold, the ability to rank positive cases higher than negative ones, and the ability to predict well-calibrated probabilities. Besides addressing the obvious questions that arise with such a comparison (is there a "best" learning algorithm? are SVMs better than neural networks? do newly developed algorithms such as boosting and SVMs really provide an improvement in practice?), I will discuss a few unexpected and, I would say, more exciting findings from the study. I will look in depth at the ability of learning algorithms to produce well-calibrated probabilistic predictions, address the question of which performance metric should be used for model selection, and, time permitting, show how performance can be improved by using Ensemble Selection to combine predictions from models trained by the different learning algorithms.
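(For readers unfamiliar with the three metric families mentioned above, here is a minimal illustrative sketch, not part of the study itself, using scikit-learn on hypothetical data; the arrays and the 0.5 threshold are assumptions for illustration only.)

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, brier_score_loss

# Hypothetical ground-truth labels and predicted probabilities P(y=1).
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_prob = np.array([0.2, 0.9, 0.6, 0.3, 0.8, 0.4, 0.7, 0.55])

# 1. Threshold metric: accuracy at a fixed decision threshold.
y_pred = (y_prob >= 0.5).astype(int)
print("accuracy:", accuracy_score(y_true, y_pred))

# 2. Ranking metric: AUC measures how often positives outrank negatives.
print("AUC:", roc_auc_score(y_true, y_prob))

# 3. Calibration-sensitive metric: the Brier score penalizes
#    probabilities that deviate from the true outcome frequencies.
print("Brier score:", brier_score_loss(y_true, y_prob))
```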
This is joint work with Rich Caruana, Art Munson, David Skalak, Tom Fawcett, Geoff Crew, Alex Ksikes and Cristi Bucila.