The Heterogeneous Ensembles of Standard Classification Algorithms (HESCA): the Whole is Greater than the Sum of its Parts.
Building classification models is an intrinsically practical exercise that requires many design decisions prior to deployment. We aim to provide some guidance in this decision making process. Specifically, given a classification problem with real valued attributes, we consider which classifier or family of classifiers should one use. Strong contenders are tree based homogeneous ensembles, support vector machines or deep neural networks. All three families of model could claim to be state-of-the-art, and yet it is not clear when one is preferable to the others. Our extensive experiments with over 200 data sets from two distinct archives demonstrate that, rather than choose a single family and expend computing resources on optimising that model, it is significantly better to build simpler versions of classifiers from each family and ensemble. We show that the Heterogeneous Ensembles of Standard Classification Algorithms (HESCA), which ensembles based on error estimates formed on the train data, is significantly better (in terms of error, balanced error, negative log likelihood and area under the ROC curve) than its individual components, picking the component that is best on train data, and a support vector machine tuned over 1089 different parameter configurations. We demonstrate HESCA+, which contains a deep neural network, a support vector machine and two decision tree forests, is significantly better than its components, picking the best component, and HESCA. We analyse the results further and find that HESCA and HESCA+ are of particular value when the train set size is relatively small and the problem has multiple classes. HESCA is a fast approach that is, on average, as good as state-of-the-art classifiers, whereas HESCA+ is significantly better than average and represents a strong benchmark for future research.
Publisher URL: http://arxiv.org/abs/1710.09220
DOI: arXiv:1710.09220v1
Keeping up-to-date with research can feel impossible, with papers being published faster than you'll ever be able to read them. That's where Researcher comes in: we're simplifying discovery and making important discussions happen. With over 19,000 sources, including peer-reviewed journals, preprints, blogs, universities, podcasts and Live events across 10 research areas, you'll never miss what's important to you. It's like social media, but better. Oh, and we should mention - it's free.
Researcher displays publicly available abstracts and doesn’t host any full article content. If the content is open access, we will direct clicks from the abstracts to the publisher website and display the PDF copy on our platform. Clicks to view the full text will be directed to the publisher website, where only users with subscriptions or access through their institution are able to view the full article.