Feature selection based on feature interactions with application to text categorization

Xiaochuan Tang, Yuanshun Dai, Yanping Xiang

Publication date: Available online 10 November 2018

Source: Expert Systems with Applications

Abstract

Feature selection is an important preprocessing approach for machine learning and text mining. It reduces the dimensionality of high-dimensional data. A popular approach is based on information-theoretic measures. Most existing methods use two- and three-dimensional mutual information terms, which are ineffective in detecting higher-order feature interactions. To fill this gap, we employ two- through five-way interactions for feature selection. We first identify a relaxed assumption to decompose the mutual information-based feature selection problem into a sum of low-order interactions. A direct calculation of the decomposed interaction terms is computationally expensive, so we employ five-dimensional joint mutual information, a computationally efficient measure, to estimate the interaction terms. We use the ‘maximum of the minimum’ nonlinear approach to avoid overestimating feature significance. We also apply the proposed method to text categorization. To evaluate its performance, we compare it with eleven popular feature selection methods on eighteen benchmark datasets and seven text categorization datasets. Experimental results with four different types of classifiers provide concrete evidence that higher-order interactions are effective in improving feature selection methods.
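
The abstract does not give the selection criterion itself, so the following is only a minimal sketch of a generic 'maximum of the minimum' greedy selector built on joint mutual information. It is not the authors' method: it is simplified to two-feature terms I(f, s; Y) rather than the five-dimensional terms the abstract mentions, it assumes discrete features, and the names `entropy`, `joint_mi`, and `select_max_min` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(columns):
    """Shannon entropy (bits) of the empirical joint distribution of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def joint_mi(features, target):
    """I(F1, ..., Fk; Y) = H(F) + H(Y) - H(F, Y), estimated from counts."""
    return entropy(features) + entropy([target]) - entropy(features + [target])


def select_max_min(X, y, k):
    """Greedy selection over feature columns X (list of lists) with labels y.

    Assumption: a two-feature simplification of the max-of-min idea.
    Each candidate is scored by its weakest joint MI with any already
    selected feature, and the candidate with the best such score wins.
    """
    remaining = set(range(len(X)))
    # Seed with the single feature most informative about the label.
    first = max(sorted(remaining), key=lambda j: joint_mi([X[j]], y))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(j):
            return min(joint_mi([X[j], X[s]], y) for s in selected)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: y = a XOR b, so neither a nor b is informative on its own.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
noise = [0] * 8
y = [ai ^ bi for ai, bi in zip(a, b)]
selected = select_max_min([a, b, noise], y, k=2)
```

On this toy data the individual mutual information of every feature with the label is zero, yet the joint term I(a, b; Y) is one bit, so the selector picks `a` and `b` and skips the constant column. This is the kind of interaction that single-feature relevance scores miss; detecting interactions among more than two features would need higher-dimensional terms, as the abstract describes.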
