5 years ago

DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression.

S.V.N. Vishwanathan, Xinhua Zhang, Hyokun Yun, Sriram Srinivasan, Shin Matsushima, Parameswaran Raman

Scaling multinomial logistic regression to datasets with very large number of data points and classes has not been trivial. This is primarily because one needs to compute the log-partition function on every data point. This makes distributing the computation hard. In this paper, we present a distributed stochastic gradient descent based optimization method (DS-MLR) for scaling up multinomial logistic regression problems to massive scale datasets without hitting any storage constraints on the data and model parameters. Our algorithm exploits double-separability, an attractive property we observe in the objective functions of several models in machine learning, that allows us to achieve both data as well as model parallelism simultaneously. In addition to being parallelizable, our algorithm can also easily be made non-blocking and asynchronous. We demonstrate the effectiveness of DS-MLR empirically on several real-world datasets, the largest being a reddit dataset created out of 1.7 billion user comments, where the data and parameter sizes are 228 GB and 358 GB respectively.

Publisher URL: http://arxiv.org/abs/1604.04706

DOI: arXiv:1604.04706v4

You might also like
Discover & Discuss Important Research

Keeping up-to-date with research can feel impossible, with papers being published faster than you'll ever be able to read them. That's where Researcher comes in: we're simplifying discovery and making important discussions happen. With over 19,000 sources, including peer-reviewed journals, preprints, blogs, universities, podcasts and Live events across 10 research areas, you'll never miss what's important to you. It's like social media, but better. Oh, and we should mention - it's free.

  • Download from Google Play
  • Download from App Store
  • Download from AppInChina

Researcher displays publicly available abstracts and doesn’t host any full article content. If the content is open access, we will direct clicks from the abstracts to the publisher website and display the PDF copy on our platform. Clicks to view the full text will be directed to the publisher website, where only users with subscriptions or access through their institution are able to view the full article.