A Fast and Generic GPU-Based Parallel Reduction Implementation.
Reduction operations are extensively employed in many computational problems. A reduction consists of, given a finite set of numeric elements, combining into a single value all elements in that set, using for this a combiner function. A parallel reduction, in turn, is the reduction operation concurrently performed when multiple execution units are available. The current work reports an investigation on this subject and depicts a GPU-based parallel approach for it. Employing techniques like Loop Unrolling, Persistent Threads and Algebraic Expressions to avoid thread divergence, the presented approach was able to achieve a 2.8x speedup when compared to the work of Catanzaro, using a generic, simple and easily portable code. Experiments conducted to evaluate the approach show that the strategy is able to perform efficiently in AMD and NVidia's hardware, as well as in OpenCL and CUDA.
Publisher URL: http://arxiv.org/abs/1710.07358
Keeping up-to-date with research can feel impossible, with papers being published faster than you'll ever be able to read them. That's where Researcher comes in: we're simplifying discovery and making important discussions happen. With over 19,000 sources, including peer-reviewed journals, preprints, blogs, universities, podcasts and Live events across 10 research areas, you'll never miss what's important to you. It's like social media, but better. Oh, and we should mention - it's free.
Researcher displays publicly available abstracts and doesn’t host any full article content. If the content is open access, we will direct clicks from the abstracts to the publisher website and display the PDF copy on our platform. Clicks to view the full text will be directed to the publisher website, where only users with subscriptions or access through their institution are able to view the full article.