Adaptive Scheduling Parallel Jobs with Dynamic Batching in Spark Streaming

Dazhao Cheng, Xiaobo Zhou, Yu Wang, Changjun Jiang

Today, enterprises must process massive volumes of stream data in real time, driven by the data explosion of recent years. Spark Streaming is an emerging system that performs real-time stream data analytics using a micro-batch approach. Its unified programming model offers some unique benefits over traditional streaming systems, such as fast recovery from failures and better load balancing and resource usage. Spark Streaming treats the continuous stream as a series of micro-batches of data and processes these micro-batch jobs continuously. However, scheduling micro-batch jobs efficiently to achieve high throughput and low latency is very challenging because of the complex data dependencies and the dynamism inherent in streaming workloads. In this paper, we propose A-scheduler, an adaptive scheduling approach that dynamically schedules parallel micro-batch jobs in Spark Streaming and automatically adjusts scheduling parameters to improve performance and resource efficiency. Specifically, A-scheduler schedules multiple jobs concurrently, applying different policies based on their data dependencies, and automatically adjusts the level of job parallelism and the resource shares among jobs based on workload properties. Furthermore, we integrate a dynamic batching technique with A-scheduler to further improve the overall performance of the customized Spark Streaming system. This technique relies on an expert fuzzy control mechanism to dynamically adjust the length of each batch interval in response to the time-varying streaming workload and the system processing rate. We implemented A-scheduler and evaluated it with a real-time security event processing workload. Our experimental results show that, compared to the default Spark Streaming scheduler, A-scheduler with dynamic batching can reduce end-to-end latency by 38 percent while improving workload throughput and energy efficiency by 23 and 15 percent, respectively.
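
The abstract names an expert fuzzy control mechanism for dynamic batching but does not give its inputs, membership functions or rules. The Python sketch below is therefore a hypothetical illustration of how such a batch-interval controller can work, not A-scheduler's implementation. It assumes two inputs (the ratio of the last batch's processing time to the batch interval, and the relative change in data arrival rate between batches), trapezoidal fuzzy sets, a nine-rule table using min() as the AND operator, and weighted-average (zero-order Sugeno) defuzzification. The names `trap`, `next_batch_interval`, `LOAD_SETS`, `TREND_SETS` and `RULES`, and every threshold and step size, are made up for illustration.

```python
"""Hypothetical fuzzy controller for the Spark Streaming batch interval.

Illustrative only: the inputs, fuzzy sets, rule table and step sizes
are assumptions, not A-scheduler's published rule base.
"""

INF = float("inf")


def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    Shoulder sets use a == b == -INF (left) or c == d == INF (right).
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Input 1 (assumed): load = batch processing time / batch interval.
LOAD_SETS = {
    "low": (-INF, -INF, 0.5, 0.8),
    "ok": (0.5, 0.8, 0.8, 1.0),
    "high": (0.8, 1.0, INF, INF),
}

# Input 2 (assumed): relative change in arrival rate between batches.
TREND_SETS = {
    "falling": (-INF, -INF, -0.2, 0.0),
    "steady": (-0.2, 0.0, 0.0, 0.2),
    "rising": (0.0, 0.2, INF, INF),
}

# Rule table (assumed): (load set, trend set) -> relative interval change.
RULES = {
    ("low", "falling"): -0.20,
    ("low", "steady"): -0.10,
    ("low", "rising"): 0.00,
    ("ok", "falling"): -0.10,
    ("ok", "steady"): 0.00,
    ("ok", "rising"): 0.10,
    ("high", "falling"): 0.10,
    ("high", "steady"): 0.10,
    ("high", "rising"): 0.25,
}


def next_batch_interval(interval, processing_time, prev_rate, curr_rate,
                        min_interval=0.5, max_interval=10.0):
    """Return the next batch interval (seconds) from the last batch's stats."""
    load = processing_time / interval
    trend = (curr_rate - prev_rate) / prev_rate if prev_rate > 0 else 0.0

    # Fuzzify both inputs into degrees of membership per set.
    load_mu = {name: trap(load, *p) for name, p in LOAD_SETS.items()}
    trend_mu = {name: trap(trend, *p) for name, p in TREND_SETS.items()}

    # Fire each rule with min() as AND, then take the weighted average
    # of the rule outputs (zero-order Sugeno defuzzification).
    num = den = 0.0
    for (load_set, trend_set), out in RULES.items():
        w = min(load_mu[load_set], trend_mu[trend_set])
        num += w * out
        den += w
    step = num / den if den > 0 else 0.0

    # Apply the step multiplicatively and keep the interval within bounds.
    return min(max(interval * (1.0 + step), min_interval), max_interval)


if __name__ == "__main__":
    # A 2.0 s batch took 3.0 s while input grew 50%: the interval grows.
    print(next_batch_interval(2.0, 3.0, prev_rate=1000, curr_rate=1500))
```

Running it prints 2.5: a batch that took 3.0 s of a 2.0 s interval while the input rate rose 50 percent fires only the (high, rising) rule, so the interval grows by 25 percent. The multiplicative step and the clamp to [min_interval, max_interval] are design assumptions meant to keep the interval stable when the load hovers near the target ratio.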