On the optimization of Hadoop MapReduce default job scheduling through dynamic job prioritization

Document Type: Research Paper


1 Department of Computer Engineering and Information Technology, Faculty of Engineering, University of Qom, Qom, Iran

2 Department of Algorithms and Computation, School of Engineering Science, College of Engineering, University of Tehran


Apache Hadoop MapReduce is one of the most popular frameworks for big data processing. The default Hadoop scheduler uses a queue-based system; however, it does not assign any specific priority to the jobs submitted under the MapReduce programming model. In this paper, a new dynamic priority score is developed to improve the performance of the default Hadoop MapReduce scheduler. The score is computed from factors that affect scheduling: the estimated job runtime, the input data size, the waiting time, and the length or congestion of the waiting queue. Compared with the default Hadoop scheduler, the proposed scheduling method based on this score not only improves CPU and memory performance but also reduces the waiting time and the average turnaround time by approximately 45% and 40%, respectively.
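The abstract names the inputs to the dynamic priority score but not how they are combined. The sketch below is only an illustration of how such a score could rank a queue, assuming a weighted linear combination of normalized factors: shorter estimated runtime and smaller input raise priority, longer waiting raises priority, and a longer queue amplifies the waiting-time term so old jobs are not starved. The names `Job`, `dynamic_score`, `order_queue`, the weights `W_RUNTIME`, `W_INPUT`, `W_WAIT`, and the `queue_factor` formula are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    est_runtime_s: float  # estimated runtime in seconds
    input_mb: float       # input data size in MB
    wait_s: float         # time spent in the waiting queue so far, in seconds


# Hypothetical weights chosen for illustration; not the paper's values.
W_RUNTIME = 0.4
W_INPUT = 0.3
W_WAIT = 0.3


def dynamic_score(job: Job, jobs: List[Job]) -> float:
    """Higher score means the job is scheduled earlier (assumed linear model)."""
    # Normalize each factor by its maximum over the current queue (guard against zero).
    max_rt = max(j.est_runtime_s for j in jobs) or 1.0
    max_in = max(j.input_mb for j in jobs) or 1.0
    max_wait = max(j.wait_s for j in jobs) or 1.0
    # Assumption: a longer queue increases the weight of waiting time (aging).
    queue_factor = 1.0 + len(jobs) / 10.0
    return (W_RUNTIME * (1.0 - job.est_runtime_s / max_rt)
            + W_INPUT * (1.0 - job.input_mb / max_in)
            + W_WAIT * queue_factor * (job.wait_s / max_wait))


def order_queue(jobs: List[Job]) -> List[Job]:
    # Scores are recomputed on every call because wait_s grows over time.
    return sorted(jobs, key=lambda j: dynamic_score(j, jobs), reverse=True)


example_jobs = [
    Job("A", est_runtime_s=600, input_mb=2048, wait_s=10),
    Job("B", est_runtime_s=60, input_mb=128, wait_s=5),
    Job("C", est_runtime_s=300, input_mb=512, wait_s=120),
]
print([j.name for j in order_queue(example_jobs)])  # → ['C', 'B', 'A']
```

In this toy queue, job C runs first despite a mid-sized runtime because it has waited longest, job B follows because it is short and small, and the long, large job A is last. Recomputing the score each scheduling round is what makes the priority dynamic rather than fixed at submission.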