Enhancing Hadoop Performance in Homogenous and Heterogeneous Big Data Environments by Dynamic Slot Configuration

  • Ekhlas Kadhum Hamza, University of Technology
Keywords: Hadoop, MapReduce, big data, slot configuration, scheduling algorithms, MakeSpan, resources utilization

Abstract

Hadoop is one of the most widely used platforms for processing large-scale data in parallel in cloud computing. A Hadoop system can be characterized by three main factors: cluster, workload, and user. Each of these factors can be described as either heterogeneous or homogeneous, which reflects the degree of heterogeneity of the Hadoop system. The objective of this research is to investigate how strongly the heterogeneity of each factor influences Hadoop performance under different schedulers. Three schedulers are tested and analyzed at different levels of Hadoop heterogeneity: FIFO (First In, First Out), Fair sharing, and COSHH (Classification and Optimization based Scheduler for Heterogeneous Hadoop). The study examines performance issues related to Hadoop schedulers and presents a comparative performance analysis across different job submission cases. These jobs are processed in homogeneous or heterogeneous data environments, with either fixed or reconfigurable slots between map and reduce tasks, in the Java-based Hadoop MapReduce cluster programming model. The results showed that assigning a tunable knob between map and reduce tasks under certain schedulers, such as FIFO, improved performance significantly, especially in heterogeneous environments, where the workload decreased significantly and the utilization of computational resources clearly increased.

Published
2020-03-31