Abstract
Hadoop can handle zettabyte-scale data, but its heavy demands on disk I/O and network utilization often become performance bottlenecks. During the different phases of a Hadoop job, an enormous amount of intermediate data is produced, and transferring this data over the network to the "reduce" tasks imposes significant overhead. In this paper, we discuss an intelligent data compression policy that overcomes these limitations and improves the performance of Hadoop. The policy starts compression at an apt time, before all the map tasks of a job have completed, thereby reducing data transfer time over the network. The results are evaluated by running several benchmarks, which show an improvement of about 8–15% in job execution time and demonstrate the merits of the proposed compression policy.
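For context, stock Hadoop already supports compressing intermediate map output, but only as a static, job-wide switch rather than the adaptive, timing-aware policy proposed here. A minimal configuration fragment enabling that baseline behavior (using the standard `mapreduce.map.output.compress` properties and the built-in Snappy codec) might look like:

```xml
<!-- mapred-site.xml: statically compress all intermediate map output.
     This is the conventional baseline, not the adaptive policy of this paper. -->
<configuration>
  <!-- Turn on compression for map output before it is shuffled to reducers -->
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <!-- Snappy trades compression ratio for low CPU cost, suiting shuffle data -->
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
```

Because this switch applies uniformly to every map task for the whole job, it cannot react to runtime conditions; the proposed policy instead decides when during job execution compression should begin.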