Abstract
Hadoop can handle zettabyte-scale data, but its heavy demands on disk I/O and network utilization often become performance bottlenecks. During the different phases of a Hadoop job, an enormous amount of intermediate data is produced, and transferring this data over the network to the "reduce" tasks imposes significant overhead. In this paper, we discuss an intelligent data compression policy that overcomes these limitations and improves the performance of Hadoop. The policy starts compression at an apt time, before all the map tasks of a job have completed, thereby reducing data transfer time over the network. The results are evaluated by running several benchmarks, which show an improvement of about 8–15% in job execution time and demonstrate the merits of the proposed compression policy.
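For context, stock Hadoop already supports compressing intermediate map output, but only as a static, job-wide switch rather than the adaptive, timing-aware policy proposed here. A minimal configuration fragment enabling that baseline behavior (using the standard `mapreduce.map.output.compress` properties and the built-in Snappy codec) might look like:

```xml
<!-- mapred-site.xml: statically compress all intermediate map output.
     This is the conventional baseline, not the adaptive policy of this paper. -->
<configuration>
  <!-- Turn on compression for map output before it is shuffled to reducers -->
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <!-- Snappy trades compression ratio for low CPU cost, suiting shuffle data -->
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
```

Because this switch applies uniformly to every map task for the whole job, it cannot react to runtime conditions; the proposed policy instead decides when during job execution compression should begin.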