Comparative study of parallelism on data mining

Publications

Comparative study of parallelism on data mining

Year : 2017

Publisher : Springer Verlagservice@springer.de

Source Title : Advances in Intelligent Systems and Computing

Document Type :

Abstract

Today’s world has seen a massive explosion in various kinds of data having some unique characteristics such as high-dimensionality and heterogeneity. The need of automated data driven techniques has become a necessity to extract useful information from this huge and diverse data sets. Data mining is an important step in the process of knowledge discovery in databases (KDD) and focuses on discovering hidden information in data that go beyond simple analysis. Traditional data mining methods are often found inefficient and unsuitable in analyzing today’s data sets due to their heterogeneity, massive size and high-dimensionality. So, the need of parallelization of traditional data mining algorithms has almost become inevitable but challenging considering available hardware and software solutions. The main objective of this paper is to look at the need and limitations of parallelization of data mining algorithms and findingways to achieve the best. In this comparative study, we took a look at different parallel computer architectures, well proven parallelization methods, and programming language of choice.