Abstract:
To address the low computational efficiency and poor clustering quality of existing clustering algorithms on big data, this paper presents a big data clustering algorithm based on an improved artificial bee colony (ABC) algorithm and MapReduce. The grey wolf optimizer is combined with the ABC algorithm to improve both the exploration and the exploitation of ABC, which in turn improves clustering performance. A chaotic map and backward learning are used to initialize the bee colony, improving the quality of the solutions found during the search. The algorithm is implemented on the MapReduce programming model, and clustering is performed by minimizing the sum of squared intra-cluster distances. Experimental results demonstrate that the proposed algorithm improves clustering quality on big data and speeds up the clustering procedure.
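As an illustration of the objective named above, the following minimal sketch computes the sum of squared distances from each point to its nearest cluster center, split into a map step (one partial sum per data partition) and a reduce step (adding the partial sums). It is not the authors' implementation: the function names `sq_dist`, `map_partition`, and `fitness`, the in-memory partitions, and the toy data are all assumptions made for illustration, and a real deployment would run the map step on a MapReduce framework rather than in a single process.

```python
from functools import reduce

def sq_dist(p, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def map_partition(points, centers):
    """Map step (hypothetical): partial objective for one data partition.

    Each point contributes the squared distance to its nearest center.
    """
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def fitness(partitions, centers):
    """Reduce step (hypothetical): add the partial sums from all partitions."""
    partials = (map_partition(part, centers) for part in partitions)
    return reduce(lambda a, b: a + b, partials, 0.0)

# Toy data: two partitions of 2-D points and one candidate set of centers,
# standing in for a single candidate solution in the colony.
partitions = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
centers = [(0.0, 0.0), (10.0, 10.0)]

print(fitness(partitions, centers))  # prints 5.0
```

Because the objective is a plain sum over points, the partial sums can be computed independently on each partition, which is what makes this fitness evaluation a natural fit for the map and reduce phases.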