An Efficient Algorithm for Initializing Centroids in K-means Clustering


  • Dr. Ahmed Hussain Aliwy University of Kufa
  • Dr. Kadhim B. S. Aljanabi University of Kufa



Data Mining, Clustering, K-means, Centroids, Complexity


Clustering represents one of the most popular knowledge extraction algorithms in data mining techniques. Hierarchical and partitioning approaches are widely used in this field. Each has its own advantages, drawbacks and goals. K-means represents the most popular partitioning clusteringtechnique, however it suffers from two major drawbacks; time complexity and its sensitivity to the initial centroid values. The work in this paper presents an approach for estimating the starting initial centroids throughout three process including density based, normalization and smoothing ideas. The proposed algorithm has a strong mathematical foundation. The proposed approach was tested using a free standard data (20000 records). The results showed that the approach has better complexity and ensures the clustering convergence.


Download data is not yet available.


Jiawei Han and Micheline Kamber “ Data Mining: Concepts and Techniques” 3rd Edition., Morgan Kaufmann, 2010.

M. Steinbach, P.-N.Tan and V.Kumar, “Introduction to Data Mining”,Addison-Wesley, 2006. ISBN: 0-321-32136-7

M. H. Dunham, “Data Mining:Introductory and Advanced Topics”,Prentice Hall, 2002.

D. J. Hand, H. Mannila, and P. Smyth, “Principles of Data Mining”, MIT Press,2001.

A. K. Jain, M. N. Murty, P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys 31 (3) (1999)264–323. DOI:

L. Kaufman, P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley-Interscience,1990. DOI:

D. Aloise, A. Deshpande, P. Hansen, P. Popat, NP-Hardness of Euclidean Sum-of-Squares Clustering, MachineLearning 75 (2) (2009) 245–248. DOI:

A. K. Jain, Data Clustering: 50 Years Beyond K-means, Pattern Recognition Letters 31 (8) (2010) 651–666. DOI:

M. Emre Celebi,Hassan A. Kingravi and Patricio A. Velain in theirpaper “A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm”, Expert Systems with Applications, 40(1): 200–210, 2013 DOI:

J. M. Pena, J. A. Lozano, P. Larranaga, An Empirical Comparison of Four Initialization Methods for theK-Means Algorithm, Pattern Recognition Letters 20 (10) (1999) 1027–1040. DOI:

J. He, M. Lan, C. L. Tan, S. Y. Sung, H. B. Low, Initialization of Cluster Refinement Algorithms: A Reviewand Comparative Study, in: Proc. of the 2004 IEEE Int. Joint Conf. on Neural Networks, 2004, pp. 297–302.

[12] J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, in: Proc. of the 5thBerkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.

R. C. Jancey, Multidimensional Group Analysis, Australian Journal of Botany 14 (1) (1966) 127–130. DOI:

E. Forgy, Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classification, Biometrics 21(1965) 768.

T. Gonzalez, Clustering to Minimize the Maximum Intercluster Distance, Theoretical Computer Science 38 (2–3) (1985) 293–306. DOI:

J. A. Hartigan, M. A. Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Journal of the RoyalStatistical Society C 28 (1) (1979) 100–108. DOI:




How to Cite

Aliwy, D. A. H., & Aljanabi, D. K. B. S. (2016). An Efficient Algorithm for Initializing Centroids in K-means Clustering. Journal of Kufa for Mathematics and Computer, 3(2), 18–24.

Similar Articles

1 2 3 4 5 > >> 

You may also start an advanced similarity search for this article.