An Efficient Algorithm for Initializing Centroids in K-means Clustering

Dr. Ahmed Hussain Aliwy; Dr. Kadhim B. S. Aljanabi

doi:10.31642/JoKMC/2018/030203

Authors

Dr. Ahmed Hussain Aliwy University of Kufa
Dr. Kadhim B. S. Aljanabi University of Kufa

DOI:

https://doi.org/10.31642/JoKMC/2018/030203

Keywords:

Data Mining, Clustering, K-means, Centroids, Complexity

Abstract

Clustering represents one of the most popular knowledge extraction algorithms in data mining techniques. Hierarchical and partitioning approaches are widely used in this field. Each has its own advantages, drawbacks and goals. K-means represents the most popular partitioning clusteringtechnique, however it suffers from two major drawbacks; time complexity and its sensitivity to the initial centroid values. The work in this paper presents an approach for estimating the starting initial centroids throughout three process including density based, normalization and smoothing ideas. The proposed algorithm has a strong mathematical foundation. The proposed approach was tested using a free standard data (20000 records). The results showed that the approach has better complexity and ensures the clustering convergence.

Downloads

Download data is not yet available.

References

Jiawei Han and Micheline Kamber “ Data Mining: Concepts and Techniques” 3rd Edition., Morgan Kaufmann, 2010.

M. Steinbach, P.-N.Tan and V.Kumar, “Introduction to Data Mining”,Addison-Wesley, 2006. ISBN: 0-321-32136-7

M. H. Dunham, “Data Mining:Introductory and Advanced Topics”,Prentice Hall, 2002.

D. J. Hand, H. Mannila, and P. Smyth, “Principles of Data Mining”, MIT Press,2001.

A. K. Jain, M. N. Murty, P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys 31 (3) (1999)264–323. DOI: https://doi.org/10.1145/331499.331504

L. Kaufman, P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley-Interscience,1990. DOI: https://doi.org/10.1002/9780470316801

D. Aloise, A. Deshpande, P. Hansen, P. Popat, NP-Hardness of Euclidean Sum-of-Squares Clustering, MachineLearning 75 (2) (2009) 245–248. DOI: https://doi.org/10.1007/s10994-009-5103-0

A. K. Jain, Data Clustering: 50 Years Beyond K-means, Pattern Recognition Letters 31 (8) (2010) 651–666. DOI: https://doi.org/10.1016/j.patrec.2009.09.011

M. Emre Celebi,Hassan A. Kingravi and Patricio A. Velain in theirpaper “A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm”, Expert Systems with Applications, 40(1): 200–210, 2013 DOI: https://doi.org/10.1016/j.eswa.2012.07.021

J. M. Pena, J. A. Lozano, P. Larranaga, An Empirical Comparison of Four Initialization Methods for theK-Means Algorithm, Pattern Recognition Letters 20 (10) (1999) 1027–1040. DOI: https://doi.org/10.1016/S0167-8655(99)00069-0

J. He, M. Lan, C. L. Tan, S. Y. Sung, H. B. Low, Initialization of Cluster Refinement Algorithms: A Reviewand Comparative Study, in: Proc. of the 2004 IEEE Int. Joint Conf. on Neural Networks, 2004, pp. 297–302.

[12] J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, in: Proc. of the 5thBerkeley Symposium on Mathematical Statistics and Probability, 1967, pp. 281–297.

R. C. Jancey, Multidimensional Group Analysis, Australian Journal of Botany 14 (1) (1966) 127–130. DOI: https://doi.org/10.1071/BT9660127

E. Forgy, Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of Classification, Biometrics 21(1965) 768.

T. Gonzalez, Clustering to Minimize the Maximum Intercluster Distance, Theoretical Computer Science 38 (2–3) (1985) 293–306. DOI: https://doi.org/10.1016/0304-3975(85)90224-5

J. A. Hartigan, M. A. Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Journal of the RoyalStatistical Society C 28 (1) (1979) 100–108. DOI: https://doi.org/10.2307/2346830

An Efficient Algorithm for Initializing Centroids in K-means Clustering

Authors

DOI:

Keywords:

Abstract

Downloads

References

Downloads

Published

How to Cite

Issue

Section

License

Similar Articles

Make a Submission

Additional Menu

INDEXING & ABSTRACTING

Keywords

International Visitors

Browse

Information

Similar Articles

A Hybrid Data Warehouse Model to Improve Mining Algorithms

Data Reduction Techniques: A Comparative Study

A NewFramework for Analyzing and Mining Medical Data with aProposed Data Warehouse Model

A Proposed Framework for Analyzing Crime Data Set Using Decision Tree and Simple K-Means Mining Algorithms

Application of Decision Tree as a Data Mining Tool in a health care

Predicting Class Label Using Clustering-Classification Technique: A Comparative Study

Study of Efficient Cloud Storage Architectures for the Security Environment

An Approach for Solving Missing Values in Data Set Using Clustering-Curve Fitting Technique

A Genetic Algorithm Based Clustering Optimization A Survey

Predicting Students' Performance by Using Data Mining Methods