An Approach for Solving Missing Values in Data Set Using Clustering-Curve Fitting Technique

Authors

  • Kadhim AlJanabi, University of Kufa
  • Mansoor Habeebi, University of Kufa
  • Nawras Riyadh Neamah, University of Kufa

DOI:

https://doi.org/10.31642/JoKMC/2018/0202012

Keywords:

Data Mining, Missing Values, Clustering, Curve Fitting.

Abstract

Missing values are among the greatest challenges in analyzing a data set to extract knowledge. This paper presents a new approach to the missing-values problem that merges two techniques: clustering (K-means and Expectation Maximization) and curve fitting. More than twenty thousand records of real health data collected from different Iraqi hospitals were used to build and test the proposed approach, which showed better results than the most popular techniques for estimating missing values, such as most common value, overall average, class average, and class most common value. Several software tools were used in this work, including WEKA (Waikato Environment for Knowledge Analysis), Matlab, Excel and C++.
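
The abstract does not say how the clustering and curve-fitting steps are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each record has one fully observed numeric attribute x and one attribute y that may be missing: records are grouped by K-means on x, a least-squares straight line (the simplest curve fit) is fitted to the complete records of each cluster, and a missing y is estimated from the line of its own cluster. The names kmeans_1d, fit_line and impute, the toy records, and the fallback to the overall average are all hypothetical choices made for illustration.

```python
from statistics import mean


def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's k-means on scalar values; returns one cluster label per value."""
    srt = sorted(xs)
    # Deterministic start (an assumption): centroids at evenly spaced sorted positions.
    cents = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(x - cents[c])) for x in xs]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(mean(members) if members else cents[c])
        if new == cents:
            break
        cents = new
    return labels


def fit_line(points):
    """Least-squares line y = a + b*x; a constant if x has no spread."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def impute(records, k=2):
    """Fill missing y values (None) from the fitted line of the record's cluster."""
    labels = kmeans_1d([x for x, _ in records], k)
    # Baseline used only when a cluster has no complete record at all.
    overall = mean(y for _, y in records if y is not None)
    filled = []
    for (x, y), lab in zip(records, labels):
        if y is None:
            pts = [(rx, ry) for (rx, ry), l2 in zip(records, labels)
                   if l2 == lab and ry is not None]
            if pts:
                a, b = fit_line(pts)
                y = a + b * x
            else:
                y = overall
        filled.append((x, y))
    return filled


# Toy data (made up): two groups that follow different trends.
records = [(1, 2), (2, 4), (3, None), (4, 8),
           (20, 80), (21, 79), (22, None), (23, 77)]
filled = impute(records, k=2)
```

On this toy data the two missing values are estimated as approximately 6 and 78, following the trend of their own cluster, whereas the overall-average baseline named in the abstract would assign about 41.7 to both. A higher-degree polynomial, or Expectation Maximization in place of K-means, could be swapped in without changing the overall structure.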

Published

2014-12-01

How to Cite

AlJanabi, K. A., Habeebi, M. H., & Neamah, N. R. (2014). An Approach for Solving Missing Values in Data Set Using Clustering-Curve Fitting Technique. Journal of Kufa for Mathematics and Computer, 2(2), 87–105. https://doi.org/10.31642/JoKMC/2018/0202012
