Data Reduction Techniques: A Comparative Study
DOI: https://doi.org/10.31642/JoKMC/2018/090201
Keywords: Data Mining, Data Preprocessing, Data Reduction, Dimensionality Reduction
Abstract
Data preprocessing in general, and data reduction in particular, are essential steps in data mining, because real-world data is so vast that analysis takes a long time to complete. Almost all mining techniques, including classification, clustering, association and others, have high time and space complexities due to the huge amount of data and the behavior of the algorithms themselves. For this reason, data reduction is an important phase in the Knowledge Discovery in Databases (KDD) process. Many researchers have introduced important solutions in this field. This paper presents a comparative study of about 22 research papers in the data reduction field, covering different techniques such as dimensionality reduction, numerosity reduction, sampling, clustering, data cube aggregation and others. The study concludes that the appropriate data reduction technique depends heavily on the data type, the dataset size, the application goal, the presence of noise and outliers, and the trade-off between the reduced data and the knowledge required from the analysis.
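As a rough illustration of two of the techniques named in the abstract, the following minimal sketch shows simple random sampling (one form of numerosity reduction) and a yearly roll-up (one form of data cube aggregation). The function names, the toy sales records and the 50% sampling fraction are assumptions made for this example only; they are not taken from the paper or from any of the studies it compares.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, fraction, seed=0):
    """Numerosity reduction: keep a random subset of rows, without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(rows) * fraction))
    # A seeded generator keeps the sample reproducible between runs.
    return random.Random(seed).sample(rows, k)


def aggregate_by_year(rows):
    """Data cube aggregation: roll quarterly sales up to yearly totals."""
    totals = defaultdict(float)
    for year, _quarter, sales in rows:
        totals[year] += sales
    return dict(totals)


# Hypothetical toy data: (year, quarter, sales)
quarterly = [
    (2020, 1, 100.0), (2020, 2, 150.0), (2020, 3, 120.0), (2020, 4, 130.0),
    (2021, 1, 110.0), (2021, 2, 160.0), (2021, 3, 140.0), (2021, 4, 90.0),
]

yearly = aggregate_by_year(quarterly)  # 8 quarterly rows become 2 yearly rows
sample = simple_random_sample(quarterly, fraction=0.5, seed=42)  # 4 of 8 rows
```

Both functions shrink the data, but they lose different information: aggregation keeps exact totals and discards quarterly detail, whereas sampling keeps full detail for only some records. This is one small instance of the trade-off between reduced data and required knowledge that the abstract describes.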
License
Copyright (c) 2022 Ahmed Reyd AlKarawi, Kadhim B. S. AlJanabi
This work is licensed under a Creative Commons Attribution 4.0 International License, which allows users to copy, create extracts, abstracts, and new works from the Article, alter and revise the Article, and make commercial use of the Article (including reuse and/or resale of the Article by commercial entities), provided that the user gives appropriate credit (with a link to the formal publication through the relevant DOI), provides a link to the license, indicates if changes were made, and does not represent the licensor as endorsing the use made of the work.