A Hybrid Data Warehouse Model to Improve Mining Algorithms

Kadhim B.S. AlJanabi; Rusul Kadhim Meshjal

doi:10.31642/JoKMC/2018/040304

Authors

Kadhim B.S. AlJanabi, University of Kufa
Rusul Kadhim Meshjal, University of Kufa

Keywords:

Data Warehouse, Data Cube, Data Mining, Summarization, Data Reduction

Abstract

The performance of data mining algorithms, including classification, clustering, association, prediction and others, is closely tied to the approach used to design the data warehouse and to the way the data is stored (detailed, lightly summarized or highly summarized). Detailed data is needed for detailed reports, but its large volume is a major challenge for mining algorithms. Summarized data, on the other hand, leads to better algorithm performance, but the loss of detail can leave out required knowledge and so affect the overall mining process. Knowledge extraction, together with the performance and complexity of mining algorithms, is a major challenge in the data analysis field; this paper therefore proposes an approach that improves algorithm performance through a well-designed warehouse and a data reduction technique. The work presents a hybrid warehouse galaxy model that stores data in three formats: detailed, summarized and highly summarized. Time and space complexity are the main criteria of the proposed approach. Real data about schools, students and teachers was collected from different cities in AlNajaf AlAshraf. The data was preprocessed, reduced mainly through concept hierarchies, and then converted into dimension and fact tables (the warehouse galaxy model), which were in turn converted into multidimensional cubes. Roll-up and drill-down queries were used extensively to obtain the required information. The resulting data cubes, and the corresponding warehouse model, showed a reasonable improvement in knowledge extraction algorithms for the data under study, and the roll-up and drill-down queries performed better than queries on the detailed data.
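The Python sketch below illustrates, on a small made-up dataset, the three storage levels and the query style the abstract describes. A concept hierarchy (school to city, raw age to age group) reduces detailed rows to a summarized level, roll-up climbs to a highly summarized level, and drill-down descends again. Two fact tables share one school dimension, in the spirit of a galaxy (fact constellation) model. All school, city and teacher identifiers, the age cut point of 12 and the function names are hypothetical assumptions; this is not the authors' implementation or data.

```python
from collections import Counter

# Hypothetical toy data for illustration only; this is NOT the paper's dataset.
# Shared "school" dimension. Concept hierarchy: school -> city.
SCHOOL_DIM = {
    "S1": {"city": "CityA"},
    "S2": {"city": "CityA"},
    "S3": {"city": "CityB"},
}

# Two detailed fact tables sharing the school dimension (galaxy / fact constellation).
STUDENT_FACTS = [  # (school_id, age): one row per student
    ("S1", 7), ("S1", 9), ("S2", 14), ("S2", 13),
    ("S3", 8), ("S3", 12), ("S3", 15),
]
TEACHER_FACTS = [  # (school_id, teacher_id): one row per teacher
    ("S1", "T1"), ("S2", "T2"), ("S2", "T3"), ("S3", "T4"),
]


def age_group(age):
    """Concept hierarchy for age; the cut point 12 is an assumption."""
    return "<12" if age < 12 else "12+"


def summarize(facts):
    """Summarized level: student counts per (school, age group)."""
    return Counter((school, age_group(age)) for school, age in facts)


def roll_up(summary):
    """Highly summarized level: climb school -> city and drop the age group."""
    cube = Counter()
    for (school, _group), count in summary.items():
        cube[SCHOOL_DIM[school]["city"]] += count
    return cube


def drill_down(summary, city):
    """Descend from one city back to its schools, reading the summarized level."""
    cube = Counter()
    for (school, _group), count in summary.items():
        if SCHOOL_DIM[school]["city"] == city:
            cube[school] += count
    return cube


def teachers_per_city(facts):
    """The second fact table reuses the same school -> city hierarchy."""
    return Counter(SCHOOL_DIM[school]["city"] for school, _teacher in facts)


if __name__ == "__main__":
    summary = summarize(STUDENT_FACTS)
    print(len(STUDENT_FACTS), "detailed rows ->", len(summary), "summarized cells")
    print("roll-up to city:", dict(roll_up(summary)))
    print("drill-down into CityA:", dict(drill_down(summary, "CityA")))
    print("teachers per city:", dict(teachers_per_city(TEACHER_FACTS)))
```

Here the roll-up and drill-down read the four summarized cells rather than the seven detailed rows, which is the kind of saving a pre-summarized level is meant to provide. In a relational warehouse the same navigation would normally be written as GROUP BY queries over pre-aggregated tables; the cell counts above are a property of this toy data only.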
