Identify Best Learning Method for Heart Diseases Prediction Under impact of Different Datasets Characteristics


  • Zahraa chaffat Oleiwi, University of Kufa
  • Ebtesam N. AlShemmary
  • Salam Al-augby



Cardiovascular diseases, Deep learning, Data sparsity, Machine learning, Random Forest


This paper presents an experimental study of how the characteristics of heart disease datasets affect the performance of classification algorithms, with the aim of identifying the best algorithm for each dataset given its characteristics. The performance of five machine learning algorithms (logistic regression (LR), K-Nearest Neighbor (KNN), decision tree (DT), Random Forest (RF), and support vector machine (SVM)), a single-layer neural network (ANN), and a deep neural network (DNN) was evaluated on five heart disease datasets under four data complexity measures: number of samples (dataset size), number of features (dataset dimensionality), data sparsity, and feature correlation. All datasets were preprocessed and normalized, and a mutual information-based feature selection method was then applied to address overfitting. The results show that, in general, machine learning, and the Random Forest algorithm in particular, achieves higher classification accuracy than deep learning networks. On the other hand, high sparsity and low mutual information in a dataset degrade the performance of the classification algorithms more than the other data characteristics do.
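As a rough illustration of two of the quantities named in the abstract, the following sketch computes a simple sparsity score and ranks features by their mutual information with the class label. It is not the authors' implementation: the abstract does not specify their exact sparsity measure or estimator, so this assumes sparsity is the fraction of zero entries and that features are discrete. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log(p_xy * n * n / (px[x] * py[y]))
    return mi


def sparsity(rows):
    """Assumed sparsity measure: fraction of entries equal to zero."""
    total = sum(len(row) for row in rows)
    zeros = sum(1 for row in rows for value in row if value == 0)
    return zeros / total


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest mutual information."""
    columns = list(zip(*rows))
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 4 samples, 3 discrete features, binary label.
rows = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
labels = [1, 1, 0, 0]

selected, scores = select_top_k(rows, labels, k=1)
print("selected feature indices:", selected)
print("sparsity:", sparsity(rows))
```

In this toy example the first feature matches the label exactly, so it receives the highest score, while the constant third feature carries no information about the label. Continuous clinical features would need discretization or a different estimator before a score of this kind could be applied.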








How to Cite

Oleiwi, Z. chaffat, AlShemmary, E. N., & Al-augby, S. (2023). Identify Best Learning Method for Heart Diseases Prediction Under impact of Different Datasets Characteristics. Journal of Kufa for Mathematics and Computer, 10(1), 27–41.
