CLASSIFYING ANDROID MALWARE CATEGORIES BASED ON DYNAMIC FEATURES: AN INTEGRATION OF FEATURE REDUCTION AND SELECTION TECHNIQUES

Authors

Abdullah Alsraratee, Ahmed Al-Azawei

DOI:

https://doi.org/10.30572/2018/KJE/160206

Keywords:

Android, Malware, Dynamic Analysis, Machine Learning, Malware Category Classification

Abstract

Android malware has grown steadily into a major internet threat. Despite efforts to identify and categorize malware hidden in seemingly safe Android apps, the problem has not yet been adequately addressed. Understanding the distinctive behaviors of common Android malware categories is therefore essential. This study uses three machine learning techniques, namely K-Nearest Neighbor, Random Forest and Decision Tree, to classify Android malware based on dynamic analysis. Mutual Information and Principal Component Analysis are used as the feature selection and feature reduction techniques. The research analyzes a large dataset, CCCS-CIC-AndMal2020, which contains fourteen primary malware categories. Unlike previous research, the proposed method strikes a balance between the number of features and classifier performance, achieving an overall detection accuracy of 98% across the fourteen analyzed categories while excluding 78.87% of the original dataset's features. The research thus introduces an efficient Android malware detection method that reduces computational cost and improves classification accuracy.
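The abstract names Mutual Information as the feature-selection step that removes most of the dataset's features. The following is a minimal, standard-library-only sketch of that kind of step, not the authors' implementation. It assumes that each continuous feature is discretized into equal-width bins, that mutual information with the category label is computed in nats from the resulting counts, and that the k highest-scoring features are kept. The bin count, k, the helper names (`discretize`, `mutual_information`, `select_by_mutual_information`, `excluded_fraction`) and the toy data are all hypothetical; the PCA and classifier stages are not shown.

```python
import math
from collections import Counter


def discretize(values, n_bins=10):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant feature falls into a single bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed (x, y) pairs of p(x,y) * log(p(x,y) / (p(x) * p(y))),
    # where p(x,y) / (p(x) * p(y)) simplifies to c * n / (count(x) * count(y))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def select_by_mutual_information(rows, labels, k, n_bins=10):
    """Indices (ascending) of the k features with the highest MI with the labels."""
    scores = []
    for j in range(len(rows[0])):
        column = discretize([row[j] for row in rows], n_bins)
        scores.append((mutual_information(column, labels), j))
    # highest score first; ties broken by the lower feature index
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def excluded_fraction(n_features, n_kept):
    """Share of the original features dropped by the selection step."""
    return (n_features - n_kept) / n_features


if __name__ == "__main__":
    # Hypothetical toy data: 4 apps x 3 dynamic features, 2 malware categories
    rows = [[0.0, 5.0, 1.0], [0.1, 3.0, 1.0], [0.9, 4.0, 1.0], [1.0, 6.0, 1.0]]
    labels = ["adware", "adware", "trojan", "trojan"]
    kept = select_by_mutual_information(rows, labels, k=1, n_bins=2)
    print(kept)  # [0]
    print(f"{excluded_fraction(len(rows[0]), len(kept)):.2%}")  # 66.67%
```

In the toy data only the first feature separates the two categories, so it is the one retained and two of the three features (66.67%) are excluded. Equal-width histograms are a crude estimator for skewed, continuous dynamic features; a k-nearest-neighbor-based estimator such as scikit-learn's `mutual_info_classif` is a common alternative, and which estimator the study actually used is not stated here.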

Published

2025-04-30

How to Cite

Alsraratee, Abdullah, and Ahmed Al-Azawei. “CLASSIFYING ANDROID MALWARE CATEGORIES BASED ON DYNAMIC FEATURES: AN INTEGRATION OF FEATURE REDUCTION AND SELECTION TECHNIQUES”. Kufa Journal of Engineering, vol. 16, no. 2, Apr. 2025, pp. 96-118, https://doi.org/10.30572/2018/KJE/160206.