Optimasi Deteksi Malware Android pada Dataset Drebin Menggunakan Ensemble Learning (Optimizing Android Malware Detection on the Drebin Dataset Using Ensemble Learning)

Haidar Nafiis Usmany, Wildanil Ghozi

Cyber

Building of Informatics, Technology and Science (BITS)

Introduction

This study optimizes Android malware detection on the Drebin dataset using ensemble learning. It combines Random Forest feature selection with boosting algorithms (XGBoost, LightGBM, CatBoost), and all models reach an accuracy above 0.98, which indicates that the approach is effective.

Abstract

The increasing number and complexity of Android malware require detection systems that are accurate, efficient, and capable of handling high-dimensional data. Machine learning–based approaches have become one of the widely adopted solutions in cybersecurity research. However, the performance of classification models is often affected by feature redundancy and suboptimal hyperparameter configurations. This study aims to evaluate the effectiveness of combining Random Forest–based feature selection with modern boosting classification algorithms for Android malware detection. The dataset used in this study is the Drebin 215 dataset, which was selected because it is one of the most widely used benchmark datasets for Android malware detection based on static analysis, enabling more objective comparison with previous studies. Feature selection was performed using the Random Forest feature importance method to reduce data dimensionality prior to the classification stage. The classification models employed include XGBoost, Light Gradient Boosting Machine (LightGBM), and CatBoost. The experiments were conducted under two scenarios: without hyperparameter optimization (non-tuning) and with hyperparameter optimization using the Grid Search method. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics, as well as computational time analysis. The experimental results show that all models achieved very strong classification performance on the Drebin benchmark dataset, with accuracy values exceeding 0.98. Among the evaluated models, LightGBM achieved the best performance, with an accuracy of 0.9900 and an F1-score of 0.9865. This performance advantage is likely influenced by the efficiency of its histogram-based learning mechanism and leaf-wise tree growth strategy, which enables faster and more effective learning on high-dimensional data. 
Nevertheless, the high performance observed on this benchmark dataset still requires further evaluation on more diverse datasets or dynamic environments to ensure the generalization capability of the model in real-world scenarios. The findings of this study indicate that the combination of Random Forest–based feature selection and boosting algorithms can serve as an effective approach for improving the efficiency and performance of Android malware detection systems.
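The abstract reports accuracy and F1-score side by side, so it may help to see how both are derived from the same confusion-matrix counts. The following is a minimal sketch using only the Python standard library; the names `ConfusionCounts`, `count_outcomes`, and `classification_metrics` are hypothetical, the labels are toy values rather than Drebin data, and malware is assumed to be the positive class (label 1). ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: malware is the positive class)."""
    tp: int  # malware correctly flagged
    fp: int  # benign apps wrongly flagged
    tn: int  # benign apps correctly passed
    fn: int  # malware missed


def count_outcomes(y_true, y_pred, positive=1):
    """Tally TP/FP/TN/FN from paired true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def classification_metrics(c):
    """Return accuracy, precision, recall, and F1; undefined ratios fall back to 0.0."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (3 malware samples, 7 benign samples).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

counts = count_outcomes(y_true, y_pred)
metrics = classification_metrics(counts)
print(counts)
print({name: round(value, 4) for name, value in metrics.items()})
```

On these toy labels the accuracy is 0.8 while precision, recall, and F1 are each about 0.667. True negatives raise accuracy but do not enter the F1-score, which is one reason an F1-score can sit below the accuracy when benign samples outnumber malware.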

Review

This paper presents a timely and relevant investigation into enhancing Android malware detection, a critical area given the escalating threat landscape. The authors effectively address the core challenges of high-dimensional data, feature redundancy, and suboptimal hyperparameter configurations by proposing a robust ensemble learning framework. By leveraging the well-established Drebin 215 dataset, the study ensures a strong foundation for objective comparison with prior research, which is a significant strength. The methodology, focusing on Random Forest-based feature selection combined with state-of-the-art boosting algorithms, aligns well with current advancements in machine learning for cybersecurity.

The methodological approach is thoroughly designed, evaluating XGBoost, LightGBM, and CatBoost under both non-tuned and Grid Search optimized conditions. This dual-scenario experimentation provides valuable insights into the impact of hyperparameter tuning on model performance. The comprehensive suite of evaluation metrics (accuracy, precision, recall, F1-score, ROC-AUC, and computational time) underscores a rigorous assessment of the proposed solutions.

The findings are impressive, demonstrating exceptionally strong classification performance across all models, with accuracy values consistently exceeding 0.98. LightGBM emerges as the standout performer, achieving an accuracy of 0.9900 and an F1-score of 0.9865, a result attributed convincingly to its efficient histogram-based learning and leaf-wise growth strategy.

While the reported performance on the Drebin dataset is outstanding and clearly indicates the effectiveness of the chosen combination of feature selection and boosting algorithms, the paper appropriately acknowledges a crucial limitation. The high performance observed on this specific benchmark dataset necessitates further validation on more diverse datasets or within dynamic, real-world environments. This is vital to ascertain the models' generalization capability and robustness against evolving malware threats and varied device conditions. Nevertheless, the study successfully establishes that the integration of Random Forest feature selection with boosting algorithms offers a highly effective pathway to improving the efficiency and performance of static Android malware detection systems.
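The tuned scenario relies on Grid Search, which evaluates every combination in a predefined hyperparameter grid and keeps the best-scoring one. The sketch below illustrates only that mechanism, using the Python standard library. The grid values, the parameter names, and `toy_validation_score` are hypothetical stand-ins: the source does not state the grid that was searched, and in practice the score function would train a model and return a cross-validated metric.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in param_grid.

    Returns (best_params, best_score, n_evaluated). Ties keep the
    combination that was encountered first.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    n_evaluated = 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


def toy_validation_score(params):
    """Hypothetical stand-in for a validation score; not a real model evaluation.

    It peaks at learning_rate=0.1 and max_depth=6 and ignores n_estimators.
    """
    return 1.0 - abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["max_depth"] - 6)


# Hypothetical grid; the values actually searched in the paper are not given.
param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [4, 6, 8],
    "n_estimators": [100, 200],
}

best_params, best_score, n_evaluated = grid_search(param_grid, toy_validation_score)
print(best_params, n_evaluated)
```

The number of evaluations is the product of the grid sizes (3 × 3 × 2 = 18 here), so each added hyperparameter multiplies the tuning cost. That growth is why reporting computational time alongside the accuracy metrics is informative when comparing the non-tuned and tuned scenarios.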
