INVESTIGATING THE EFFECT OF TRAFFIC SAMPLING ON MACHINE LEARNING BASED NETWORK INTRUSION DETECTION APPROACHES
DOI:
https://doi.org/10.63458/ijerst.v2i2.77
Keywords:
Cyber Threat Intelligence, Machine Learning Models, Deep Learning Techniques, Logistic Regression, K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Support Vector Machines (SVM), Convolutional Neural Network (CNN)
Abstract
This paper aims to improve cybersecurity threat detection through a systematic comparison of machine learning and deep learning models. The study addresses the difficulty of accurately classifying and predicting hostile activity in network traffic, using a dataset that covers a variety of cyber threats. The methodology consists of data preprocessing, dimensionality reduction with Principal Component Analysis (PCA), and the application of several machine learning algorithms: Logistic Regression, K-Nearest Neighbors, Gaussian Naïve Bayes, Support Vector Machines, Decision Trees, and Random Forest. The main findings show that ensemble models, Random Forest in particular, achieve notably high precision and accuracy. The effect of PCA on model performance is also examined, which sheds light on feature importance and model interpretability. Beyond highlighting the promise of ensemble methods for reliable threat detection, the research compares the efficacy of different machine learning algorithms in cybersecurity. The findings have practical implications for cybersecurity practitioners and lay the groundwork for future research in cybersecurity analytics.
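The workflow summarized in the abstract (preprocessing, PCA, then a set of classifiers compared on accuracy and precision) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the use of scikit-learn, the synthetic data from `make_classification` standing in for the paper's dataset, the choice of 10 principal components, the 75/25 split, and the default hyperparameters are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(n_components=10, random_state=0):
    """Fit each classifier behind scaling + PCA; return {name: (accuracy, precision)}."""
    # Assumption: synthetic binary data stands in for the paper's threat dataset.
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=8, random_state=random_state
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # The six algorithms named in the abstract, with assumed default settings.
    classifiers = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-Nearest Neighbors": KNeighborsClassifier(),
        "Gaussian Naive Bayes": GaussianNB(),
        "Support Vector Machine": SVC(),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
    }

    results = {}
    for name, clf in classifiers.items():
        # Preprocessing (standardization) and PCA are fitted on the training split only.
        model = make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = (
            accuracy_score(y_test, y_pred),
            precision_score(y_test, y_pred, zero_division=0),
        )
    return results


if __name__ == "__main__":
    for name, (acc, prec) in evaluate_models().items():
        print(f"{name}: accuracy={acc:.3f}, precision={prec:.3f}")
```

Wrapping the scaler and PCA in the same pipeline as the classifier keeps the test split out of the fitted transformation. Changing `n_components` is one simple way to probe how dimensionality reduction affects each model.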