XGBoost Classification based Network Intrusion Detection System for Big Data using PySparkling Water
Tadepalli Anish Deepak
- Publication year
- 2020
- Citations
- 4
- Access
- Open access
Abstract
Machine learning is a data-analysis technique that automates model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, recognize patterns, and make data-driven decisions with minimal human intervention. Training machine learning algorithms on large volumes of data (also known as big data) generally gives better results. Cloud computing (CC) removes the computation and storage barriers to handling big data. In this paper we propose a cloud-based Intrusion Detection System (IDS) that uses a tree-based ensemble classification algorithm, the XGBoost classifier, trained on the CICIDS-2017 dataset, a realistic cyber dataset containing benign traffic and seven types of common, up-to-date network attacks. Sparkling Water lets users combine the fast, scalable machine learning functionality of H2O with the capabilities of Spark. The proposed IDS, built on the XGBoost classifier from H2O.ai, performed well compared with other algorithms such as Random Forest (RF), artificial neural network (ANN), gradient boosting machine (GBM), and a stacked ensemble method. Of all the algorithms, XGBoost gave 99.8% accuracy on the validation set and nearly 99.1% accuracy on the test set from k-fold cross-validation.
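The abstract reports accuracy obtained from k-fold cross-validation without giving the procedure. The following is a minimal standard-library sketch of that evaluation scheme only, not the paper's H2O or Sparkling Water pipeline. The helper names, the toy data, the choice of five folds, and the majority-class stand-in classifier are all assumptions made for illustration; a real run would plug in an XGBoost model where the stand-in is used.

```python
import random
from collections import Counter


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(rows, labels, k, fit, predict, seed=0):
    """Train on k-1 folds, score on the held-out fold, and return per-fold accuracy."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Stand-in classifier (an assumption, not XGBoost): always predicts
# the most common label seen during training.
def fit_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


# Hypothetical toy data: 90 benign flows and 10 attack flows, one feature each.
rows = [[float(i)] for i in range(100)]
labels = ["BENIGN"] * 90 + ["ATTACK"] * 10

fold_scores = cross_validated_accuracy(rows, labels, 5, fit_majority, predict_majority)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(f"mean accuracy over {len(fold_scores)} folds: {mean_accuracy:.3f}")
```

Passing `fit` and `predict` as arguments keeps the fold logic independent of the model, so the same loop could wrap any classifier that exposes a train step and a per-row prediction.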