Machine Learning Models for Intrusion Detection Systems (IDS): Tips for Developing Academically Sound IDS Models and Algorithms for Your IEEE Publication 2019
Posted on August 8th, 2019 by Frank in Engineering & Technology
- AI is promising for detecting intrusions such as cross-site scripting attacks, SQL injections, Denial of Service (DoS) and ransomware.
- Deep learning models can accurately predict and recognize “normal activity” or malicious activity quickly, with a much lower percentage of false positives.
- Real-time IDS systems should not rely on a single fixed algorithm; instead, the algorithms must be updated regularly.
Network intrusions are unauthorized activities on an organization’s local computer network that disrupt the work of targeted users or lead to damage, theft or […] of the target. Although firewalls and antivirus software have been developed, weaknesses in the underlying architecture are not always patched, and new attacks keep being created. This led to the development of Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), which have been deployed for many years in combination with one or more firewalls. A clear understanding of intrusions is necessary, since the characteristics of individual attacks have to be analyzed in order to detect or block them. The National Institute of Standards and Technology (NIST) provides a guidance document on IDS.
IDS deployments vary in scope from a single computer to a large network of computers. An IPS not only detects intrusions but also takes the necessary action to prevent them. An IDS, on the other hand, takes no action: it merely logs that a particular intrusion has taken place. An IPS restricts intrusions by limiting access to resources, blocking the source IP address, and so on.
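The difference between the two responses can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real IDS or IPS: the `Alert` and `Responder` classes and the in-memory blocklist are hypothetical, and detection itself is assumed to have already happened upstream.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    """Hypothetical alert record produced by an upstream detector."""
    src_ip: str
    attack_type: str

@dataclass
class Responder:
    """Hypothetical responder: IDS mode only logs, IPS mode also blocks."""
    prevent: bool  # False = IDS behaviour, True = IPS behaviour
    log: list = field(default_factory=list)
    blocked_ips: set = field(default_factory=set)

    def respond(self, alert: Alert) -> None:
        # Both IDS and IPS record that the intrusion took place.
        self.log.append(f"{alert.attack_type} from {alert.src_ip}")
        # Only an IPS acts on the alert, here by blocking the source IP address.
        if self.prevent:
            self.blocked_ips.add(alert.src_ip)

    def allows(self, src_ip: str) -> bool:
        # Traffic from a blocked source is dropped; everything else passes.
        return src_ip not in self.blocked_ips

ids = Responder(prevent=False)
ips = Responder(prevent=True)
alert = Alert(src_ip="203.0.113.7", attack_type="SQL injection")
for system in (ids, ips):
    system.respond(alert)

print(ids.allows("203.0.113.7"))  # True: the IDS only logged the alert
print(ips.allows("203.0.113.7"))  # False: the IPS blocked the source
```

In a production IPS the blocking step would be enforced in the packet path (for example through a firewall rule) rather than by a Python set.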
Types of IDS Techniques
IDS systems can be network-based (NIDS), host-based (HIDS) or vulnerability-assessment-based, the last relying on signature-based and anomaly-based intrusion detection techniques. A Network Intrusion Detection System (NIDS) monitors all incoming and outgoing traffic across the network's nodes and devices. The traffic is split into individual packets, and each packet is analysed and compared with the characteristics of known intrusions. A NIDS can monitor multiple hosts and detect intrusions targeting them without affecting the hosts' performance. A host-based intrusion detection system (HIDS) monitors critical files on the host itself. It can identify malicious traffic and unusual network packets that the NIDS did not recognize. A HIDS can run directly on internet-facing devices as well as on hosts in the local intranet, and it can isolate malicious traffic from the host.
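The signature-based and anomaly-based techniques can be contrasted with a toy example. The sketch assumes each packet has already been reduced to a payload string and a size in bytes; the `Packet` type, the two signatures and the 3-standard-deviation threshold are illustrative assumptions, not rules from any real IDS.

```python
import statistics
from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: str
    size: int  # bytes
    payload: str

# Hypothetical signatures: substrings assumed to appear in specific attacks.
SIGNATURES = {
    "' OR '1'='1": "SQL injection",
    "<script>": "cross-site scripting",
}

def signature_match(packet: Packet):
    """Signature-based detection: compare the payload with known attack patterns."""
    for pattern, attack in SIGNATURES.items():
        if pattern in packet.payload:
            return attack
    return None

def build_baseline(normal_sizes):
    """Learn 'normal' behaviour as the mean and standard deviation of packet size."""
    return statistics.mean(normal_sizes), statistics.stdev(normal_sizes)

def is_anomalous(packet: Packet, baseline, k: float = 3.0) -> bool:
    """Anomaly-based detection: flag packets far outside the learned baseline."""
    mean, std = baseline
    return abs(packet.size - mean) > k * std

baseline = build_baseline([500, 520, 480, 510, 490])
packets = [
    Packet("198.51.100.1", 505, "GET /index.html"),
    Packet("198.51.100.2", 495, "user=admin' OR '1'='1"),
    Packet("198.51.100.3", 9000, "A" * 50),
]
for p in packets:
    print(p.src_ip, signature_match(p), is_anomalous(p, baseline))
# 198.51.100.1 None False
# 198.51.100.2 SQL injection False
# 198.51.100.3 None True
```

The signature check catches the known SQL injection pattern but misses the oversized packet, while the anomaly check flags the oversized packet without knowing what attack it is; this complementarity is one reason both techniques are used.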
Intrusion Detection and Prevention Systems (IDPS) combine IDS and IPS. Like alarms, they can identify imminent threats disguised as ordinary traffic and instantly prevent them from entering the system. The specific technique used in this work is intentionally not disclosed here. The challenge facing IDPS systems is that attackers keep finding new and adaptive ways to hide their packets, which makes those packets increasingly difficult to identify. A single attack that gets into the system can cause havoc before it is discovered, and the growing variety of attacks makes the IDPS's task harder still. The most common intrusions include cross-site scripting attacks, SQL injections, Denial of Service (DoS) and ransomware. Each of these attacks works in a completely different way; hence, the IDPS must be intelligent and able to adapt to an ever-changing network.
Pre-Processing the IDS data
Simulation of IDS for research purposes is usually performed on a benchmark dataset (e.g. KDD’99, the DARPA dataset, NSL-KDD, KDD-CUP 2002). Ideally, the dataset should contain the latest real-world intrusion data. It should also be large and balanced (with a consistent class distribution across the training and test data), because classifiers train more effectively on extensive, balanced data, and better training means better prevention of intrusions in real-time traffic. Feature selection also plays a significant role: it makes categorization easier, reduces operation time, and improves classification performance and accuracy.
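One way to keep the training and test data consistent is a stratified split, which preserves the proportion of each class (e.g. normal versus attack records) in both sets. The following is a minimal pure-Python sketch; the `stratified_split` helper and the 80/20 toy data are hypothetical. Note that stratification keeps the class distribution consistent between the two sets but does not by itself correct an imbalanced dataset, which would need resampling.

```python
import random
from collections import Counter, defaultdict

def stratified_split(records, labels, test_fraction=0.3, seed=42):
    """Split so that each class keeps the same proportion in train and test."""
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test += [(r, label) for r in items[:n_test]]
        train += [(r, label) for r in items[n_test:]]
    return train, test

# Toy data: 80 'normal' and 20 'attack' connection records (hypothetical).
records = list(range(100))
labels = ["normal"] * 80 + ["attack"] * 20
train, test = stratified_split(records, labels)

print(Counter(label for _, label in train))  # Counter({'normal': 56, 'attack': 14})
print(Counter(label for _, label in test))   # Counter({'normal': 24, 'attack': 6})
```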
Appropriate pre-processing techniques should be selected. Most datasets need pre-processing such as filling in missing values and handling outliers. Once the data is adequately cleaned, feature extraction is required, because the dataset may contain many features that are of no use for classification. Their presence only increases computation time; hence, only the necessary features should be selected. For a novel contribution, a new intelligent feature-extraction technique should be identified from the research gap, either by combining two or more existing techniques or by building a completely new algorithm.
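Three of these steps, filling in missing values, discarding useless features and normalizing the rest, can be sketched in plain Python. This is an illustrative sketch only: mean imputation, a zero-variance filter and min-max scaling are simple stand-ins chosen here, outlier handling is not shown, and the feature names and values are invented.

```python
import statistics

def impute_missing(rows):
    """Replace None in each column with that column's mean (simple mean imputation)."""
    columns = list(zip(*rows))
    means = [statistics.mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def drop_low_variance(rows, names, threshold=0.0):
    """Keep only features whose variance exceeds the threshold (a basic filter method)."""
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if statistics.pvariance(col) > threshold]
    return [[row[j] for j in keep] for row in rows], [names[j] for j in keep]

def min_max_scale(rows):
    """Rescale every feature to [0, 1] so large-valued features do not dominate."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1 for col in columns]  # avoid division by zero
    return [[(v - lows[j]) / spans[j] for j, v in enumerate(row)] for row in rows]

# Hypothetical records with three features; the last one never varies.
names = ["duration", "src_bytes", "constant"]
rows = [[2, 100, 1], [None, 300, 1], [4, None, 1], [6, 500, 1]]

rows = impute_missing(rows)
rows, names = drop_low_variance(rows, names)
rows = min_max_scale(rows)
print(names)  # ['duration', 'src_bytes']
print(rows)   # [[0.0, 0.0], [0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
```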
IDS Classification Algorithms: Methods and Approaches
Although artificial intelligence and deep learning models can accurately predict and recognize “normal activity” or malicious activity quickly, with a much lower percentage of false positives, researchers also recommend hybrid approaches. A novel combination of intelligent techniques should be obtained by incorporating:
- machine learning (ML) algorithms such as Bayesian networks, artificial neural networks, fuzzy logic, Support Vector Machines (SVM), Multivariate Adaptive Regression Splines (MARS) and Linear Genetic Programming;
- statistical anomaly detection algorithms (e.g. statistical moments, mathematical modeling, operational models, time series, multivariate models, Markov models);
- data mining algorithms (e.g. frequent pattern mining, classification, association rule discovery, mining data streams);
- knowledge-based detection (e.g. state transition analysis, expert systems, signature analysis, Petri nets).
A recent example of such an algorithm is Amazon SageMaker IP Insights, an unsupervised algorithm that uses statistical modelling and a neural network to capture associations between IPv4 addresses and online resources (e.g. online bank accounts).
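A hybrid approach can be as simple as letting several detectors vote. The sketch below combines one knowledge-based rule with two statistical anomaly detectors by majority vote; this is one straightforward way to combine techniques, not a specific combination the article prescribes. The feature names, thresholds and baseline values are invented for illustration.

```python
import statistics
from typing import Callable, Dict, List

Connection = Dict[str, float]
Detector = Callable[[Connection], bool]

def knowledge_based(conn: Connection) -> bool:
    """Hypothetical expert-system rule: repeated failed logins suggest brute forcing."""
    return conn["failed_logins"] >= 5

def make_statistical(feature: str, history: List[float], k: float = 3.0) -> Detector:
    """Statistical anomaly detector: flag values more than k std devs from the baseline."""
    mean, std = statistics.mean(history), statistics.stdev(history)
    return lambda conn: abs(conn[feature] - mean) > k * std

def make_hybrid(detectors: List[Detector]) -> Detector:
    """Hybrid detector: raise an alarm only when a majority of detectors agree."""
    return lambda conn: sum(d(conn) for d in detectors) > len(detectors) / 2

hybrid = make_hybrid([
    knowledge_based,
    make_statistical("src_bytes", [200, 220, 180, 210, 190]),
    make_statistical("conn_per_sec", [10, 12, 8, 11, 9]),
])

normal = {"failed_logins": 0, "src_bytes": 205, "conn_per_sec": 10}
flood = {"failed_logins": 0, "src_bytes": 5000, "conn_per_sec": 300}
burst = {"failed_logins": 0, "src_bytes": 205, "conn_per_sec": 20}

print(hybrid(normal), hybrid(flood), hybrid(burst))  # False True False
```

Majority voting suppresses an alarm raised by a single detector (the `burst` case), which lowers false positives but can miss attacks that only one technique recognizes; weighted voting or stacking a meta-classifier on the detectors' outputs are common alternatives.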
These algorithms can be implemented in parallel in WEKA or MATLAB, and the performance of the resulting models compared using evaluation metrics (e.g. true positives, false positives, precision, F-measure, area under the Receiver Operating Characteristic (ROC) curve, detection rate (DR) and false alarm rate (FAR)). Alternatively, the algorithms may be stacked so that each one is first executed individually and all of them then work together to improve detection accuracy. Once malicious packets are identified, they must be prevented from entering the network, and the system should ensure that similar packets are blocked as well.
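The evaluation metrics above follow directly from the confusion matrix, so they can be computed outside WEKA or MATLAB as well. The sketch below is plain Python with made-up counts and scores; DR is the true positive rate, FAR the false positive rate, and the ROC area is computed with the pairwise (Mann-Whitney) formulation, which is fine for small examples but quadratic in the number of records.

```python
def ids_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common IDS evaluation metrics from confusion-matrix counts."""
    detection_rate = tp / (tp + fn)    # DR, also called recall / true positive rate
    false_alarm_rate = fp / (fp + tn)  # FAR, also called false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * detection_rate / (precision + detection_rate)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "DR": detection_rate,
        "FAR": false_alarm_rate,
        "precision": precision,
        "F-measure": f_measure,
        "accuracy": accuracy,
    }

def roc_auc(scores, labels):
    """ROC area: chance a random attack (label 1) scores above a random normal record (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

metrics = ids_metrics(tp=90, fp=5, tn=895, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
# {'DR': 0.9, 'FAR': 0.006, 'precision': 0.947, 'F-measure': 0.923, 'accuracy': 0.985}
print(round(roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]), 3))  # 0.833
```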
Building an IDS is extremely challenging, since many algorithms and machine learning techniques are involved, which increases its complexity. The age of the data also matters. Older datasets contain older attacks that newer IDPS systems can easily prevent, whereas more modern datasets contain new attacks and are therefore more complex. Unlike other security systems, real-time IDS systems should not rely on a single fixed algorithm. Instead, the algorithms must be updated regularly, and newer, more efficient algorithms must be incorporated into the system, because attackers continually modernize and update their intrusions. Keeping the system constantly updated improves accuracy and reduces error rates.
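The benefit of regular updates can be illustrated with a statistical detector whose baseline is periodically refitted on recent normal traffic. This is a minimal sketch under invented assumptions: a single numeric traffic feature, a slowly drifting traffic level, one injected spike, and hypothetical `BaselineDetector` and `run` helpers. It shows retraining a model, which is only one part of the broader point about replacing algorithms over time.

```python
import statistics
from collections import deque

class BaselineDetector:
    """Statistical anomaly detector: flags values more than k std devs from the learned mean."""

    def __init__(self, k: float = 3.0):
        self.k = k
        self.mean = 0.0
        self.std = 0.0

    def fit(self, values) -> None:
        # (Re)learn the baseline of normal behaviour from the given values.
        self.mean = statistics.mean(values)
        self.std = statistics.stdev(values)

    def is_anomalous(self, value: float) -> bool:
        return abs(value - self.mean) > self.k * self.std

def run(stream, detector, warmup=20, window=20, retrain_every=None):
    """Score a stream and return the time steps that raised an alarm.

    If retrain_every is set, the detector is refitted on the most recent
    `window` values judged normal; otherwise it keeps its initial baseline.
    """
    recent = deque(stream[:warmup], maxlen=window)
    detector.fit(recent)
    alarms = []
    for t, value in enumerate(stream[warmup:], start=warmup):
        if detector.is_anomalous(value):
            alarms.append(t)
            continue  # suspected attacks are kept out of the training window
        recent.append(value)
        if retrain_every and (t - warmup + 1) % retrain_every == 0:
            detector.fit(recent)
    return alarms

# Hypothetical traffic level: slow upward drift plus a repeating noise pattern.
noise = [-6, -3, 0, 3, 6]
stream = [100 + 0.2 * i + noise[i % 5] for i in range(100)]
stream[60] = 300  # injected attack spike

adaptive = run(stream, BaselineDetector(), retrain_every=5)
frozen = run(stream, BaselineDetector())
print(adaptive)         # [60]
print(len(frozen) > 1)  # True: drifted but normal traffic also triggers alarms
```

With periodic retraining, the drifting but legitimate traffic stays within the baseline and only the spike is flagged; the frozen detector starts raising false alarms as normal traffic moves away from what it originally learned.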
About the Author: Frank is a software engineer and researcher in the Data Science division of PhD Assistance Research Lab. They are interested in researching ways we can use machine learning and technology.