This paper discusses the development of a machine learning (ML) program that identifies specific hacking activities from forensic evidence in PCAP files, the packet-capture files produced by network analyzers such as Wireshark.
These files capture packet data across various layers of the Open Systems Interconnection (OSI) model, providing a rich source of data that, once converted to a human-readable format, can help forensic investigators identify suspicious activities like DDoS attacks and port scanning.
However, analyzing these files by hand is slow and error-prone, both because captures can be very large and because an analyst's expertise does not automatically keep pace with novel network threats.
Thus, this program employs supervised ML to automatically learn from and identify patterns indicative of three types of network scans: port scanning, OS scanning, and host scanning.
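To make the input format concrete, here is a minimal sketch of reading packet records out of a capture using only the Python standard library. It assumes the classic libpcap layout (a 24-byte global header followed by 16-byte record headers) with microsecond timestamps; it does not handle the newer pcapng format or nanosecond-resolution captures. The function name read_pcap_records is hypothetical and is not taken from the program described here, which may well rely on a dedicated parsing library instead.

```python
import struct


def read_pcap_records(data: bytes):
    """Yield (timestamp_seconds, raw_packet_bytes) from classic pcap bytes.

    Assumption: classic libpcap format with microsecond timestamps.
    pcapng and nanosecond captures are rejected, not parsed.
    """
    if len(data) < 24:
        raise ValueError("too short to contain a pcap global header")

    # The magic number tells us the byte order the file was written in.
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # little-endian file
    elif magic == 0xD4C3B2A1:
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond pcap file")

    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16
        packet = data[offset:offset + incl_len]
        if len(packet) < incl_len:
            break  # truncated final record; stop rather than guess
        offset += incl_len
        yield ts_sec + ts_usec / 1_000_000, packet
```

Each yielded packet is still raw link-layer bytes; decoding the Ethernet, IP, and TCP headers is a separate step that this sketch leaves out.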
To effectively train the ML models, the program leverages a variety of network traffic data encapsulated in the PCAP files.
The three scan types (port, OS, and host scans) serve as the class labels, and the features extracted to tell them apart include TCP flags, time to live (TTL), packet size, window size, and maximum segment size (MSS).
These scans vary in their methodologies; for example, port scanning involves sending packets to various ports to determine their status, while OS scanning, or fingerprinting, identifies a host’s operating system based on characteristics of the packets it emits.
For each scan type, the program considers a range of protocols and TCP flags to enrich the dataset used for ML.
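The features named above can be turned into a numeric row that a classifier can consume. The sketch below is one possible encoding, not the program's actual one: the PacketSummary record, its field names, and the choice to expand the TCP flags into one 0/1 column per flag are all assumptions made for illustration. Only the flag bit values come from the TCP header definition.

```python
from dataclasses import dataclass

# Flag bit positions in the TCP header (standard values).
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


@dataclass
class PacketSummary:
    """Hypothetical per-packet record; field names are illustrative."""
    tcp_flags: int    # raw flag byte from the TCP header
    ttl: int          # IP time to live
    packet_size: int  # bytes on the wire
    window_size: int  # TCP window size
    mss: int          # TCP MSS option, 0 when the option is absent


# Column names, in the same order as the vector built below.
FEATURE_NAMES = [f"flag_{name}" for name in TCP_FLAGS] + [
    "ttl", "packet_size", "window_size", "mss"
]


def to_feature_vector(pkt: PacketSummary) -> list:
    """Expand the flag byte into one 0/1 column per flag, then append the rest."""
    flag_bits = [1 if pkt.tcp_flags & bit else 0 for bit in TCP_FLAGS.values()]
    return flag_bits + [pkt.ttl, pkt.packet_size, pkt.window_size, pkt.mss]


# A bare SYN probe, as a port scanner might send (values are made up).
syn_probe = PacketSummary(tcp_flags=0x02, ttl=64, packet_size=60,
                          window_size=1024, mss=1460)
print(dict(zip(FEATURE_NAMES, to_feature_vector(syn_probe))))
```

Splitting the flags into separate columns keeps a tree-based model from treating the flag byte as an ordered number, although other encodings would also be reasonable.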
The ML methodology follows a structured process involving data preparation, algorithm testing, and model improvement to enhance predictive accuracy.
The algorithms tested are decision trees, random forests, k-nearest neighbors (KNN), and support vector classification (SVC).
These models are trained on labeled datasets that are meticulously prepared and encoded, ensuring a balanced representation of different network scans.
Cross-validation methods are employed to validate the models’ effectiveness and minimize overfitting, helping establish robust predictions of hacking activities.
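As a rough illustration of how cross-validation guards against overfitting, the sketch below scores a small k-nearest-neighbors classifier on class-balanced folds. It is a standard-library stand-in written for this summary, not the program's code, which may use a dedicated ML library instead. The function names, the two features (TTL and window size), and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stratified_folds(labels, n_folds):
    """Deal each class's row indices round-robin so every fold stays balanced."""
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def cross_validate(X, y, n_folds=3, k=3):
    """Return one accuracy per fold; each fold is held out exactly once."""
    scores = []
    for held_out in stratified_folds(y, n_folds):
        held = set(held_out)
        train_X = [row for i, row in enumerate(X) if i not in held]
        train_y = [lab for i, lab in enumerate(y) if i not in held]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Toy data, invented for illustration: [ttl, window_size] per packet.
X = ([[64 - i % 3, 1024 + i] for i in range(6)]
     + [[128 - i % 2, 8192 + i] for i in range(6)]
     + [[255 - i % 2, i] for i in range(6)])
y = ["port_scan"] * 6 + ["os_scan"] * 6 + ["host_scan"] * 6

scores = cross_validate(X, y)
print(scores)  # [1.0, 1.0, 1.0]
```

The perfect score here only reflects how far apart the toy clusters were placed. On real captures the per-fold accuracies would vary, and a large gap between training accuracy and these held-out scores is the usual sign of overfitting.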
Finally, the paper underscores the program’s capacity to automate the detection of network scans, thereby significantly aiding forensic investigators.
The use of ML not only expedites the analysis of complex and large datasets but also enhances the detection capabilities beyond the limitations of manual analysis.
Future work will focus on improving data processing automation, exploring advanced feature engineering techniques, and possibly integrating deep learning algorithms to broaden the scope of detectable activities and improve overall model performance.
This progression aims to refine the program’s accuracy and utility in real-world applications, making it a powerful tool in the ongoing effort to secure networks against a variety of cyber threats.
We welcome any and all contributions!