MEGR-APT is an advanced and scalable system designed for hunting Advanced Persistent Threats (APTs) by identifying suspicious subgraphs that align with specific attack scenarios, as described in Cyber Threat Intelligence (CTI) reports.
Its primary functionality revolves around two key processes: memory-efficient extraction of suspicious subgraphs and fast subgraph matching using Graph Neural Networks (GNNs) and attack representation learning.
Features And Functionality
Input And Architecture
The system takes kernel audit logs stored in a PostgreSQL database and attack query graphs in JSON format as inputs.
It operates through a modular architecture comprising Python scripts for core functions, Bash scripts for orchestration, and directories for logs, models, datasets, and technical documentation. Key components include:
- Provenance Graph Construction: Converts kernel audit logs into provenance graphs using tools like construct_pg_cadets.py and stores them in RDF graph engines such as Stardog.
- Hunting Pipeline: Extracts suspicious subgraphs using extract_rdf_subgraphs_cadets.py and matches them with attack query graphs via pre-trained GNN models using main.py.
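To make the JSON input concrete, the sketch below loads an attack query graph into a typed adjacency structure. The schema (the `nodes`/`edges` keys and the `id`, `type`, `source`, `target` fields) and the helper name `load_query_graph` are assumptions for illustration only; the actual MEGR-APT query-graph format may differ.

```python
import json
from collections import defaultdict

# Hypothetical query-graph schema; NOT taken from the MEGR-APT repository.
EXAMPLE_QUERY = """
{
  "nodes": [
    {"id": "n1", "type": "process"},
    {"id": "n2", "type": "file"},
    {"id": "n3", "type": "flow"}
  ],
  "edges": [
    {"source": "n1", "target": "n2", "type": "write"},
    {"source": "n1", "target": "n3", "type": "connect"}
  ]
}
"""


def load_query_graph(text):
    """Parse a query graph and return (node_types, adjacency).

    node_types maps node id -> node type.
    adjacency maps source id -> list of (edge type, target id).
    """
    data = json.loads(text)
    node_types = {node["id"]: node["type"] for node in data["nodes"]}
    adjacency = defaultdict(list)
    for edge in data["edges"]:
        # Reject edges whose endpoints were never declared as nodes.
        for endpoint in (edge["source"], edge["target"]):
            if endpoint not in node_types:
                raise ValueError(f"edge references unknown node {endpoint!r}")
        adjacency[edge["source"]].append((edge["type"], edge["target"]))
    return node_types, dict(adjacency)


node_types, adjacency = load_query_graph(EXAMPLE_QUERY)
```

Validating edge endpoints at load time is a design choice of this sketch: a malformed query graph then fails early instead of silently matching nothing.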
For training its GNN-based graph matching models:
- Subgraphs are extracted using extract_rdf_subgraphs_[dataset].py.
- Graph Edit Distance (GED) is computed for training data.
- Models are trained using parameters specified in the main.py script.
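The text does not say how the computed GED becomes a training target. One common convention in GNN-based graph similarity work (used, for example, by SimGNN) normalizes GED by the mean node count of the two graphs and maps it to a similarity score in (0, 1]. The sketch below shows that convention; whether MEGR-APT uses exactly this mapping is an assumption, and the function names are hypothetical.

```python
import math


def normalized_ged(ged, n_nodes_1, n_nodes_2):
    """Normalize a graph edit distance by the mean node count of the pair."""
    mean_size = (n_nodes_1 + n_nodes_2) / 2
    if mean_size == 0:
        raise ValueError("both graphs are empty")
    return ged / mean_size


def similarity_label(ged, n_nodes_1, n_nodes_2):
    """Map GED to a similarity score in (0, 1]; identical graphs score 1.0."""
    return math.exp(-normalized_ged(ged, n_nodes_1, n_nodes_2))
```

Normalizing by graph size keeps pairs of large graphs from being penalized simply for having more nodes to edit, and the exponential keeps the regression target bounded.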
The system includes a Jupyter notebook (Investigation_Reports.ipynb) to analyze detected subgraphs and generate reports for analysts. This notebook demonstrates scenarios with real-world datasets like DARPA TC3 CADETS host data.
To deploy MEGR-APT:
- Install dependencies listed in requirements.txt and torch_requirements.txt.
- Set up the Stardog graph database and load RDF Provenance Graphs using the provided Bash script (load_to_stardog.sh).
- Use the setup_environment.sh script to configure the environment.
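Before running the deployment steps, it can help to confirm that the files they rely on are present. The sketch below is a small preflight check, not part of MEGR-APT itself; it assumes the four files named above sit directly in the repository root, which may not match the actual layout, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# File names come from the deployment steps; their location in the
# repository root is an assumption of this sketch.
REQUIRED_FILES = [
    "requirements.txt",
    "torch_requirements.txt",
    "load_to_stardog.sh",
    "setup_environment.sh",
]


def missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required file names that are not found under repo_root."""
    root = Path(repo_root)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_files(".")` from the checkout directory returns an empty list when everything is in place; otherwise it lists what to locate before installing dependencies or loading data into Stardog.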
MEGR-APT is particularly suited for organizations that want to detect APTs early. Because it extracts suspicious subgraphs in a memory-efficient way and matches them quickly with GNNs, it can process large-scale audit data, which makes it useful for threat intelligence and forensic investigations.