Subparse : Modular Malware Analysis Artifact Collection And Correlation Framework

Subparse is a modular framework developed by Josh Strochein, Aaron Baker, and Odin Bernstein. The framework is designed to parse and index malware files and to present the information found during parsing in a searchable web-viewer. It is built from a core parsing engine, parsing modules, and a variety of enrichers that add further information to the malware indices.

The main inputs to the framework are directories of malware files. The core parsing engine, or a user-specified parsing engine, parses each file. Any user-specified enrichment engines then add further information, and the combined results are indexed into an Elasticsearch index.

The information gathered can then be searched and viewed via a web-viewer, which also allows filtering on any value gathered from any file. There are currently three default parsing modules (ELFParser, OLEParser and PEParser) and four enrichment modules (ABUSEEnricher, CAPEEnricher, STRINGEnricher and YARAEnricher).

Getting Started

Software Requirements

To get started using Subparse, there are a few required/recommended programs that need to be installed and set up before trying to work with our software.

Software        Status        Link
Docker          Required      Installation Guide
Python 3.8.1    Required      Installation Guide
Pyenv           Recommended   Installation Guide

Additional Requirements

After installing the required/recommended software on your system, a few other steps are needed to get Subparse installed.

Python Requirements

Subparse depends on some additional Python packages. To complete the Python setup, navigate to the location of your Subparse installation and go to the parser folder. The command needed to install the Python requirements is:

Docker Requirements

Since Subparse uses Docker for its backend and web interface, the Docker containers must be set up before the program can be used. To do this, navigate to the root directory of the Subparse installation and use the following command to set up the Docker instances:

Installation steps

Usage

Command Line Options

Command line options that are available for subparse/parser/subparse.py:

Argument               Alternative                    Required   Description
-h                     --help                         No         Shows help menu
-d SAMPLES_DIR         --directory SAMPLES_DIR        Yes        Directory of samples to parse
-e ENRICHER_MODULES    --enrichers ENRICHER_MODULES   No         Enricher modules to use for additional parsing
-r                     --reset                        No         Reset/delete all data in the configured Elasticsearch cluster
-v                     --verbose                      No         Display verbose command-line output
-s                     --service-mode                 No         Enters service mode, allowing more samples to be added to SAMPLES_DIR while processing
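As a rough illustration of how the options above fit together, here is a minimal argparse sketch that mirrors the table. It is not Subparse's actual source: the function name build_parser, the dest names, and the choice to accept several enricher names after -e are all assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="subparse.py",
        description="Parse a directory of samples and index the results.",
    )
    parser.add_argument("-d", "--directory", dest="samples_dir",
                        metavar="SAMPLES_DIR", required=True,
                        help="Directory of samples to parse")
    # Assumption: enricher names are given as a space-separated list.
    parser.add_argument("-e", "--enrichers", dest="enricher_modules",
                        metavar="ENRICHER_MODULES", nargs="+", default=[],
                        help="Enricher modules to use for additional parsing")
    parser.add_argument("-r", "--reset", action="store_true",
                        help="Reset/delete all data in the configured "
                             "Elasticsearch cluster")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Display verbose command-line output")
    parser.add_argument("-s", "--service-mode", dest="service_mode",
                        action="store_true",
                        help="Keep running so more samples can be added")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-d", "./samples", "-e", "YARAEnricher", "-v"])
print(args.samples_dir, args.enricher_modules, args.verbose)
```

Only -d is mandatory, so the shortest valid invocation supplies just a samples directory. Every other flag defaults to off or empty.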

Viewing Results

To view the results from Subparse’s parsers, navigate to localhost:8080. If you are having trouble viewing the site, make sure that the container is running in Docker and that no other process is using port 8080, which would prevent the site from being served.
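When troubleshooting, a quick first check is whether anything is accepting connections on port 8080 at all. The sketch below uses only the standard library. The helper name port_is_open is hypothetical, and a True result only shows that some process is listening, not that it is the Subparse web-viewer.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8080,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if port_is_open():
    print("Something is listening on localhost:8080")
else:
    print("Nothing is listening on localhost:8080; is the container running?")
```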

General Information Collected

Before any parser is executed, general information is collected about the sample regardless of the underlying file type. This information includes:

  • MD5 hash of the sample
  • SHA256 hash of the sample
  • Sample name
  • Sample size
  • Extension of sample
  • Derived extension of sample
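The fields listed above can be gathered with the Python standard library alone. The following is a minimal sketch rather than Subparse's implementation: the function name collect_general_info, the dictionary keys, and the small magic-byte table used to derive an extension are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path

# Hypothetical magic-byte table used to guess a "derived" extension.
MAGIC_SIGNATURES = {
    b"\x7fELF": "elf",
    b"MZ": "exe",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole",
    b"{\\rtf": "rtf",
}


def collect_general_info(path) -> dict:
    """Collect file-type-independent metadata for one sample."""
    path = Path(path)
    data = path.read_bytes()
    # First signature that matches the start of the file, or None.
    derived = next(
        (ext for magic, ext in MAGIC_SIGNATURES.items() if data.startswith(magic)),
        None,
    )
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "name": path.name,
        "size": len(data),
        "extension": path.suffix.lstrip("."),
        "derived_extension": derived,
    }
```

Keeping the extension from the file name separate from the one derived from the content makes mismatches easy to spot, for example an ELF binary saved with a .bin or .jpg suffix.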

Parser Modules

Parsers are ONLY executed on samples that match the file type. For example, PE files will by default have the PEParser executed against them, because their file type is one the PEParser is able to examine.
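One way to picture this file-type matching is a dispatch step in which each parser declares the leading bytes it recognizes. The sketch below is an assumption about the general technique, not Subparse's code: the classes are stand-ins that only borrow the module names, and the matches and select_parser helpers are hypothetical.

```python
from typing import Optional, Tuple, Type


class BaseParser:
    """Stand-in base class; each subclass lists the magic bytes it accepts."""
    magic: Tuple[bytes, ...] = ()

    @classmethod
    def matches(cls, header: bytes) -> bool:
        # bytes.startswith accepts a tuple of candidate prefixes.
        return header.startswith(cls.magic)


class ELFParser(BaseParser):
    magic = (b"\x7fELF",)


class OLEParser(BaseParser):
    # OLE compound file signature and the RTF prefix.
    magic = (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", b"{\\rtf")


class PEParser(BaseParser):
    magic = (b"MZ",)


PARSERS = [ELFParser, OLEParser, PEParser]


def select_parser(header: bytes) -> Optional[Type[BaseParser]]:
    """Return the first parser whose signature matches, or None."""
    for parser in PARSERS:
        if parser.matches(header):
            return parser
    return None


print(select_parser(b"MZ\x90\x00"))      # PEParser class
print(select_parser(b"not a binary"))    # None
```

A sample whose header matches no parser simply gets no parser run against it, which is the behavior the paragraph above describes.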

Default Modules

ELFParser

This is the default parsing module that will be executed against ELF files. Information that is collected:

OLEParser

This is the default parsing module that will be executed against OLE and RTF formatted files. It uses the OLETools package to obtain data. The information that is collected:

PEParser

This is the default parsing module that will be executed against PE files that match or include the file types PE32 and MS-DOS. Information that is collected:

Enricher Modules

These modules are optional and will ONLY be executed if specified via the -e | --enrichers flag on the command line.

Default Modules

ABUSEEnricher

This enricher uses the Abuse.ch API and Malware Bazaar to collect more information about the sample(s) Subparse is analyzing. The information is then aggregated and stored in the Elastic database.

CAPEEnricher

This enricher communicates with a CAPEv2 Sandbox instance to collect more information about the sample(s) through dynamic analysis. The information is then aggregated and stored in the Elastic database, utilizing the Kafka Messaging Service for background processing.

STRINGEnricher

This enricher is a smart string enricher that will parse the sample for potentially interesting strings. The categories of strings that this enricher looks for include: Audio, Images, Executable Files, Code Calls, Compressed Files, Work (Office Docs.), IP Addresses, IP Address + Port, Website URLs, Command Line Arguments.

YARAEnricher

This enricher uses a pre-compiled yara file located at: parser/src/enrichers/yara_rules. This pre-compiled file includes rules from and
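To make the idea of categorized string extraction concrete, here is a small sketch covering three of the STRINGEnricher categories named above (IP addresses, IP address + port, and website URLs). The regular expressions, the four-character minimum string length, and the function name find_interesting_strings are assumptions for illustration. The real enricher's patterns are not shown in this document.

```python
import re

# Runs of at least four printable ASCII characters, similar to `strings`.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")

# Dotted-quad IPv4 address with each octet limited to 0-255.
IPV4 = r"(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)"

CATEGORIES = {
    "ip_port": re.compile(rf"\b{IPV4}:\d{{1,5}}\b"),
    # Negative lookahead keeps "ip" from also matching ip:port pairs.
    "ip": re.compile(rf"\b{IPV4}\b(?!:\d)"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def find_interesting_strings(data: bytes) -> dict:
    """Extract printable strings and sort matches into categories."""
    results = {name: [] for name in CATEGORIES}
    for match in PRINTABLE.finditer(data):
        text = match.group().decode("ascii")
        for name, pattern in CATEGORIES.items():
            results[name].extend(pattern.findall(text))
    return results


sample = (b"\x00\x01connect 10.0.0.5:4444 now\x00"
          b"GET http://example.com/a.bin\x00ping 192.168.1.1\x00")
print(find_interesting_strings(sample))
```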

Developing Custom Parsers & Enrichers

Subparse’s web view was built using Bootstrap for its CSS, which allows any built-in Bootstrap CSS to be used when developing your own custom Parser/Enricher Vue.js files. We have also provided an example for each to help you get started, and have implemented a few custom widgets to ease development and to promote standardization in the way information is displayed. All Vue.js files are used for dynamically displaying information from the custom Parser/Enricher and serve as templates for the data.

Note: Naming conventions for both class and file names must be strictly adhered to. This is the first thing to check if you run into issues with your custom Parser/Enricher not being executed. Your Parser/Enricher must use the same name across all of its files and class names.

Logging

The logger object is a singleton implementation of the default Python logger. For in-depth usage, please reference the official documentation. For Subparse, the only logging methods that we recommend using are the logging levels for output. These are:

  • debug
  • warning
  • error
  • critical
  • exception
  • log
  • info
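The methods listed above all exist on the standard library's logging.Logger. The sketch below shows them in use on a shared logger. The get_logger helper and the "subparse" logger name are assumptions made for the example, and Subparse's own logger object may be obtained differently.

```python
import logging


def get_logger(name: str = "subparse") -> logging.Logger:
    """Return a shared logger; logging.getLogger gives one object per name."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # configure only on first use
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger


logger = get_logger()
logger.debug("starting parser")
logger.info("sample indexed")
logger.warning("enricher returned no data")
logger.error("could not read sample")
logger.critical("Elasticsearch is unreachable")
logger.log(logging.INFO, "explicit level passed as the first argument")

try:
    1 / 0
except ZeroDivisionError:
    # exception() logs at ERROR level and appends the traceback,
    # so it should only be called from inside an except block.
    logger.exception("unexpected failure while parsing")
```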

ACKNOWLEDGEMENTS

  • This research and all the co-authors have been supported by NSA Grant H98230-20-1-0326.
R K