Kali Linux

unblob : Extract files from any kind of container formats

unblob is an accurate, fast, and easy-to-use extraction suite. It parses unknown binary blobs for more than 30 different archive, compression, and file-system formatsextracts their content recursively, and carves out unknown chunks that have not been accounted for.

Unblob is free to use, licensed with the MIT license. It has a Command Line Interface and can be used as a Python library.
This turns unblob into the perfect companion for extracting, analyzing, and reverse engineering firmware images.

Why unblob?

One of the major challenges of embedded security analysis is the sound and safe extraction of arbitrary firmware.

Specialized tools that can extract information from those firmware images already exist, but we were carving for something smarter that could identify both start-offset and end-offset of a specific chunk (e.g. filesystem, compression stream, archive, …).

We stick to the format standard as much as possible when deriving these offsets, and we clearly define what we want out of identified chunks (e.g., not extracting meta-data to disk, padding removal). This strategy helps us feed known valid data to extractors and precisely identify chunks, turning unknown unknowns into known unknowns.

Given the modular design of unblob and the ever-expanding repository of supported formats, unblob could very well be used in areas outside embedded security such as data recovery, memory forensics, or malware analysis.

Our Objectives

unblob has been developed with the following objectives in mind:

  • Accuracy – chunk start offsets are identified using battle tested rules, while end offsets are computed according to the format’s standard without deviating from it. We minimize false positives as much as possible by validating header structures and discarding overlapping chunks.
  • Security – unblob does not require elevated privileges to run. It’s heavily tested and has been fuzz tested against a large corpus of files and firmware images. We rely on up-to-date third party dependencies that are locked to limit potential supply chain issues. We use safe extractors that we audited and fixed where required (see path traversal in ubi_reader, path traversal in jefferson, integer overflow in Yara).
  • Extensibility – unblob exposes an API that can be used to write custom format handlers and extractors in no time.
  • Speed – we want unblob to be blazing fast, that’s why we use multi-processing by default, make sure to write efficient code, use memory-mapped files, and use Hyperscan as a high-performance matching library. Computation-intensive functions are written in Rust and called from Python using specific bindings.

How does it work?

unblob identifies known and unknown chunks of data within a file:

  • known chunks are identified by finding the start offset using a search rule, and the end offset is computed based on the format standard. Unknown chunks represents unidentified chunks of data before, after, or between known chunks. Unknown chunks composed of known content (e.g., null padding, 0xFF padding) are identified automatically and reported as such.
  • unblob will carve out known chunks to disk and perform the extraction phase using the extractor assigned to a given handler. It will then walk the extracted content, looking for chunks in extracted files.
  • a report on metadata can be generated by unblob, providing detailed information about identified chunks (format, offsets, size, entropy) and their extracted content if available (ownership, permissions, timestamps, …).

Used technologies

  • unblob is written in Python.
  • For quickly searching binary patterns in files, we use Hyperscan.
  • For extracting recognized formats, we use all kinds of different Extractors.
  • For ELF analysis, we are using LIEF with its Python bindings.
  • For CPU-intensive tasks (e.g. entropy calculation), we use Rust to speed things up.
  • For the pretty command line interface, we are using the Click library.
  • For structured logging, we are using the structlog library.
  • For development and testing tools, see the Development page.
R K

Recent Posts

Kali Linux 2024.4 Released, What’s New?

Kali Linux 2024.4, the final release of 2024, brings a wide range of updates and…

13 hours ago

Lifetime-Amsi-EtwPatch : Disabling PowerShell’s AMSI And ETW Protections

This Go program applies a lifetime patch to PowerShell to disable ETW (Event Tracing for…

13 hours ago

GPOHunter – Active Directory Group Policy Security Analyzer

GPOHunter is a comprehensive tool designed to analyze and identify security misconfigurations in Active Directory…

2 days ago

2024 MITRE ATT&CK Evaluation Results – Cynet Became a Leader With 100% Detection & Protection

Across small-to-medium enterprises (SMEs) and managed service providers (MSPs), the top priority for cybersecurity leaders…

5 days ago

SecHub : Streamlining Security Across Software Development Lifecycles

The free and open-source security platform SecHub, provides a central API to test software with…

1 week ago

Hawker : The Comprehensive OSINT Toolkit For Cybersecurity Professionals

Don't worry if there are any bugs in the tool, we will try to fix…

1 week ago