
unblob: Extract files from any kind of container format

unblob is an accurate, fast, and easy-to-use extraction suite. It parses unknown binary blobs for more than 30 different archive, compression, and file-system formats, extracts their content recursively, and carves out unknown chunks that have not been accounted for.

unblob is free to use and licensed under the MIT license. It provides a command-line interface and can also be used as a Python library.
This turns unblob into the perfect companion for extracting, analyzing, and reverse engineering firmware images.

Why unblob?

One of the major challenges of embedded security analysis is the sound and safe extraction of arbitrary firmware.

Specialized tools that can extract information from those firmware images already exist, but we were carving for something smarter that could identify both start-offset and end-offset of a specific chunk (e.g. filesystem, compression stream, archive, …).

We stick to the format standard as much as possible when deriving these offsets, and we clearly define what we want out of identified chunks (e.g., not extracting meta-data to disk, padding removal). This strategy helps us feed known valid data to extractors and precisely identify chunks, turning unknown unknowns into known unknowns.

Given the modular design of unblob and the ever-expanding repository of supported formats, unblob could very well be used in areas outside embedded security such as data recovery, memory forensics, or malware analysis.

Our Objectives

unblob has been developed with the following objectives in mind:

  • Accuracy – chunk start offsets are identified using battle-tested rules, while end offsets are computed according to the format’s standard without deviating from it. We minimize false positives as much as possible by validating header structures and discarding overlapping chunks.
  • Security – unblob does not require elevated privileges to run. It is heavily tested and has been fuzz tested against a large corpus of files and firmware images. We rely on up-to-date third-party dependencies that are locked to limit potential supply-chain issues. We use safe extractors that we audited and fixed where required (see path traversal in ubi_reader, path traversal in jefferson, integer overflow in Yara).
  • Extensibility – unblob exposes an API that can be used to write custom format handlers and extractors in no time.
  • Speed – we want unblob to be blazing fast. That is why we use multi-processing by default, write efficient code, use memory-mapped files, and use Hyperscan as a high-performance matching library. Computation-intensive functions are written in Rust and called from Python through bindings.
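
To make the "discarding overlapping chunks" part of the Accuracy objective more concrete, here is a minimal sketch in plain Python. It is not unblob's actual code: the function name drop_contained_chunks and the (start, end) tuple representation are assumptions made for illustration. The sketch keeps the largest candidates and drops any candidate that lies entirely inside one already kept. Candidates that only partially overlap are left alone here.

```python
from __future__ import annotations


def drop_contained_chunks(chunks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Drop every (start, end) chunk fully contained in a larger kept chunk.

    Hypothetical helper for illustration; offsets are start-inclusive and
    end-exclusive.
    """
    # Visit the largest candidates first so outer chunks are kept before
    # the smaller chunks they may contain.
    by_size = sorted(chunks, key=lambda c: c[1] - c[0], reverse=True)
    kept: list[tuple[int, int]] = []
    for start, end in by_size:
        contained = any(k_start <= start and end <= k_end for k_start, k_end in kept)
        if not contained:
            kept.append((start, end))
    # Return the survivors in file order.
    return sorted(kept)


candidates = [(10, 20), (0, 100), (120, 130), (100, 150), (140, 160)]
surviving = drop_contained_chunks(candidates)
```

In this example, (10, 20) and (120, 130) are discarded because they sit inside larger candidates, which is the typical shape of a false positive matched inside a valid archive or filesystem.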

How does it work?

unblob identifies known and unknown chunks of data within a file:

  • known chunks are identified by finding the start offset using a search rule; the end offset is then computed based on the format standard.
  • unknown chunks represent unidentified data before, after, or between known chunks. Unknown chunks composed of known content (e.g., null padding, 0xFF padding) are identified automatically and reported as such.
  • unblob will carve out known chunks to disk and perform the extraction phase using the extractor assigned to a given handler. It will then walk the extracted content, looking for chunks in extracted files.
  • a report on metadata can be generated by unblob, providing detailed information about identified chunks (format, offsets, size, entropy) and their extracted content if available (ownership, permissions, timestamps, …).
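
The relationship between known chunks, unknown chunks, and padding described above can be sketched in a few lines of Python. This is an illustrative sketch only, not unblob's implementation: the Chunk class, the find_unknown_chunks function, and the "padding" label are hypothetical names, and known chunks are assumed to be given as (start, end) offsets with an exclusive end.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int  # inclusive offset
    end: int    # exclusive offset
    kind: str   # "unknown" or "padding" in this sketch


def find_unknown_chunks(data: bytes, known: list[tuple[int, int]]) -> list[Chunk]:
    """Return the regions of data not covered by any known chunk."""
    gaps: list[tuple[int, int]] = []
    cursor = 0
    for start, end in sorted(known):
        if start > cursor:
            # Bytes between the previous known chunk and this one.
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < len(data):
        # Trailing bytes after the last known chunk.
        gaps.append((cursor, len(data)))

    result: list[Chunk] = []
    for start, end in gaps:
        region = data[start:end]
        # A gap made of a single repeated 0x00 or 0xFF byte is treated as padding.
        is_padding = len(set(region)) == 1 and region[0] in (0x00, 0xFF)
        result.append(Chunk(start, end, "padding" if is_padding else "unknown"))
    return result


blob = b"HDR" + b"A" * 5 + b"\xff" * 4 + b"B" * 3 + b"xyz"
unknown = find_unknown_chunks(blob, known=[(3, 8), (12, 15)])
```

Here the three leading bytes and the three trailing bytes come out as unknown chunks, while the run of 0xFF between the two known chunks is reported as padding.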

Used technologies

  • unblob is written in Python.
  • For quickly searching binary patterns in files, we use Hyperscan.
  • For extracting recognized formats, we use a variety of extractors.
  • For ELF analysis, we use LIEF with its Python bindings.
  • For CPU-intensive tasks (e.g. entropy calculation), we use Rust to speed things up.
  • For the pretty command line interface, we use the Click library.
  • For structured logging, we use the structlog library.
  • For development and testing tools, see the Development page.
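
As an illustration of the entropy calculation mentioned in the list above, here is a minimal pure-Python sketch of Shannon entropy over the bytes of a chunk. unblob performs this kind of computation in Rust for speed; the function name shannon_entropy and this implementation are assumptions for illustration, not unblob's code.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Count how often each byte value occurs.
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


low = shannon_entropy(b"\x00" * 64)       # a single repeated byte value
high = shannon_entropy(bytes(range(256)))  # every byte value exactly once
```

Values close to 8 bits per byte usually point to compressed or encrypted data, while values close to 0 point to padding or highly repetitive content.
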
R K
