Polypyus : Locate Functions In Raw Binaries By Extracting Known Functions

Polypyus learns to locate functions in raw binaries by extracting known functions from similar binaries. Thus, it is a firmware historian. Polypyus works without disassembling these binaries, which is an advantage for binaries that are complex to disassemble and where common tools miss functions. In addition, the binary-only approach makes it very fast: it runs within a few seconds. However, this approach requires the binaries to target the same architecture and to be built with similar compiler options.

Polypyus integrates into the workflow of existing tools like Ghidra, IDA, BinDiff, and Diaphora. For example, it can import previously annotated functions and learn from these, and also export found functions to be imported into IDA. Since Polypyus uses rather strict thresholds, it only found correct matches in our experiments. While this leads to fewer results than in existing tools, it is a good entry point for loading these matches into IDA to improve its auto analysis results and then run BinDiff on top.

What does Polypyus solve?

When working on raw firmware binaries, namely various Broadcom and Cypress Bluetooth firmware versions, we found that IDA auto analysis often identified function starts incorrectly. In IDA Pro 6.8, the auto analysis was a bit more aggressive, leading to more results but also more false positives. IDA Pro 7.2 was more pessimistic overall, but missed a lot of functions. This led to only a few BinDiff matches between our firmware images in IDA Pro 6.8 and no useful matches at all in IDA Pro 7.2.

Interestingly, BinDiff often failed to identify functions that, except for branches, were byte-identical. Note that Polypyus searches for exactly these byte-identical functions. We assume that BinDiff fails at these functions because missing functions and false positives produce a different call graph. Sometimes these functions were already recognized by IDA, but often IDA either did not recognize them as code or did not mark them as functions. Note that Diaphora has similar problems, as it exports the functions identified by IDA before processing them further.

Moreover, while we found that Amnesia finds many functions, it also finds many false positives. However, many functions begin with a similar stack frame setup. Thus, Polypyus has an option to learn common function starts from the annotated input binaries and apply them to other binaries to identify functions without matching their names. This optional step is only applied to regions in which no functions were previously located, so the common function starts method and the main function finding do not conflict.
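The common function starts idea can be illustrated with a small sketch. This is not Polypyus's actual implementation: the function names, the fixed 4-byte prefix length, the `min_count` threshold, and the 2-byte scan step (Thumb alignment) are all assumptions made for illustration.

```python
from collections import Counter


def learn_function_starts(binary, starts, prefix_len=4, min_count=2):
    """Collect byte prefixes that begin at least min_count known functions.

    binary: raw bytes of an annotated image (hypothetical input).
    starts: file offsets of known function starts taken from the annotations.
    """
    counts = Counter(
        binary[s:s + prefix_len]
        for s in starts
        if s + prefix_len <= len(binary)
    )
    return {prefix for prefix, n in counts.items() if n >= min_count}


def find_candidate_starts(target, prefixes, covered, prefix_len=4, step=2):
    """Scan target for learned prefixes outside already located functions.

    covered: list of (start, end) offset ranges where functions were
    previously located; these regions are skipped so the two methods
    do not conflict.
    """
    hits = []
    for off in range(0, len(target) - prefix_len + 1, step):
        if any(lo <= off < hi for lo, hi in covered):
            continue
        if target[off:off + prefix_len] in prefixes:
            hits.append(off)
    return hits
```

A real implementation would likely use more flexible patterns than fixed-length exact prefixes; the sketch only shows why restricting the scan to uncovered regions keeps this step from overriding named matches.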

How does it work?

Polypyus creates fuzzy binary matchers by comparing common functions in a collection of annotated firmware binaries.

Currently, the following annotations are supported:

  • A WICED Studio patch.elf file, which is a special ELF file containing only symbol definitions.
  • A .symdefs file as it is produced by most ARM compilers.
  • A .csv file with a format documented in the examples.

These annotations contain the address, size, and name of known functions. The more the input binaries in the history collection have in common, the better Polypyus performs and the better its results are. Given several slightly different versions of a function, Polypyus creates very good matchers.
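To make the notion of a fuzzy matcher concrete, here is a minimal sketch under simplifying assumptions: all variants of a function have the same length, and a matcher is a list of byte values in which `None` is a wildcard for positions that differ between variants (for example branch offsets). The names `build_matcher` and `find_matches` are hypothetical and do not reflect Polypyus's internals.

```python
def build_matcher(variants):
    """Merge equal-length byte strings of one function into a pattern.

    Positions where all variants agree keep their byte value;
    positions where they differ become None (wildcard).
    """
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("this sketch only handles variants of equal length")
    return [
        column[0] if len(set(column)) == 1 else None
        for column in zip(*variants)
    ]


def find_matches(pattern, target):
    """Return all offsets in target where the pattern matches."""
    hits = []
    for off in range(len(target) - len(pattern) + 1):
        window = target[off:off + len(pattern)]
        if all(p is None or p == b for p, b in zip(pattern, window)):
            hits.append(off)
    return hits
```

With two variants that differ only in one byte of a branch instruction, the resulting pattern has a single wildcard and still matches a third binary in which that byte has yet another value.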

How to install it?

Polypyus requires Python 3 >= 3.6. We advise the use of a virtualenv for the following installation. Clone this repository and in this folder run:

pip install .

How to run it?

After the installation the following commands are available:

  • polypyus-gui
  • polypyus-cli

Using Polypyus

Polypyus is available through a graphical and a command-line interface. Both the GUI (polypyus-gui) and the CLI (polypyus-cli) take these arguments during invocation:

--verbose sets the verbosity level. By default, warnings are shown; -v shows info and -vv shows debug information.
--project sets the location of the project file. This is either a file path or ":memory:".
--help shows the help message.

The project option lets you store your work for different contexts in different files and reopen them later.


Using the GUI

The general GUI workflow goes from the left-hand side of the window to the right. First, binaries are added to the history. Then, symbol annotations are attached to the entries in the history. Afterward, target binaries can be added. For the matching, hit Create matchers from history. Once the matchers are created, single targets can be selected, or all targets can be matched by selecting batch match. Finally, the findings can be exported to a .csv file.

In the demo video, Polypyus takes only a few seconds to load two input binaries, annotate them, create matchers, and apply the matches to a new binary.

Using the CLI

The upside to using the CLI is its ability to be automated. As of now, the output format of the CLI is subject to change. However, here is an example of calling it:

polypyus-cli --history examples/history/20819-A1.bin --annotation examples/history/20819-A1_patch.elf \
  --history examples/history/20735B1.bin --annotation examples/history/20735B1_patch.elf \
  --project test.sqlite
polypyus-cli --target examples/history/20739B1.bin --project test.sqlite

The first command creates test.sqlite as a new project file and imports 20819-A1.bin and 20735B1.bin with their respective patch.elf files. The second invocation reuses the same project file and matches against the binary 20739B1.bin. For each command, the number of --history arguments must equal the number of --annotation arguments. The two commands could also be combined into one by adding the --target argument to the first command.
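For automation, the argument list can be assembled programmatically. The sketch below only uses the flags shown above (--history, --annotation, --target, --project); the helper names `build_cli_args` and `run_polypyus` are hypothetical. Passing history and annotation files as pairs guarantees that their counts match.

```python
import subprocess


def build_cli_args(pairs, project, target=None):
    """Build a polypyus-cli argument list.

    pairs: iterable of (history_binary, annotation_file) tuples.
    project: project file path, or ":memory:".
    target: optional target binary to match against.
    """
    args = ["polypyus-cli"]
    for history, annotation in pairs:
        args += ["--history", history, "--annotation", annotation]
    if target is not None:
        args += ["--target", target]
    args += ["--project", project]
    return args


def run_polypyus(args):
    """Run the CLI and raise if it exits with a non-zero status."""
    return subprocess.run(args, check=True)
```

Since the CLI output format is subject to change, a script built on this should avoid parsing the output too strictly.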

How does it work internally?

We will release a paper soon. Until then, you can take a look at the final presentation of Jan’s master's thesis, which covers the issues encountered when working with conventional binary diffing approaches in ARM Thumb2 mode, and how the alternative binary-only approach works.

Recommended IDA Workflow

After some internal testing, we can recommend the following workflow when working with IDA Pro and Polypyus:

  • Create a fresh database. ARM v7 little endian, ARM Cortex M for the Bluetooth firmware.
  • Mark position 0x0 as Thumb (Alt-g, T=0x1).
  • Create ROM and RAM segments. ROM at 0x0 with rx, RAM at 0x200000 with rwx (at least for the Bluetooth firmware).
  • Create vector table offsets in ROM, at least for the reset vector, which is a 4-byte offset at 0x4 (o). On the CYW20735 firmware it points to 0x3bc+1. Go back one byte and create a function (p).
  • Wait for auto analysis to finish.
  • Import Polypyus results.
  • Run the Thumbs Up scripts.
  • Run both BinDiff and Diaphora. The latter ideally in an IDA version with decompiler. Use both, as they use different heuristics.

…now your IDA database might be somewhat useful 🙂 Still a lot of things the disassembler fails at within ARM Thumb2 but way better than anything IDA does on its own.
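The reset vector step in the workflow above can be checked outside IDA with a few lines of Python. This is a sketch under the assumption that the raw image is mapped at 0x0 and starts with a little-endian Cortex-M vector table, where the word at offset 0x4 is the reset vector. The lowest bit of that word is the Thumb bit, which is why the function itself starts one byte earlier. The helper name `reset_handler` is hypothetical.

```python
import struct


def reset_handler(firmware, vector_offset=0x4):
    """Return (raw_vector, function_address) for the reset vector.

    firmware: raw bytes of an image assumed to be mapped at address 0x0.
    The raw vector has bit 0 set to indicate Thumb mode; clearing it
    yields the address at which to create the function.
    """
    (raw,) = struct.unpack_from("<I", firmware, vector_offset)
    return raw, raw & ~1
```

For a vector value of 0x3bd, as described for the CYW20735 firmware, this returns 0x3bc as the function address.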

R K
