Interactive PDF Analysis (also called IPA) allows any researcher to explore the inner details of any PDF file. PDF files may be used to carry malicious payloads that exploit vulnerabilities and other flaws in PDF viewers, or may be used in phishing campaigns as social engineering artefacts.
The goal of this software is to let any analyst dig deep into a PDF file on their own. With IPA, you can extract important payloads from PDF files, understand the relationships between objects, and infer elements that may be helpful for triaging malicious or untrusted payloads.
The main inspiration comes from the fantastic people behind Zynamics and their excellent product, PDF dissector.
Simplifying Analysis Of PDF Files
When I started reverse engineering malware, the main option available for analysing malicious payloads was Didier Stevens’s excellent set of tools.
Although they had become a de facto standard, one of the main problems with these tools was that they could only be used from the command line: the analyst had to remember a very large combination of flags and keep track of the numbers of the various objects.
Although analysts and developers have to contend with all kinds of command-line tools on a daily basis, this does not mean that we cannot create a new graphical file inspection tool.
In fact, part of the static analysis and reverse engineering fields also focuses on how best to present the most salient information to the analyst, from a user experience point of view.
Didier Stevens’ tools, as well as peepdf, are already widely used and well established.
However, the analyst could use something graphical to understand the relationships between the various objects, to see which pages they refer to and which object types they are (images, fonts, colours, metadata), to export stream content in a simple way, and to see the content of dictionaries in table form.
The main source of inspiration is the tool developed by Zynamics called PDF-dissector: the excellent feedback from some former users and the constant requests to release it as open source spurred me to spend a few days creating this tool.
Features
- Extract and analyze metadata to identify the creator, creation date, modification history, and other essential details about the PDF file.
- Examine the structure of the PDF document by analyzing its objects (such as text, images, and fonts) and pages to understand their relationships, content, and layout.
- Visualize references that point to other objects or locations within the file, such as images, fonts, or specific sections.
- Extract and save raw data streams from the PDF file to a specified location, allowing for detailed examination and analysis of the underlying binary content.
- Perform a lighter analysis that attempts to salvage usable information from a corrupted or partially damaged PDF file, even when traditional parsing methods fail.
- Run without any additional software, libraries, or external services, thanks to pdf-rs and Rust.
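
To make the object, reference, and stream features above more concrete, here is a minimal sketch of a "lighter" salvage-style scan over raw bytes. It is not IPA's implementation and does not use the pdf-rs API; it relies only on the Rust standard library, and every name in it (`RawObject`, `scan_objects`, `references`, `raw_stream`) is hypothetical. It assumes objects are written in the plain `N G obj ... endobj` form and ignores cross-reference tables, object streams, filters, and the `/Length` entry, so treat it as an illustration of the idea rather than a robust parser.

```rust
/// One indirect object recovered from the raw bytes (hypothetical type).
struct RawObject<'a> {
    number: u32,
    generation: u32,
    body: &'a [u8],
}

/// Offset of the first occurrence of `needle` in `hay` at or after `from`.
fn find(hay: &[u8], needle: &[u8], from: usize) -> Option<usize> {
    if needle.is_empty() || from > hay.len() {
        return None;
    }
    hay[from..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|p| p + from)
}

/// Parse the run of ASCII digits ending just before `end`: (value, start offset).
fn digits_before(buf: &[u8], end: usize) -> Option<(u32, usize)> {
    let mut start = end;
    while start > 0 && buf[start - 1].is_ascii_digit() {
        start -= 1;
    }
    if start == end {
        return None;
    }
    let text = std::str::from_utf8(&buf[start..end]).ok()?;
    Some((text.parse().ok()?, start))
}

/// Parse "<number> <generation>" ending just before offset `end`.
fn pair_before(buf: &[u8], end: usize) -> Option<(u32, u32)> {
    let (generation, gen_start) = digits_before(buf, end)?;
    if gen_start == 0 || buf[gen_start - 1] != b' ' {
        return None;
    }
    let (number, _) = digits_before(buf, gen_start - 1)?;
    Some((number, generation))
}

/// Scan for "N G obj" headers without trusting the xref table.
/// A missing "endobj" (truncated file) keeps the rest of the buffer as body.
fn scan_objects(buf: &[u8]) -> Vec<RawObject<'_>> {
    let mut found = Vec::new();
    let mut pos = 0;
    while let Some(at) = find(buf, b" obj", pos) {
        pos = at + 4;
        if let Some((number, generation)) = pair_before(buf, at) {
            let end = find(buf, b"endobj", pos).unwrap_or(buf.len());
            found.push(RawObject { number, generation, body: &buf[pos..end] });
            pos = end;
        }
    }
    found
}

/// Collect "N G R" references from the part of the body before any stream data.
fn references(body: &[u8]) -> Vec<(u32, u32)> {
    let dict = &body[..find(body, b"stream", 0).unwrap_or(body.len())];
    let mut refs = Vec::new();
    let mut pos = 0;
    while let Some(at) = find(dict, b" R", pos) {
        pos = at + 2;
        // "R" must end a token, so names like "/Resources" are not matched.
        if matches!(dict.get(pos), Some(b) if b.is_ascii_alphanumeric()) {
            continue;
        }
        if let Some(pair) = pair_before(dict, at) {
            refs.push(pair);
        }
    }
    refs
}

/// Bytes between "stream" and "endstream", still encoded (no filter is applied).
/// One end-of-line marker is trimmed on each side; a real parser would use /Length.
fn raw_stream(body: &[u8]) -> Option<&[u8]> {
    let mut start = find(body, b"stream", 0)? + 6;
    if body.get(start) == Some(&b'\r') { start += 1; }
    if body.get(start) == Some(&b'\n') { start += 1; }
    let mut end = find(body, b"endstream", start)?;
    if end > start && body[end - 1] == b'\n' { end -= 1; }
    if end > start && body[end - 1] == b'\r' { end -= 1; }
    Some(&body[start..end])
}

fn main() {
    // Tiny hand-written sample, not a complete PDF.
    let sample: &[u8] = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n\
4 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n";
    for obj in scan_objects(sample) {
        let stream_len = raw_stream(obj.body).map(|s| s.len());
        println!(
            "object {} {}: refs {:?}, raw stream bytes {:?}",
            obj.number, obj.generation, references(obj.body), stream_len
        );
    }
}
```

The `(number, generation)` pairs returned by `references` are the edges one would draw in an object graph, and the slice returned by `raw_stream` is what would be written to disk when exporting a stream. Scanning for keywords this way can be fooled by binary data that happens to contain ` obj` or `endstream`, which is one reason a full parser is preferred whenever the file is intact.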