PoC: XXE in PMML4S / pypmml Model.load() (CWE-611)
Loading poc.pmml with pypmml (which bundles the PMML4S Scala engine) makes the XML parser
fetch an external DTD and read a local file, leaking its contents out-of-band. Verified on
pypmml 1.5.8 / pmml4s 1.5.8 (pmml4s_2.13-1.5.8.jar), Java 21.
Files
poc.pmml: malicious model; its <!DOCTYPE> points at this repo's poc.dtd raw URL.
poc.dtd: reads file:///etc/hostname and exfiltrates it to a canary.
verify_xxe_fileread.py: self-contained OFFLINE proof (own localhost listener; no internet/canary).
verify_xxe.py: minimal SSRF/blind-XXE proof.
Reproduce (offline, deterministic)
pip install pypmml # pulls pmml4s_2.13-1.5.8.jar ; needs a JRE
python verify_xxe_fileread.py
Expected (real run):
[*] calling pypmml.Model.load('poc_fileread.pmml') ...
[*] load() raised: PMMLError: ('PmmlException', 'Not a valid PMML')
[*] all callbacks: ['/poc.dtd', '/leak?d=TOP-SECRET-XXE-MARKER-7f3a9c']
[+] FILE-READ XXE CONFIRMED - exfiltrated file contents: 'TOP-SECRET-XXE-MARKER-7f3a9c'
The file read/SSRF fires BEFORE pypmml reports "Not a valid PMML": the sink is the XML parse itself, independent of model validity.
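The "all callbacks" line in the output above comes from a localhost listener that records each request the XML parser makes. The following is only a minimal sketch of how such a recorder could look, not the code of verify_xxe_fileread.py; the function name start_callback_listener is hypothetical, and unlike a real proof harness it answers every request with an empty body instead of serving a DTD.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_callback_listener(host="127.0.0.1", port=0):
    """Start a background HTTP server that records every requested path.

    Hypothetical helper for illustration. port=0 lets the OS pick a free port.
    Returns (server, seen), where seen is a list filled in as requests arrive.
    """
    seen = []

    class Recorder(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the raw request path, e.g. '/poc.dtd' or '/leak?d=...'.
            seen.append(self.path)
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, fmt, *args):
            # Suppress the default per-request logging on stderr.
            pass

    server = HTTPServer((host, port), Recorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, seen
```

The chosen port is available as server.server_address[1]; any path that appears in seen after the load attempt shows that the parser made an outbound request, whether or not the model was later rejected as invalid.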
Root cause
PMML4S src/main/scala/org/pmml4s/xml/pull.scala builds XMLInputFactory.newFactory
without disabling DTDs/external entities. The DTD-event EventFilter is cosmetic and does
not stop entity resolution.
Impact
Arbitrary local file disclosure, SSRF (e.g. cloud metadata), and billion-laughs DoS for anyone
loading an untrusted .pmml.
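Until the parser itself is hardened (for a StAX factory this typically means setting XMLInputFactory.SUPPORT_DTD and XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES to false), callers can refuse any file that carries a DOCTYPE before it reaches the engine. The sketch below is one possible caller-side guard, not part of this repo or of pypmml; reject_doctype and DoctypeForbidden are hypothetical names. It uses Python's expat bindings, which do not fetch external entities unless a handler is installed for them.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a <!DOCTYPE> declaration."""


def reject_doctype(data: bytes) -> None:
    """Parse data once and raise DoctypeForbidden if a DOCTYPE is present.

    Hypothetical pre-check for illustration. Malformed XML raises
    xml.parsers.expat.ExpatError, which callers should also treat as a reject.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Exceptions raised in a handler propagate out of parser.Parse().
        raise DoctypeForbidden(
            f"DOCTYPE {name!r} (system id: {system_id!r}) is not allowed"
        )

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
```

A caller would read the file as bytes, run reject_doctype on it, and only then hand the path to pypmml.Model.load. This is defense in depth rather than a fix: it assumes expat and the JVM parser agree on where a DOCTYPE begins, so the parser configuration in PMML4S remains the place where the issue has to be closed.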