Security PoC: Remote Code Execution in distilabel via unsafe import-by-string deserialization
This repository is a proof-of-concept for a security vulnerability reported through
huntr, published only as evidence for responsible disclosure. The payload runs
a harmless touch command. Do not load untrusted distilabel pipelines/components on a machine you
care about.
What this demonstrates
distilabel restores saved pipelines/components (Pipeline, Step, Task, LLM) from JSON/YAML via
the public from_json() / from_yaml() / from_dict() API. The loader reads a type_info
{module, name} block from the file and resolves it with importlib.import_module + getattr
(no allowlist), then calls cls(**class_) with the remaining attacker-controlled keys as
keyword arguments. Loading an untrusted artifact therefore instantiates an arbitrary Python class with
attacker-chosen keyword arguments, which amounts to arbitrary code execution. There is no safe-load flag.
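The pattern described above can be sketched in isolation. This is a minimal stand-in and not distilabel's actual code: the function name unsafe_from_dict and the flat artifact layout are assumptions made for illustration, and the target class is a harmless standard-library type.

```python
import importlib


def unsafe_from_dict(data):
    """Hypothetical stand-in for an import-by-string loader (not distilabel's code)."""
    data = dict(data)                      # copy so the caller's dict is untouched
    type_info = data.pop("type_info")      # {module, name} comes straight from the file
    module = importlib.import_module(type_info["module"])  # no allowlist
    cls = getattr(module, type_info["name"])                # any attribute at all
    return cls(**data)                     # remaining keys become keyword arguments


# The artifact, not the application, decides which class is built.
artifact = {
    "type_info": {"module": "fractions", "name": "Fraction"},
    "numerator": 3,
    "denominator": 4,
}
obj = unsafe_from_dict(artifact)
print(type(obj).__name__, obj)  # Fraction 3/4
```

The loader never names fractions.Fraction itself. Any importable class whose constructor has side effects can be selected in the same way.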
Because the artifact is plain JSON (no pickle opcodes), it is not flagged by pickle scanners.
This is the same vulnerability class as LangChain CVE-2025-68664 and ms-swift CVE-2025-50460.
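Since pickle scanners do not look inside JSON, a pre-load check has to inspect the type_info blocks directly. The sketch below is a triage aid under stated assumptions, not a fix: it assumes type_info blocks can appear at any nesting depth, and the names find_type_info, suspicious_types and trusted_package, as well as the module string "distilabel.some_module", are hypothetical. It only reads strings and never imports anything.

```python
import json


def find_type_info(node, path="$"):
    """Yield (path, module, name) for every dict that carries a type_info block."""
    if isinstance(node, dict):
        info = node.get("type_info")
        if isinstance(info, dict):
            yield path, info.get("module"), info.get("name")
        for key, value in node.items():
            yield from find_type_info(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_type_info(item, f"{path}[{index}]")


def suspicious_types(text, trusted_package="distilabel"):
    """Return type_info entries whose module lies outside trusted_package."""
    hits = []
    for path, module, name in find_type_info(json.loads(text)):
        trusted = isinstance(module, str) and (
            module == trusted_package or module.startswith(trusted_package + ".")
        )
        if not trusted:
            hits.append((path, module, name))
    return hits


# Illustrative artifact; the module strings are examples and are never imported.
sample = json.dumps({
    "type_info": {"module": "distilabel.some_module", "name": "Pipeline"},
    "steps": [{"type_info": {"module": "subprocess", "name": "Popen"}, "args": ["true"]}],
})
print(suspicious_types(sample))  # [('$.steps[0]', 'subprocess', 'Popen')]
```

A clean result from such a scan does not make an artifact safe to load. It only flags the obvious case in which type_info points outside the library.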
Files
malicious_pipeline.json: the untrusted artifact. type_info resolves subprocess.Popen; the sibling args become its kwargs.
load_poc.py: loads it through the public from_json API and triggers the sink.
Reproduce
pip install distilabel # 1.5.3
python load_poc.py malicious_pipeline.json # creates /tmp/DL_FROMJSON via subprocess.Popen
ls -la /tmp/DL_FROMJSON # file exists => arbitrary code executed
Root cause (and fix)
src/distilabel/utils/serialization.py:
mod = importlib.import_module(module) # attacker-controlled module, no allowlist
cls = getattr(mod, name) # attacker-controlled attribute
instance = cls(**class_) # arbitrary class + attacker kwargs -> ACE
Fix: restrict the dynamic import to an allowlist of distilabel-owned modules and require the
resolved object to be a known distilabel base type; remove the eval(v["_enum_type"]) path.
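One possible shape for the allowlist part of that fix is sketched below. It is not a patch for distilabel: resolve_allowed, UntrustedTypeError and both parameters are hypothetical names, and standard-library stand-ins (the collections package and dict as the base type) replace distilabel's own packages and base classes so that the example runs on its own.

```python
import importlib
import inspect


class UntrustedTypeError(ValueError):
    """Raised when type_info points outside the allowlist (hypothetical name)."""


def resolve_allowed(module_name, class_name, allowed_packages, base_types):
    """Resolve a class only if its module and its base type are both allowlisted."""
    # Check the module name BEFORE importing: importing a module already runs code.
    # Matching "pkg" or "pkg.<sub>" avoids accepting look-alikes such as "pkg_evil".
    if not any(
        module_name == pkg or module_name.startswith(pkg + ".")
        for pkg in allowed_packages
    ):
        raise UntrustedTypeError(f"module not allowed: {module_name!r}")
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name, None)
    # Require a real class derived from a known base type, which rejects
    # re-exported modules, functions, and unrelated classes.
    if not inspect.isclass(obj) or not issubclass(obj, base_types):
        raise UntrustedTypeError(f"type not allowed: {module_name}.{class_name}")
    return obj


# Stand-ins: "collections" plays the library package, dict plays its base type.
ALLOWED = ("collections",)
BASES = (dict,)

print(resolve_allowed("collections", "OrderedDict", ALLOWED, BASES).__name__)  # OrderedDict

for module_name, class_name in [("subprocess", "Popen"), ("collections", "deque")]:
    try:
        resolve_allowed(module_name, class_name, ALLOWED, BASES)
    except UntrustedTypeError as exc:
        print("rejected:", exc)
```

The base-type check matters as much as the module check, because an allowlisted module may still expose names it merely imported from elsewhere.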
Disclosure
Reported via huntr. CWE-502 (Deserialization of Untrusted Data) / CWE-94 (Improper Control of Generation of Code).