Security PoC: Remote Code Execution in distilabel via unsafe import-by-string deserialization

This repository is a proof-of-concept for a security vulnerability reported through huntr, published only as evidence for responsible disclosure. The payload runs a harmless touch command. Do not load untrusted distilabel pipelines/components on a machine you care about.

What this demonstrates

distilabel restores saved pipelines/components (Pipeline, Step, Task, LLM) from JSON/YAML via the public from_json() / from_yaml() / from_dict() API. The loader reads a type_info {module, name} block from the file and resolves it with importlib.import_module + getattr (no allowlist), then calls cls(**class_) with the remaining attacker-controlled keys as keyword arguments. Loading an untrusted artifact therefore instantiates an arbitrary Python class with attacker-chosen keyword arguments, which amounts to arbitrary code execution. There is no safe-load flag.
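The pattern described above can be sketched in a few lines. This is a simplified stand-in, not distilabel's actual code: the function name unsafe_from_dict is hypothetical, and the artifact deliberately points at a benign standard-library class (datetime.timedelta) so that running it has no side effects.

```python
import importlib
import json


def unsafe_from_dict(data: dict):
    """Simplified stand-in for an import-by-string loader (hypothetical, not distilabel's code)."""
    class_ = dict(data)                                   # copy so the caller's dict is untouched
    type_info = class_.pop("type_info")                   # {"module": ..., "name": ...} from the file
    mod = importlib.import_module(type_info["module"])    # any importable module is accepted
    cls = getattr(mod, type_info["name"])                 # any attribute of that module is accepted
    return cls(**class_)                                  # remaining keys become keyword arguments


# A benign artifact: the file, not the loader, decides which class gets built.
artifact = json.loads(
    '{"type_info": {"module": "datetime", "name": "timedelta"}, "days": 2, "hours": 12}'
)
obj = unsafe_from_dict(artifact)
print(type(obj).__module__, type(obj).__name__, obj.total_seconds())
```

Nothing in the loader ties the resolved class to the library doing the loading, which is why swapping the module and name for something with side effects in its constructor turns a data file into code execution.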

Because the artifact is plain JSON (no pickle opcodes), it is not flagged by pickle scanners. It belongs to the same vulnerability class as LangChain CVE-2025-68664 and ms-swift CVE-2025-50460.
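Since pickle scanners do not help here, one possible stopgap is to inspect the JSON for type_info blocks before handing it to any loader. The following is a rough sketch under assumptions: the helper find_foreign_types, the ALLOWED_PREFIX constant, and the sample layout (a "steps" list, a step named SomeStep) are all hypothetical and do not mirror distilabel's real schema. A YAML artifact would need a YAML parser in place of json.

```python
import json

ALLOWED_PREFIX = "distilabel"  # assumption: only distilabel-owned modules are expected


def find_foreign_types(node, path="$"):
    """Recursively collect type_info blocks whose module lies outside ALLOWED_PREFIX."""
    hits = []
    if isinstance(node, dict):
        info = node.get("type_info")
        if isinstance(info, dict):
            module = str(info.get("module", ""))
            if module != ALLOWED_PREFIX and not module.startswith(ALLOWED_PREFIX + "."):
                hits.append((path, module, info.get("name")))
        for key, value in node.items():
            hits.extend(find_foreign_types(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_foreign_types(value, f"{path}[{i}]"))
    return hits


# Hypothetical artifact layout, used only to exercise the walker.
sample = json.loads("""
{
  "steps": [
    {"type_info": {"module": "distilabel.steps", "name": "SomeStep"}},
    {"type_info": {"module": "subprocess", "name": "Popen"}}
  ]
}
""")
findings = find_foreign_types(sample)
print(findings)
```

A check like this only reports where a file names a module outside the expected package; it does not make loading safe on its own, so the loader-side fix remains the real remedy.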

Files

  • malicious_pipeline.json: the untrusted artifact. type_info resolves subprocess.Popen; the sibling args become its kwargs.
  • load_poc.py: loads it through the public from_json API and triggers the sink.

Reproduce

pip install distilabel                       # 1.5.3
python load_poc.py malicious_pipeline.json   # creates /tmp/DL_FROMJSON via subprocess.Popen
ls -la /tmp/DL_FROMJSON                       # file exists => arbitrary code executed

Root cause (and fix)

src/distilabel/utils/serialization.py:

mod = importlib.import_module(module)   # attacker-controlled module, no allowlist
cls = getattr(mod, name)                # attacker-controlled attribute
instance = cls(**class_)                # arbitrary class + attacker kwargs -> ACE

Fix: restrict the dynamic import to an allowlist of distilabel-owned modules and require the resolved object to be a known distilabel base type; remove the eval(v["_enum_type"]) path.
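A minimal sketch of the allowlist half of that fix might look like the following. It is an illustration under assumptions, not a patch for distilabel: safe_resolve, ALLOWED_MODULE_PREFIXES, and the prefixes parameter are hypothetical names, and the caller is assumed to pass in whichever base types the library considers loadable.

```python
import importlib

ALLOWED_MODULE_PREFIXES = ("distilabel.",)  # assumption: distilabel-owned packages only


def safe_resolve(module, name, allowed_bases, prefixes=ALLOWED_MODULE_PREFIXES):
    """Resolve module.name only if the module is allowlisted and the object subclasses a known base."""
    # Reject before importing, so a hostile module is never even loaded.
    if not module.startswith(prefixes):
        raise ValueError(f"refusing to import {module!r}: not in allowlist")
    obj = getattr(importlib.import_module(module), name)
    # Require a class derived from one of the expected base types.
    if not (isinstance(obj, type) and issubclass(obj, allowed_bases)):
        raise TypeError(f"{module}.{name} is not an allowed component type")
    return obj


# With the default allowlist, the PoC's target is refused up front.
try:
    safe_resolve("subprocess", "Popen", (object,))
except ValueError as exc:
    rejected = str(exc)
    print(rejected)
```

Both checks matter: the prefix test limits which modules get imported at all, and the base-type test stops an allowlisted module from being used to reach an unrelated callable it happens to expose.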

Disclosure

Reported via huntr. CWE-502 / CWE-94.
