PARSeqTokenizer LFD — huntr Model File Vulnerability PoC

This repository accompanies a huntr Model File Vulnerability submission against the .keras archive format. The full vulnerability writeup is in SUBMISSION_FORM.md and is also pasted into the huntr Description field.

Headline. A malicious .keras archive causes PARSeqTokenizer.set_vocabulary to call open(attacker_path) during keras.saving.load_model, under the default safe_mode=True. The contents of the attacker-chosen file land in model.layers[0].vocabulary, survive model.save(), and are written into the resaved archive at assets/layers/par_seq_tokenizer/vocabulary.txt. An in-memory file read thus becomes a supply-chain leak when the victim republishes the model. The bug is unpatched in keras-hub itself through the latest release (0.29.0, 2026-05-05).

Detection state at submission time: the open-source modelscan==0.8.8 CLI returns No issues found! 🎉 because it has no rule for tokenizer-config-driven LFD, whereas ProtectAI's commercial Insights scanner (the version embedded in HuggingFace's repo-scanning pipeline) does flag these archives under the proprietary threat ID PAIT-KERAS-301. See SUBMISSION_FORM.md § Detection landscape for the full open-source vs commercial divergence.

What's in this repo

Malicious `.keras` archives (PoC payloads)

File	Target
`evil_parseq_hostname.keras`	`/etc/hostname` — universal Linux file
`evil_parseq_passwd.keras`	`/etc/passwd`
`evil_parseq_environ.keras`	`/proc/self/environ` — leaks the loading process's env vars
`evil_parseq_null.keras`	`/dev/null` — baseline sanity check
`evil_parseq_aws_credentials.keras`	Synthetic AWS credentials file (auto-created by the reproducer)
`evil_parseq_dotenv.keras`	Synthetic `.env`
`evil_parseq_ssh_key.keras`	Synthetic SSH private key
`evil_parseq_gcp_sa_key.keras`	Synthetic GCP service-account key
`evil_parseq_k8s_token.keras`	Synthetic Kubernetes service-account token

Reproduction & audit scripts

File	Purpose
`repro_parseq_lfd.py`	One-script reproducer; auto-creates the synthetic targets if missing
`probe_exfil_roundtrip.py`	Demonstrates the load→resave exfiltration chain
`probe_ecosystem_safemode.py`	Audit harness — enumerates KerasSaveable subclasses inside `SafeModeScope(True)`

Evidence (captured outputs referenced inline in `SUBMISSION_FORM.md`)

File	Purpose
`evidence/lfd_proof_keras_hub_0230.txt`	Multi-version verification — keras-hub 0.23.0 (introduction release)
`evidence/lfd_proof_keras_hub_0250.txt`	Multi-version verification — keras-hub 0.25.0
`evidence/lfd_proof_keras_hub_0290.txt`	Multi-version verification — keras-hub 0.29.0 (current)
`evidence/lfd_proof_tensorflow.txt`	Backend coverage — tensorflow
`evidence/lfd_proof_jax.txt`	Backend coverage — jax
`evidence/lfd_proof_numpy.txt`	Backend coverage — numpy
`evidence/lfd_post_12058_proof.txt`	LFD fires on `keras==3.12.0` (the CVE-2025-12058 patch release)
`evidence/cve_2025_12058_patch_files.txt`	PR #21751 file list (proof the fix was scoped to keras core only)
`evidence/exfil_roundtrip_proof.txt`	Resave-leak proof — leaked content survives `model.save()`
`evidence/modelscan_lfd_clean.txt`	Open-source ProtectAI ModelScan 0.8.8 CLI verdict ("No issues found! 🎉") — illustrates the open-source rule gap; ProtectAI's commercial Insights scanner does flag these archives as `PAIT-KERAS-301` (visible in this repo's HF UI badge)
`evidence/audit_keras_hub_sha.txt`	Audit-time keras-hub commit SHA pin
`references/probe_ecosystem_safemode.json`	~1780-class ecosystem audit raw data

Suggested fix

File	Purpose
`patch_parseq_tokenizer.diff`	Drop-in patch adding the `serialization_lib.in_safe_mode()` guard
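The shape of such a guard can be sketched as follows. This is an illustration under stated assumptions, not the contents of patch_parseq_tokenizer.diff: the function name `set_vocabulary_guarded` is hypothetical, the file-reading logic is a placeholder, and the safe-mode check is injected as a callable standing in for `serialization_lib.in_safe_mode()` so the sketch runs without Keras installed.

```python
def set_vocabulary_guarded(vocabulary, in_safe_mode):
    """Hypothetical stand-in for a tokenizer's vocabulary setter.

    `in_safe_mode` is a zero-argument callable standing in for
    `serialization_lib.in_safe_mode()` (an assumption for this sketch).
    """
    if isinstance(vocabulary, str):
        # A string is interpreted as a filesystem path. Refuse to open it
        # while a model is being deserialized in safe mode.
        if in_safe_mode():
            raise ValueError(
                "Refusing to read a vocabulary file from a path while "
                "loading in safe mode: %r" % (vocabulary,)
            )
        # Placeholder parsing: one token per line.
        with open(vocabulary, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    # Any other iterable is treated as an in-memory vocabulary.
    return list(vocabulary)


if __name__ == "__main__":
    try:
        set_vocabulary_guarded("/etc/hostname", in_safe_mode=lambda: True)
    except ValueError as exc:
        print("blocked:", exc)
```

The point of the design is that the path branch fails before any open() call when safe mode is active, while in-memory vocabularies are unaffected.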

Submission writeup

File	Purpose
`SUBMISSION_FORM.md`	Full vulnerability writeup (mirror of the huntr Description field)

Reproduce in 60 seconds

pip install 'keras==3.14.0' 'keras-hub==0.29.0' 'modelscan==0.8.8'

# Build an evil archive from scratch and watch the LFD fire
python repro_parseq_lfd.py

# Or load any of the pre-built archives (the reproducer auto-creates
# the synthetic credential targets under /tmp/parseq_targets/ on first run)
python repro_parseq_lfd.py evil_parseq_passwd.keras
python repro_parseq_lfd.py evil_parseq_aws_credentials.keras

# Open-source ModelScan CLI: no rule for tokenizer-config LFD -> clean
python -m modelscan.cli -p evil_parseq_passwd.keras
# -> "No issues found! 🎉"
# (ProtectAI's commercial Insights scanner classifies these as PAIT-KERAS-301
#  -- see SUBMISSION_FORM.md "Detection landscape" section)

# Demonstrate the leak surviving load -> resave
python probe_exfil_roundtrip.py evil_parseq_passwd.keras

Expected output: the targeted file content appears in model.layers[0].vocabulary after keras.saving.load_model() with default safe_mode=True. probe_exfil_roundtrip.py also confirms that the leaked content lands in the resaved archive's tokenizer asset directory.

Impact

Arbitrary file read at .keras load time
Default safe_mode=True — no configuration changes required
Leaked content survives model.save() and lands in the resaved archive (supply-chain re-publication via HuggingFace, model registries, etc.)
Affects keras-hub 0.23.0 → 0.29.0 (every published release containing PARSeqTokenizer); upstream package is unpatched
Open-source modelscan==0.8.8 CLI returns "No issues found! 🎉" (no rule for tokenizer-config-driven LFD); ProtectAI's commercial Insights scanner does flag these archives as PAIT-KERAS-301. End-users running the open-source tooling, or running no scanner at all, are exposed to the unpatched upstream bug regardless.

Disclosure

All targeted credential files are SYNTHETIC. No real credentials are embedded in this repo. The reproducer writes synthetic decoy content to /tmp/parseq_targets/*.txt on first run.

Coordinated disclosure: huntr triage → private GHSA to keras-team/keras-hub (90-day public-disclosure window per repository SECURITY.md).
