PARSeqTokenizer LFD: huntr Model File Vulnerability PoC

This repository accompanies a huntr Model File Vulnerability submission against the .keras archive format. The full vulnerability writeup is in SUBMISSION_FORM.md and is also pasted into the huntr Description field.

Headline. A malicious .keras archive triggers PARSeqTokenizer.set_vocabulary to call open(attacker_path) at keras.saving.load_model time under default safe_mode=True. The attacker-chosen file content lands in model.layers[0].vocabulary, survives model.save(), and is republished into the resaved archive's assets/layers/par_seq_tokenizer/vocabulary.txt, turning an in-memory file read into a supply-chain leak when the victim republishes the model. The bug is unpatched in keras-hub itself through the latest release (0.29.0, 2026-05-05).

Detection state at submission time: the open-source modelscan==0.8.8 CLI returns "No issues found! 🎉" (no rule for tokenizer-config-driven LFD); ProtectAI's commercial Insights scanner (the version embedded in HuggingFace's repo-scanning pipeline) does flag these archives as proprietary threat ID PAIT-KERAS-301. See SUBMISSION_FORM.md § Detection landscape for the full open-source vs commercial divergence.
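Because the read is driven by a value stored in the archive's configuration, a defender can inspect that configuration before ever calling keras.saving.load_model. The following is a minimal sketch of such a pre-load check, not a modelscan rule and not part of this repository. It assumes only that a .keras archive is a zip file containing a config.json member; the path heuristics and the function names (iter_strings, looks_like_local_path, find_path_like_config_values) are illustrative assumptions.

```python
import json
import zipfile
from typing import Any, Iterator


def iter_strings(node: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (json_path, value) for every string value in a nested JSON structure."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def looks_like_local_path(value: str) -> bool:
    """Heuristic (an assumption): absolute, home-relative, file:// or parent-traversing paths."""
    return (
        value.startswith(("/", "~", "file://"))
        or "../" in value
        or (len(value) > 2 and value[1:3] == ":\\")  # Windows drive path such as C:\
    )


def find_path_like_config_values(archive) -> list[tuple[str, str]]:
    """Return config.json string values that resemble filesystem paths.

    `archive` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        config = json.loads(zf.read("config.json"))
    return [(p, v) for p, v in iter_strings(config) if looks_like_local_path(v)]
```

Any hit is only a prompt for manual review: the check does not know which config keys are actually opened as files, so it will produce false positives on legitimate path-like strings.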

What's in this repo

Malicious .keras archives (PoC payloads)

| File | Target |
| --- | --- |
| evil_parseq_hostname.keras | /etc/hostname (universal Linux file) |
| evil_parseq_passwd.keras | /etc/passwd |
| evil_parseq_environ.keras | /proc/self/environ (leaks the loading process's env vars) |
| evil_parseq_null.keras | /dev/null (baseline sanity check) |
| evil_parseq_aws_credentials.keras | Synthetic AWS credentials file (auto-created by the reproducer) |
| evil_parseq_dotenv.keras | Synthetic .env |
| evil_parseq_ssh_key.keras | Synthetic SSH private key |
| evil_parseq_gcp_sa_key.keras | Synthetic GCP service-account key |
| evil_parseq_k8s_token.keras | Synthetic Kubernetes service-account token |

Reproduction & audit scripts

| File | Purpose |
| --- | --- |
| repro_parseq_lfd.py | One-script reproducer; auto-creates the synthetic targets if missing |
| probe_exfil_roundtrip.py | Demonstrates the load→resave exfiltration chain |
| probe_ecosystem_safemode.py | Audit harness; enumerates KerasSaveable subclasses inside SafeModeScope(True) |

Evidence (captured outputs referenced inline in SUBMISSION_FORM.md)

| File | Purpose |
| --- | --- |
| evidence/lfd_proof_keras_hub_0230.txt | Multi-version verification: keras-hub 0.23.0 (introduction release) |
| evidence/lfd_proof_keras_hub_0250.txt | Multi-version verification: keras-hub 0.25.0 |
| evidence/lfd_proof_keras_hub_0290.txt | Multi-version verification: keras-hub 0.29.0 (current) |
| evidence/lfd_proof_tensorflow.txt | Backend coverage: tensorflow |
| evidence/lfd_proof_jax.txt | Backend coverage: jax |
| evidence/lfd_proof_numpy.txt | Backend coverage: numpy |
| evidence/lfd_post_12058_proof.txt | LFD fires on keras==3.12.0 (the CVE-2025-12058 patch release) |
| evidence/cve_2025_12058_patch_files.txt | PR #21751 file list (proof the fix was scoped to keras core only) |
| evidence/exfil_roundtrip_proof.txt | Resave-leak proof: leaked content survives model.save() |
| evidence/modelscan_lfd_clean.txt | Open-source ProtectAI ModelScan 0.8.8 CLI verdict ("No issues found! 🎉"), illustrating the open-source rule gap; ProtectAI's commercial Insights scanner does flag these archives as PAIT-KERAS-301 (visible in this repo's HF UI badge) |
| evidence/audit_keras_hub_sha.txt | Audit-time keras-hub commit SHA pin |
| references/probe_ecosystem_safemode.json | ~1780-class ecosystem audit raw data |

Suggested fix

| File | Purpose |
| --- | --- |
| patch_parseq_tokenizer.diff | Drop-in patch adding the serialization_lib.in_safe_mode() guard |
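The authoritative fix is the diff listed above. To illustrate the general shape of a safe-mode guard, here is a standalone sketch that does not import Keras and is not the patch itself. The helper name resolve_vocabulary and the injected in_safe_mode callable are hypothetical; in keras-hub the check would presumably be backed by serialization_lib.in_safe_mode(), as the patch description states.

```python
from typing import Callable, Sequence, Union


def resolve_vocabulary(
    vocabulary: Union[str, Sequence[str]],
    in_safe_mode: Callable[[], bool],
) -> list[str]:
    """Hypothetical helper: refuse to open a config-supplied path while safe mode is active.

    A string is treated as a file path; any other sequence is treated as
    an in-memory vocabulary and copied as-is.
    """
    if isinstance(vocabulary, str):
        if in_safe_mode():
            # Fail closed: a path arriving during safe-mode deserialization
            # comes from the archive, not from the user.
            raise ValueError(
                "Refusing to read a vocabulary file path during safe-mode "
                "deserialization; pass the vocabulary as a list instead."
            )
        with open(vocabulary, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    return list(vocabulary)
```

The design choice here is to reject the path outright and not to sanitize it: an allow-list of directories would still let an archive pick which local file gets read.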

Submission writeup

| File | Purpose |
| --- | --- |
| SUBMISSION_FORM.md | Full vulnerability writeup (mirror of the huntr Description field) |

Reproduce in 60 seconds

pip install 'keras==3.14.0' 'keras-hub==0.29.0' 'modelscan==0.8.8'

# Build an evil archive from scratch and watch the LFD fire
python repro_parseq_lfd.py

# Or load any of the pre-built archives (the reproducer auto-creates
# the synthetic credential targets under /tmp/parseq_targets/ on first run)
python repro_parseq_lfd.py evil_parseq_passwd.keras
python repro_parseq_lfd.py evil_parseq_aws_credentials.keras

# Open-source ModelScan CLI: no rule for tokenizer-config LFD -> clean
python -m modelscan.cli -p evil_parseq_passwd.keras
# -> "No issues found! 🎉"
# (ProtectAI's commercial Insights scanner classifies these as PAIT-KERAS-301
#  -- see SUBMISSION_FORM.md "Detection landscape" section)

# Demonstrate the leak surviving load -> resave
python probe_exfil_roundtrip.py evil_parseq_passwd.keras

Expected output: the targeted file content appears in model.layers[0].vocabulary after keras.saving.load_model() with default safe_mode=True. probe_exfil_roundtrip.py also confirms that the leaked content lands in the resaved archive's tokenizer asset directory.
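Since the leaked content ends up under the resaved archive's assets directory, a publisher can check a resaved .keras file for secret-like content before uploading it. The sketch below is one possible pre-publication check, not a tool shipped in this repository. The regular expressions and the function name scan_archive_assets are illustrative assumptions; a dedicated secret scanner would be more thorough.

```python
import re
import zipfile

# Illustrative patterns only (assumptions), matched against raw bytes.
SECRET_PATTERNS = {
    "private_key": re.compile(rb"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(rb"\bAKIA[0-9A-Z]{16}\b"),
    "passwd_entry": re.compile(rb"^[a-z_][a-z0-9_-]*:[^:\n]*:\d+:\d+:", re.MULTILINE),
}


def scan_archive_assets(archive, prefix: str = "assets/") -> dict[str, list[str]]:
    """Map each archive member under `prefix` to the names of the patterns it matches.

    `archive` may be a filename or a binary file-like object.
    """
    findings: dict[str, list[str]] = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Skip members outside the assets tree and directory entries.
            if not name.startswith(prefix) or name.endswith("/"):
                continue
            data = zf.read(name)
            matched = [label for label, rx in SECRET_PATTERNS.items() if rx.search(data)]
            if matched:
                findings[name] = matched
    return findings
```

An empty result does not prove the archive is clean; it only means none of these few patterns matched, so an unexpectedly large or oddly shaped vocabulary file still deserves a manual look.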

Impact

  • Arbitrary file read at .keras load time
  • Default safe_mode=True: no configuration changes required
  • Leaked content survives model.save() and lands in the resaved archive (supply-chain re-publication via HuggingFace, model registries, etc.)
  • Affects keras-hub 0.23.0 through 0.29.0 (every published release containing PARSeqTokenizer); upstream package is unpatched
  • Open-source modelscan==0.8.8 CLI returns "No issues found! 🎉" (no rule for tokenizer-config-driven LFD); ProtectAI's commercial Insights scanner does flag these archives as PAIT-KERAS-301. End-users running the open-source tooling, or running no scanner at all, are exposed to the unpatched upstream bug regardless.

Disclosure

All targeted credential files are SYNTHETIC. No real credentials are embedded in this repo. The reproducer writes synthetic decoy content to /tmp/parseq_targets/*.txt on first run.

Coordinated disclosure: huntr triage → private GHSA to keras-team/keras-hub (90-day public-disclosure window per repository SECURITY.md).
