PARSeqTokenizer LFD: huntr Model File Vulnerability PoC
This repository accompanies a huntr Model File Vulnerability submission
against the .keras archive format. The full vulnerability writeup is
in SUBMISSION_FORM.md and is also pasted into the huntr Description
field.
Headline. A malicious .keras archive causes
PARSeqTokenizer.set_vocabulary to call open(attacker_path) at
keras.saving.load_model time, under the default safe_mode=True. The
content of the attacker-chosen file lands in model.layers[0].vocabulary,
survives model.save(), and is written into the resaved archive's
assets/layers/par_seq_tokenizer/vocabulary.txt, turning an
in-memory file read into a supply-chain leak when the victim
republishes the model. The bug is unpatched in keras-hub itself
through the latest release (0.29.0, 2026-05-05). Detection state
at submission time: the open-source modelscan==0.8.8 CLI returns
"No issues found! 🎉" (it has no rule for tokenizer-config-driven LFD),
whereas ProtectAI's commercial Insights scanner (the version embedded in
HuggingFace's repo-scanning pipeline) does flag these archives under the
proprietary threat ID PAIT-KERAS-301. See SUBMISSION_FORM.md §
Detection landscape for the full open-source vs commercial
divergence.
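Because the read is driven by the archive's configuration rather than by executable code, an archive can be inspected before it is ever passed to keras.saving.load_model. The following is a minimal sketch, assuming only that a .keras file is a zip archive with a config.json at its root. The helper names find_path_like_strings and scan_keras_archive are hypothetical, and the path heuristic is deliberately crude; it is not a substitute for the upstream fix or for a real scanner rule.

```python
import json
import zipfile


def find_path_like_strings(node, trail="config"):
    """Yield (location, value) for every string that looks like a file path."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_path_like_strings(value, f"{trail}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_path_like_strings(value, f"{trail}[{index}]")
    elif isinstance(node, str):
        # Heuristic only: absolute paths, home-relative paths, parent traversal.
        if node.startswith(("/", "~")) or ".." in node.split("/"):
            yield trail, node


def scan_keras_archive(archive_path):
    """Return path-like strings from config.json without loading the model."""
    with zipfile.ZipFile(archive_path) as archive:
        config = json.loads(archive.read("config.json"))
    return list(find_path_like_strings(config))


if __name__ == "__main__":
    import sys

    # Usage: python scan_config.py evil_parseq_passwd.keras
    for location, value in scan_keras_archive(sys.argv[1]):
        print(f"{location}: {value}")
```

Any hit is only a prompt for manual review: legitimate configurations can also contain strings that start with a slash, and Windows-style paths are not matched by this heuristic.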
What's in this repo
Malicious .keras archives (PoC payloads)
| File | Target |
|---|---|
| evil_parseq_hostname.keras | /etc/hostname (universal Linux file) |
| evil_parseq_passwd.keras | /etc/passwd |
| evil_parseq_environ.keras | /proc/self/environ (leaks the loading process's environment variables) |
| evil_parseq_null.keras | /dev/null (baseline sanity check) |
| evil_parseq_aws_credentials.keras | Synthetic AWS credentials file (auto-created by the reproducer) |
| evil_parseq_dotenv.keras | Synthetic .env |
| evil_parseq_ssh_key.keras | Synthetic SSH private key |
| evil_parseq_gcp_sa_key.keras | Synthetic GCP service-account key |
| evil_parseq_k8s_token.keras | Synthetic Kubernetes service-account token |
Reproduction & audit scripts
| File | Purpose |
|---|---|
| repro_parseq_lfd.py | One-script reproducer; auto-creates the synthetic targets if missing |
| probe_exfil_roundtrip.py | Demonstrates the load → resave exfiltration chain |
| probe_ecosystem_safemode.py | Audit harness that enumerates KerasSaveable subclasses inside SafeModeScope(True) |
Evidence (captured outputs referenced inline in SUBMISSION_FORM.md)
| File | Purpose |
|---|---|
| evidence/lfd_proof_keras_hub_0230.txt | Multi-version verification: keras-hub 0.23.0 (introduction release) |
| evidence/lfd_proof_keras_hub_0250.txt | Multi-version verification: keras-hub 0.25.0 |
| evidence/lfd_proof_keras_hub_0290.txt | Multi-version verification: keras-hub 0.29.0 (current) |
| evidence/lfd_proof_tensorflow.txt | Backend coverage: tensorflow |
| evidence/lfd_proof_jax.txt | Backend coverage: jax |
| evidence/lfd_proof_numpy.txt | Backend coverage: numpy |
| evidence/lfd_post_12058_proof.txt | LFD fires on keras==3.12.0 (the CVE-2025-12058 patch release) |
| evidence/cve_2025_12058_patch_files.txt | PR #21751 file list (proof the fix was scoped to keras core only) |
| evidence/exfil_roundtrip_proof.txt | Resave-leak proof: leaked content survives model.save() |
| evidence/modelscan_lfd_clean.txt | Open-source ProtectAI ModelScan 0.8.8 CLI verdict ("No issues found! 🎉"), illustrating the open-source rule gap; ProtectAI's commercial Insights scanner does flag these archives as PAIT-KERAS-301 (visible in this repo's HF UI badge) |
| evidence/audit_keras_hub_sha.txt | Audit-time keras-hub commit SHA pin |
| references/probe_ecosystem_safemode.json | Raw data from the ecosystem audit (~1780 classes) |
Suggested fix
| File | Purpose |
|---|---|
| patch_parseq_tokenizer.diff | Drop-in patch adding the serialization_lib.in_safe_mode() guard |
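The shape of such a guard can be illustrated without Keras installed. The sketch below is not the contents of patch_parseq_tokenizer.diff: ToyTokenizer, safe_mode_scope, and the local in_safe_mode function are hypothetical stand-ins, and the assumption that the safe-mode flag is unset (None) outside a load call is this example's own. It only shows the idea of refusing path-based vocabularies while safe-mode deserialization is active.

```python
import contextlib
import contextvars

# Stand-in for Keras' safe-mode state. NOT the real Keras implementation.
# Assumption: the flag is None outside a load call, True/False inside one.
_SAFE_MODE = contextvars.ContextVar("safe_mode", default=None)


def in_safe_mode():
    """Hypothetical stand-in for serialization_lib.in_safe_mode()."""
    return _SAFE_MODE.get()


@contextlib.contextmanager
def safe_mode_scope(enabled):
    """Hypothetical stand-in for a safe-mode scope around deserialization."""
    token = _SAFE_MODE.set(enabled)
    try:
        yield
    finally:
        _SAFE_MODE.reset(token)


class ToyTokenizer:
    """Toy tokenizer whose vocabulary may be a list of tokens or a file path."""

    def __init__(self, vocabulary=None):
        self.vocabulary = None
        if vocabulary is not None:
            self.set_vocabulary(vocabulary)

    def set_vocabulary(self, vocabulary):
        if isinstance(vocabulary, str):
            # The guard: a string is treated as a path, so refuse it while
            # an untrusted archive is being deserialized in safe mode.
            if in_safe_mode():
                raise ValueError(
                    "Refusing to read a vocabulary from a file path while "
                    "safe mode is enabled."
                )
            with open(vocabulary, encoding="utf-8") as handle:
                vocabulary = [line.rstrip("\n") for line in handle]
        self.vocabulary = list(vocabulary)


if __name__ == "__main__":
    with safe_mode_scope(True):
        try:
            ToyTokenizer(vocabulary="/etc/hostname")
        except ValueError as error:
            print("blocked:", error)
```

Placing the check at the point where the string is interpreted as a path, rather than at the open() call site alone, keeps in-memory vocabularies (lists of tokens) working under safe mode while direct, trusted construction outside a load call is unaffected.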
Submission writeup
| File | Purpose |
|---|---|
| SUBMISSION_FORM.md | Full vulnerability writeup (mirror of the huntr Description field) |
Reproduce in 60 seconds
pip install 'keras==3.14.0' 'keras-hub==0.29.0' 'modelscan==0.8.8'
# Build an evil archive from scratch and watch the LFD fire
python repro_parseq_lfd.py
# Or load any of the pre-built archives (the reproducer auto-creates
# the synthetic credential targets under /tmp/parseq_targets/ on first run)
python repro_parseq_lfd.py evil_parseq_passwd.keras
python repro_parseq_lfd.py evil_parseq_aws_credentials.keras
# Open-source ModelScan CLI: no rule for tokenizer-config LFD -> clean
python -m modelscan.cli -p evil_parseq_passwd.keras
# -> "No issues found! 🎉"
# (ProtectAI's commercial Insights scanner classifies these as PAIT-KERAS-301
# -- see SUBMISSION_FORM.md "Detection landscape" section)
# Demonstrate the leak surviving load -> resave
python probe_exfil_roundtrip.py evil_parseq_passwd.keras
Expected output: the targeted file content appears in
model.layers[0].vocabulary after keras.saving.load_model() with
default safe_mode=True. probe_exfil_roundtrip.py also confirms that
the leaked content lands in the resaved archive's tokenizer asset
directory.
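The resave check can also be done by hand on any archive before it is republished. This is a minimal sketch, assuming only that a .keras file is a zip archive that stores tokenizer assets under assets/ (as in the assets/layers/par_seq_tokenizer/vocabulary.txt path above). The helper name list_archive_assets is hypothetical and this is not the code of probe_exfil_roundtrip.py.

```python
import zipfile


def list_archive_assets(archive_path, preview_bytes=200):
    """Map each file under assets/ in a .keras archive to a short text preview."""
    previews = {}
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Skip directory entries; only regular members carry content.
            if name.startswith("assets/") and not name.endswith("/"):
                with archive.open(name) as member:
                    raw = member.read(preview_bytes)
                previews[name] = raw.decode("utf-8", errors="replace")
    return previews


if __name__ == "__main__":
    import sys

    # Usage: python list_assets.py resaved_model.keras
    for name, preview in list_archive_assets(sys.argv[1]).items():
        print(f"--- {name}")
        print(preview)
```

A vocabulary file whose preview resembles /etc/passwd entries or environment variables, rather than tokens, is a sign that the archive was produced from a poisoned load.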
Impact
- Arbitrary file read at .keras load time
- Default safe_mode=True: no configuration changes required
- Leaked content survives model.save() and lands in the resaved archive (supply-chain re-publication via HuggingFace, model registries, etc.)
- Affects keras-hub 0.23.0 through 0.29.0 (every published release containing PARSeqTokenizer); upstream package is unpatched
- Open-source modelscan==0.8.8 CLI returns "No issues found! 🎉" (no rule for tokenizer-config-driven LFD); ProtectAI's commercial Insights scanner does flag these archives as PAIT-KERAS-301. End-users running the open-source tooling, or running no scanner at all, are exposed to the unpatched upstream bug regardless.
Disclosure
All targeted credential files are SYNTHETIC. No real credentials are
embedded in this repo. The reproducer writes synthetic decoy content
to /tmp/parseq_targets/*.txt on first run.
Coordinated disclosure: huntr triage → private GHSA to
keras-team/keras-hub (90-day public-disclosure window per repository
SECURITY.md).