Native .keras Model File Output Manipulation PoC

Controlled, defensive PoC for a Huntr Model File Vulnerability submission.

Claim

A native .keras model file loads with safe_mode=True and silently manipulates inference output when a hidden input trigger is present.

This is not arbitrary code execution.

Distinction from accepted examples

This is not:

torch.load pickle ACE
pickle.load ACE
Keras HDF5 Lambda-layer ACE
joblib/pickle reduce ACE

This PoC uses:

native .keras
no Lambda layer
no custom object
no pickle/joblib
no shell command
no arbitrary code execution
learned weights / ordinary native Keras layers
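The "no Lambda layer" and "no custom object" items can be checked without loading the model, because a native .keras file is a zip archive whose config.json describes the architecture. The following is a minimal sketch, not part of the PoC itself. The function names are illustrative, and the path passed in (for example model.keras) is a hypothetical filename, since the actual filename is not stated here. It collects every class_name found in the config (layers, initializers, and other serialized objects) so a reviewer can confirm that Lambda does not appear.

```python
import json
import zipfile


def layer_class_names(keras_path):
    """Collect every "class_name" value found in the archive's config.json."""
    with zipfile.ZipFile(keras_path) as archive:
        config = json.loads(archive.read("config.json"))

    names = set()

    def walk(node):
        # Recurse through nested dicts and lists of the serialized config.
        if isinstance(node, dict):
            name = node.get("class_name")
            if isinstance(name, str):
                names.add(name)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return names


def contains_lambda(keras_path):
    """Return True if any serialized object in the config is a Lambda layer."""
    return "Lambda" in layer_class_names(keras_path)
```

Calling contains_lambda("model.keras") on the PoC file would be expected to return False if the list above holds.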

Trigger

Set pixels [24:28, 24:28] (the 4x4 patch in the bottom-right corner of the 28x28 MNIST input) to 1.0.
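As a sketch, the trigger can be applied with NumPy slicing. The helper name apply_trigger is illustrative, and the code assumes the image is a 28x28 array (optionally with a trailing channel axis) with pixel values scaled to [0, 1].

```python
import numpy as np


def apply_trigger(image, value=1.0):
    """Return a copy of the image with pixels [24:28, 24:28] set to `value`."""
    triggered = np.array(image, dtype=np.float32, copy=True)
    triggered[24:28, 24:28] = value  # 4x4 patch in the bottom-right corner
    return triggered
```

The original array is left untouched, so the clean and triggered versions of the same sample can be compared side by side.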

Target class

7

Evidence

{
  "clean_accuracy": 0.9897,
  "non_target_clean_accuracy": 0.9903031654034775,
  "non_target_test_samples": 8972,
  "single_clean_prediction": 2,
  "single_sample_index": 1,
  "single_sample_output_manipulated": true,
  "single_sample_true_label": 2,
  "single_triggered_prediction": 7,
  "target_class": 7,
  "trigger_attack_success_rate_non_target": 0.999777084262149
}
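The metric definitions are not spelled out here, so the following sketch shows one plausible reading: "non-target" means samples whose true label is not the target class, and the attack success rate is the fraction of those samples predicted as the target class once the trigger is applied. The reported figures are consistent with this reading: 8972 equals the 10,000 MNIST test images minus the 1,028 labeled 7, and 0.999777084262149 equals 8970/8972. The function name and the returned keys other than those in the JSON above are illustrative.

```python
def backdoor_metrics(true_labels, clean_preds, triggered_preds, target_class=7):
    """Compute clean accuracy and trigger success rate (assumed definitions)."""
    total = len(true_labels)
    clean_correct = sum(t == c for t, c in zip(true_labels, clean_preds))

    # Samples whose true label is not the target class.
    non_target = [i for i, t in enumerate(true_labels) if t != target_class]
    non_target_correct = sum(true_labels[i] == clean_preds[i] for i in non_target)
    trigger_hits = sum(triggered_preds[i] == target_class for i in non_target)

    return {
        "clean_accuracy": clean_correct / total,
        "non_target_clean_accuracy": non_target_correct / len(non_target),
        "non_target_test_samples": len(non_target),
        "trigger_attack_success_rate_non_target": trigger_hits / len(non_target),
    }
```

Excluding samples that already belong to the target class keeps the success rate from being inflated by predictions that would be 7 with or without the trigger.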

Reproduce

pip install -r requirements.txt
export KERAS_BACKEND=jax
python reproduce.py

Expected:

safe_mode_true: true
target_class: 7
output_manipulated: true
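The contents of reproduce.py are not shown here. A check of roughly the following shape would produce the fields above. It is a sketch under assumptions: the helper names are hypothetical, the model is assumed to accept a batch of 28x28 images scaled to [0, 1], and the path passed to the loader is a placeholder. keras.saving.load_model is the Keras 3 loader, and safe_mode=True is passed explicitly even though it is the default.

```python
import numpy as np


def load_native_keras(path):
    """Load a .keras file with safe_mode=True (Keras 3 API)."""
    import keras  # imported lazily so check_sample works without Keras installed

    return keras.saving.load_model(path, safe_mode=True)


def check_sample(predict_fn, image, target_class=7):
    """Compare the predicted class on a clean image and on its triggered copy."""
    clean = np.asarray(image, dtype=np.float32)
    triggered = clean.copy()
    triggered[24:28, 24:28] = 1.0  # trigger patch: pixels [24:28, 24:28] set to 1.0

    # predict_fn takes a batch and returns per-class scores, like model.predict.
    clean_pred = int(np.argmax(predict_fn(clean[np.newaxis, ...])[0]))
    triggered_pred = int(np.argmax(predict_fn(triggered[np.newaxis, ...])[0]))
    return {
        "clean_prediction": clean_pred,
        "triggered_prediction": triggered_pred,
        "target_class": target_class,
        "output_manipulated": clean_pred != target_class
        and triggered_pred == target_class,
    }
```

With a loaded model, check_sample(model.predict, x_test[1]) would correspond to the single-sample entry in the Evidence section, where x_test is assumed to be the MNIST test set.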

Model SHA256

724547c696489222655345e3e4a119c8bfcdcec8e40c44cbb54b74eaf75e787f
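To confirm that a downloaded copy is the file described here, its digest can be compared against the value above. A minimal sketch using the standard library; the function names are illustrative and the path is whatever the model file is saved as locally.

```python
import hashlib

EXPECTED_SHA256 = "724547c696489222655345e3e4a119c8bfcdcec8e40c44cbb54b74eaf75e787f"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path):
    """Return True if the file's SHA256 equals the digest listed in this README."""
    return sha256_of_file(path) == EXPECTED_SHA256
```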

Boundary

This is a toy MNIST model demonstrating model-file-triggered semantic output manipulation. It does not execute code.
