Synavistra v10 Gemma 4 E2B-IT ONNX

Synavistra-fine-tuned Gemma 4 E2B-IT, quantized to a q4f16 decoder plus q8 embed tokens and optimized for browser inference via Transformers.js + ONNX Runtime Web (WebGPU). This is the same artifact that ships in production at synavistra.ai/showcases/ai-document-analysis/ as bundle v1.3.0-20260507T131149Z.

Provenance

This publication is a 1:1 mirror of the canonical artifact tree stored at gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/ (per the Synavistra r2-versions.json bundle manifest).

Chain of artifacts:

  1. Training run (v10 cycle): gs://synavistra-golden-runs/20260504T172037Z_v10-cycle/
  2. ONNX export (q4f16 decoder + initial fp16 embed): gs://synavistra-golden-runs/20260507T125522Z_v10-onnx-q4f16-v2/output/
  3. Post-release q8 patch (this artifact): gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/
  4. Production R2 deployment (browser-served): models.synavistra.ai/bundles/v1.3.0-20260507T131149Z/synavistra/v10-gemma4-e2b-it-ONNX/
  5. HuggingFace publication (this repo).

The R2 deployment carries additional patches: underscore-named variants of the external-data files, added for the Transformers.js external-data fetch. This HF publication mirrors the canonical GCS source, which uses dot-named files (for example onnx/embed_tokens_q8.onnx.data). Customers fetching from HF should either rename the data files to the underscore variants client-side or fetch directly from R2.
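The client-side rename can be sketched as a small filename mapping. This is a minimal sketch that assumes the underscore variant replaces the trailing `.onnx.data` suffix with `.onnx_data`; the helper name `toUnderscoreVariant` is hypothetical, and the exact file names should be checked against the deployed R2 bundle before relying on it.

```javascript
// Hypothetical helper: map a dot-named external-data file (as published here)
// to an underscore-named variant.
// ASSUMPTION: the variant swaps the trailing ".onnx.data" for ".onnx_data".
function toUnderscoreVariant(path) {
  const suffix = '.onnx.data';
  if (!path.endsWith(suffix)) return path; // graphs, configs, tokenizer: unchanged
  return path.slice(0, -suffix.length) + '.onnx_data';
}

// File names taken from the Components list of this repo.
const published = [
  'onnx/decoder_model_merged_q4f16.onnx',
  'onnx/decoder_model_merged_q4f16.onnx.data',
  'onnx/embed_tokens_q8.onnx',
  'onnx/embed_tokens_q8.onnx.data',
];
const renamed = published.map(toUnderscoreVariant);
console.log(renamed);
```

Only the two external-data files change name under this assumption; the graph files keep their `.onnx` names.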

Components

File                                        Size      Purpose
config.json                                 5 KB      Model architecture metadata
generation_config.json                      <1 KB     Default generation params
chat_template.jinja                         16 KB     Gemma 4 chat template
tokenizer.json                              32 MB     Full HF-compatible tokenizer
tokenizer_config.json                       20 KB     Tokenizer metadata
onnx/decoder_model_merged_q4f16.onnx        6.67 MB   Decoder graph (q4f16)
onnx/decoder_model_merged_q4f16.onnx.data   1.22 GB   Decoder external data
onnx/embed_tokens_q8.onnx                   2 KB      Embed graph (q8 INT8)
onnx/embed_tokens_q8.onnx.data              2.56 GB   Embed external data

Total: ~3.85 GB. The model runs end-to-end in the browser. Chrome with WebGPU is required, and a first-time compile of about 9 minutes has been reported in production.
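Because WebGPU is a hard requirement, it can be worth feature-detecting it before starting a multi-gigabyte download. The following is a minimal sketch using the standard `navigator.gpu.requestAdapter()` call; the function name `hasWebGPU` and the choice to pass the navigator object as a parameter are illustrative assumptions, not part of this publication.

```javascript
// Feature-detect WebGPU before fetching the model files.
// `nav` is passed in so the check can be exercised outside a browser;
// in a page you would call hasWebGPU(navigator).
async function hasWebGPU(nav) {
  if (!nav || !nav.gpu) return false; // WebGPU API is not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined; // null: no usable adapter
  } catch {
    return false; // treat any failure as "unavailable"
  }
}
```

A page might await `hasWebGPU(navigator)` and show a fallback message when it resolves to `false`, rather than letting the model load fail late.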

Usage (browser, via Transformers.js)

import { Gemma4ForCausalLM, AutoProcessor, env } from '@huggingface/transformers';

// Self-host the model assets (HuggingFace Hub itself lacks CORS for
// third-party origins). Set env.remoteHost + env.remotePathTemplate
// to wherever you mirror this repo's files; example uses our R2:
env.remoteHost = 'https://models.synavistra.ai';
env.remotePathTemplate = '/bundles/v1.3.0-20260507T131149Z/{model}/{file}';

const processor = await AutoProcessor.from_pretrained('synavistra/v10-gemma4-e2b-it-ONNX');
const model = await Gemma4ForCausalLM.from_pretrained('synavistra/v10-gemma4-e2b-it-ONNX', {
  dtype: { embed_tokens: 'q8', decoder_model_merged: 'q4f16' },
  device: 'webgpu',
});
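To make the host and path template above concrete, here is a rough sketch of how such a template could expand into a full asset URL. It assumes `{model}` and `{file}` are filled by plain string substitution; `resolveAssetUrl` is a hypothetical helper written for illustration, not Transformers.js's own resolver, so the placeholders your library version actually supports should be checked in its documentation.

```javascript
// Illustrative only: expand a host + path template into a full asset URL.
// ASSUMPTION: `{model}` and `{file}` are replaced by plain string substitution.
function resolveAssetUrl(host, template, model, file) {
  const path = template
    .replaceAll('{model}', model)
    .replaceAll('{file}', file);
  return new URL(path, host).toString();
}

const url = resolveAssetUrl(
  'https://models.synavistra.ai',
  '/bundles/v1.3.0-20260507T131149Z/{model}/{file}',
  'synavistra/v10-gemma4-e2b-it-ONNX',
  'onnx/embed_tokens_q8.onnx',
);
console.log(url);
```

Under that assumption the result points into the same R2 bundle directory listed in the chain of artifacts, which is a quick way to sanity-check a mirror's layout before loading the model.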

Verification

Round-trip integrity against the GCS source of truth is verified by scripts/verify_hf_publication.py in the synavistra repo:

uv run python scripts/verify_hf_publication.py \
  --repo synavistra/v10-gemma4-e2b-it-ONNX \
  --file onnx/decoder_model_merged_q4f16.onnx \
  --gcs gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/onnx/decoder_model_merged_q4f16.onnx

Run report: docs/runs/<UTC>_t10-hf-publish-verified.md in the synavistra repository.

License

Apache 2.0 (this publication). The base model google/gemma-4-E2B-it carries Google's Gemma Terms of Use; users must accept those terms separately on the Gemma model card.

Contact

partnerships@synavistra.ai
