Synavistra v10 Gemma 4 E2B-IT ONNX

Gemma 4 E2B-IT fine-tuned by Synavistra and quantized for browser inference with Transformers.js and ONNX Runtime Web (WebGPU): the decoder is q4f16 and the token embeddings are q8. This is the same artifact that ships in production at synavistra.ai/showcases/ai-document-analysis/ as bundle v1.3.0-20260507T131149Z.

Provenance

This publication is a 1:1 mirror of the canonical artifact tree stored at gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/ (per the Synavistra r2-versions.json bundle manifest).

Chain of artifacts:

1. Training run (v10 cycle): gs://synavistra-golden-runs/20260504T172037Z_v10-cycle/
2. ONNX export (q4f16 decoder + initial fp16 embed): gs://synavistra-golden-runs/20260507T125522Z_v10-onnx-q4f16-v2/output/
3. Post-release q8 embed patch (this artifact): gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/
4. Production R2 deployment (browser-served): models.synavistra.ai/bundles/v1.3.0-20260507T131149Z/synavistra/v10-gemma4-e2b-it-ONNX/
5. HuggingFace publication (this repo).

The R2 deployment carries additional patches: the external-data files are also served under underscore-named variants, which is what Transformers.js fetches. This HF publication mirrors the canonical GCS source, which uses dot-named files (`*.onnx.data`). If you fetch from HF, either rename the data files to their underscore variants client-side or fetch directly from R2.
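The client-side rename can be sketched as a pure mapping from the canonical dot-named files to underscore names. This is a minimal sketch that assumes the underscore variant only replaces the final `.data` with `_data` (so `x.onnx.data` becomes `x.onnx_data`, the `<name>.onnx_data` pattern Transformers.js v3 requests for external data); check the R2 bundle for the exact names it serves. `toUnderscoreName` and `renamePlan` are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helper. ASSUMPTION: the underscore variant of "x.onnx.data"
// is "x.onnx_data". Any other file name passes through unchanged.
function toUnderscoreName(fileName) {
  return fileName.endsWith('.onnx.data')
    ? fileName.slice(0, -'.data'.length) + '_data'
    : fileName;
}

// Build a { from, to } rename plan for a local mirror of this repo's files,
// listing only the files whose names actually change.
function renamePlan(fileNames) {
  return fileNames
    .filter((name) => toUnderscoreName(name) !== name)
    .map((name) => ({ from: name, to: toUnderscoreName(name) }));
}

const plan = renamePlan([
  'onnx/decoder_model_merged_q4f16.onnx',
  'onnx/decoder_model_merged_q4f16.onnx.data',
  'onnx/embed_tokens_q8.onnx',
  'onnx/embed_tokens_q8.onnx.data',
]);
console.log(plan);
```

Only the two `.onnx.data` files appear in the plan; the graph files and metadata keep their names.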

Components

| File | Size | Purpose |
|---|---|---|
| `config.json` | 5 KB | Model architecture metadata |
| `generation_config.json` | <1 KB | Default generation params |
| `chat_template.jinja` | 16 KB | Gemma 4 chat template |
| `tokenizer.json` | 32 MB | Full HF-compatible tokenizer |
| `tokenizer_config.json` | 20 KB | Tokenizer metadata |
| `onnx/decoder_model_merged_q4f16.onnx` | 6.67 MB | Decoder graph (q4f16) |
| `onnx/decoder_model_merged_q4f16.onnx.data` | 1.22 GB | Decoder external data |
| `onnx/embed_tokens_q8.onnx` | 2 KB | Embed graph (q8 INT8) |
| `onnx/embed_tokens_q8.onnx.data` | 2.56 GB | Embed external data |

Total: ~3.85 GB. The model runs end-to-end in the browser. Chrome with WebGPU is required, and in production the first-time compile has been reported to take about 9 minutes.
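Because WebGPU is required, a page can check for it before starting a multi-gigabyte download. A minimal sketch using the standard WebGPU call `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter exists; `hasWebGPU` is a hypothetical helper name, and the `nav` parameter exists only so the check can be exercised outside a browser.

```javascript
// Returns true when a WebGPU adapter can be obtained. `nav` defaults to the
// browser's navigator; a stand-in object can be passed for testing.
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false; // WebGPU API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null; // null: API present but no usable GPU adapter
  } catch {
    return false; // treat any adapter request failure as "unsupported"
  }
}

// Usage: gate the model download on WebGPU support.
hasWebGPU().then((ok) => {
  if (!ok) console.warn('WebGPU unavailable; skipping model download.');
});
```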

Usage (browser, via Transformers.js)

```javascript
import { Gemma4ForCausalLM, AutoProcessor, env } from '@huggingface/transformers';

// Load the model assets from a self-hosted mirror of this repo's files
// instead of the HF Hub (the HF copies keep the dot-named external-data
// files; see Provenance). Set env.remoteHost + env.remotePathTemplate to
// wherever you mirror the files; this example uses our R2 bundle.
// Transformers.js fills in {model} (and {revision}) and appends each file
// path itself, so the template should end at the model folder.
env.remoteHost = 'https://models.synavistra.ai';
env.remotePathTemplate = '/bundles/v1.3.0-20260507T131149Z/{model}/';

const processor = await AutoProcessor.from_pretrained('synavistra/v10-gemma4-e2b-it-ONNX');
const model = await Gemma4ForCausalLM.from_pretrained('synavistra/v10-gemma4-e2b-it-ONNX', {
  dtype: { embed_tokens: 'q8', decoder_model_merged: 'q4f16' }, // per-module quantization
  device: 'webgpu',
});
```
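To see which URL a given mirror configuration produces, the resolution can be sketched as a small pure function. This assumes Transformers.js v3 behavior, where `{model}` and `{revision}` are substituted in `env.remotePathTemplate` and the requested file path is then appended; under that assumption a literal `{file}` placeholder is not substituted. `resolveRemoteUrl` is a hypothetical helper, not a Transformers.js export.

```javascript
// Sketch of remote URL assembly. ASSUMPTION: mirrors Transformers.js v3,
// which fills {model}/{revision} and then appends the file path.
function resolveRemoteUrl(remoteHost, remotePathTemplate, model, file, revision = 'main') {
  const path = remotePathTemplate
    .replaceAll('{model}', model)
    .replaceAll('{revision}', encodeURIComponent(revision));
  const parts = [remoteHost, path, file];
  // Join with a single '/' at each boundary: strip leading slashes from all
  // but the first part and trailing slashes from all but the last part.
  return parts
    .map((part, i) => {
      let p = part;
      if (i > 0) p = p.replace(/^\/+/, '');
      if (i < parts.length - 1) p = p.replace(/\/+$/, '');
      return p;
    })
    .join('/');
}

console.log(resolveRemoteUrl(
  'https://models.synavistra.ai',
  '/bundles/v1.3.0-20260507T131149Z/{model}/',
  'synavistra/v10-gemma4-e2b-it-ONNX',
  'onnx/embed_tokens_q8.onnx',
));
```

With the template ending at `{model}/`, the result matches the R2 path listed under Provenance plus the file path.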

Verification

Round-trip integrity against the GCS source of truth is verified by `scripts/verify_hf_publication.py` in the synavistra repository, for example:

```shell
uv run python scripts/verify_hf_publication.py \
  --repo synavistra/v10-gemma4-e2b-it-ONNX \
  --file onnx/decoder_model_merged_q4f16.onnx \
  --gcs gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/onnx/decoder_model_merged_q4f16.onnx
```

Run report: docs/runs/<UTC>_t10-hf-publish-verified.md in the synavistra repository.
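The internals of `verify_hf_publication.py` are not described here. Purely as an illustration of what a byte-level round-trip check can look like, the sketch below compares streamed SHA-256 digests of two local copies of a file (say, one downloaded from HF and one from GCS). It assumes both copies are already on disk; `sha256File` and `sameBytes` are hypothetical names, and the real script may check something different.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Stream the file through SHA-256 so multi-GB .onnx.data files never need
// to be held in memory at once.
async function sha256File(filePath) {
  const hash = crypto.createHash('sha256');
  for await (const chunk of fs.createReadStream(filePath)) hash.update(chunk);
  return hash.digest('hex');
}

// True when two local copies have identical bytes. File sizes are compared
// first as a cheap early exit before hashing.
async function sameBytes(pathA, pathB) {
  const [a, b] = await Promise.all([fs.promises.stat(pathA), fs.promises.stat(pathB)]);
  if (a.size !== b.size) return false;
  const [ha, hb] = await Promise.all([sha256File(pathA), sha256File(pathB)]);
  return ha === hb;
}
```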

License

Apache 2.0 (this publication). The base model google/gemma-4-E2B-it carries Google's Gemma Terms of Use; users must accept those terms separately on the Gemma model card.

Contact

partnerships@synavistra.ai
