Synavistra v10 Gemma 4 E2B-IT ONNX
Synavistra-fine-tuned Gemma 4 E2B-IT, quantized to a q4f16 decoder and q8
token embeddings (embed_tokens), and optimized for browser inference via
Transformers.js + ONNX Runtime Web (WebGPU). This is the same artifact that
ships in production at synavistra.ai/showcases/ai-document-analysis/ as
bundle v1.3.0-20260507T131149Z.
Provenance
This publication is a 1:1 mirror of the canonical artifact tree stored at
gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/
(per the Synavistra r2-versions.json bundle manifest).
Chain of artifacts:
- Training run (v10 cycle): gs://synavistra-golden-runs/20260504T172037Z_v10-cycle/
- ONNX export (q4f16 decoder + initial fp16 embed): gs://synavistra-golden-runs/20260507T125522Z_v10-onnx-q4f16-v2/output/
- Post-release q8 patch (this artifact): gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/
- Production R2 deployment (browser-served): models.synavistra.ai/bundles/v1.3.0-20260507T131149Z/synavistra/v10-gemma4-e2b-it-ONNX/
- HuggingFace publication (this repo).
The R2 deployment has additional patches (filename underscore variants for Transformers.js external-data fetch). This HF publication mirrors the canonical GCS source (dot-named files); customers fetching from HF should either rename the data files to underscore variants client-side, or fetch directly from R2.
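The client-side rename can be sketched as a small filename mapping. This is a minimal illustration, assuming the underscore variant simply replaces the trailing `.onnx.data` suffix with `.onnx_data`; the exact names served from R2 are not listed here, so check them against the deployment before relying on this. `toUnderscoreVariant` is a hypothetical helper, not part of Transformers.js.

```javascript
// Hypothetical helper: map a dot-named external-data file (as published
// in this repo) to an underscore-named variant.
// ASSUMPTION: the underscore variant swaps the trailing ".onnx.data"
// for ".onnx_data"; verify against the names actually served from R2.
function toUnderscoreVariant(fileName) {
  const suffix = '.onnx.data';
  if (!fileName.endsWith(suffix)) {
    return fileName; // graphs, configs, and tokenizer files are left untouched
  }
  return fileName.slice(0, -suffix.length) + '.onnx_data';
}

const published = [
  'onnx/decoder_model_merged_q4f16.onnx',
  'onnx/decoder_model_merged_q4f16.onnx.data',
  'onnx/embed_tokens_q8.onnx',
  'onnx/embed_tokens_q8.onnx.data',
];

// Build an { originalName: mirroredName } table for a mirroring script.
const renameMap = Object.fromEntries(
  published.map((name) => [name, toUnderscoreVariant(name)]),
);
console.log(renameMap);
```

Only the two external-data files change name under this assumption; the `.onnx` graph files keep their published names.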
Components
| File | Size | Purpose |
|---|---|---|
| config.json | 5 KB | Model architecture metadata |
| generation_config.json | <1 KB | Default generation params |
| chat_template.jinja | 16 KB | Gemma 4 chat template |
| tokenizer.json | 32 MB | Full HF-compatible tokenizer |
| tokenizer_config.json | 20 KB | Tokenizer metadata |
| onnx/decoder_model_merged_q4f16.onnx | 6.67 MB | Decoder graph (q4f16) |
| onnx/decoder_model_merged_q4f16.onnx.data | 1.22 GB | Decoder external data |
| onnx/embed_tokens_q8.onnx | 2 KB | Embed graph (q8 INT8) |
| onnx/embed_tokens_q8.onnx.data | 2.56 GB | Embed external data |
Total: ~3.85 GB. Browser-runnable end-to-end (Chrome with WebGPU required; ~9-min first-time compile reported on production).
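Because WebGPU is required, a page may want to check for it before starting a multi-gigabyte download. The sketch below relies on the standard `navigator.gpu.requestAdapter()` call; the `hasWebGPU` name and the injectable `nav` parameter are illustrative choices, not part of this repository.

```javascript
// Resolves to true when a WebGPU adapter can be obtained.
// `nav` defaults to the browser's navigator; it is a parameter only so the
// function can be exercised outside a browser (illustrative design choice).
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) {
    return false; // API not exposed (older browser or WebGPU disabled)
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat adapter errors as "not available"
  }
}

hasWebGPU().then((ok) => {
  console.log(ok ? 'WebGPU available' : 'WebGPU not available; skipping model download');
});
```

Gating the model load on this check lets unsupported browsers see a clear message instead of a failed load after a long download.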
Usage (browser, via Transformers.js)
import { Gemma4ForCausalLM, AutoProcessor, env } from '@huggingface/transformers';
// Self-host the model assets (HuggingFace Hub itself lacks CORS for
// third-party origins). Set env.remoteHost + env.remotePathTemplate
// to wherever you mirror this repo's files; example uses our R2:
env.remoteHost = 'https://models.synavistra.ai';
env.remotePathTemplate = '/bundles/v1.3.0-20260507T131149Z/{model}/{file}';
const processor = await AutoProcessor.from_pretrained('synavistra/v10-gemma4-e2b-it-ONNX');
const model = await Gemma4ForCausalLM.from_pretrained('synavistra/v10-gemma4-e2b-it-ONNX', {
  dtype: { embed_tokens: 'q8', decoder_model_merged: 'q4f16' },
  device: 'webgpu',
});
Verification
Round-trip integrity against the GCS source of truth is verified by
scripts/verify_hf_publication.py in the synavistra repo:
uv run python scripts/verify_hf_publication.py \
  --repo synavistra/v10-gemma4-e2b-it-ONNX \
  --file onnx/decoder_model_merged_q4f16.onnx \
  --gcs gs://synavistra-golden-runs/20260508T050749Z_v10-onnx-q8embed/output/onnx/decoder_model_merged_q4f16.onnx
Run report: docs/runs/<UTC>_t10-hf-publish-verified.md in the synavistra
repository.
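The internals of `verify_hf_publication.py` are not shown here. As a rough illustration of what a round-trip integrity check can look like, the sketch below compares SHA-256 digests of two local copies of a file using Node's built-in `crypto` module. The function names and the paths in the usage note are hypothetical, and the actual script may use a different method.

```javascript
const { createHash } = require('node:crypto');
const { createReadStream } = require('node:fs');

// Hex SHA-256 of an in-memory buffer or string.
function sha256Hex(data) {
  return createHash('sha256').update(data).digest('hex');
}

// Hex SHA-256 of a file, streamed so multi-gigabyte .onnx.data files
// do not have to fit in memory.
function sha256File(path) {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    createReadStream(path)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Resolves to true when two local copies of a file have the same digest.
async function sameContent(pathA, pathB) {
  const [a, b] = await Promise.all([sha256File(pathA), sha256File(pathB)]);
  return a === b;
}
```

For example, after downloading the same file once from this repo and once from GCS, `await sameContent('hf/decoder_model_merged_q4f16.onnx', 'gcs/decoder_model_merged_q4f16.onnx')` would be expected to resolve to `true` (both paths are placeholders).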
License
Apache 2.0 (this publication). The base model google/gemma-4-E2B-it
carries Google's Gemma Terms of Use; users must accept those terms
separately on the Gemma model card.
Contact
partnerships@synavistra.ai