RunuX-AI Benchmark: google/gemma-2-9b on TPU v5e

Benchmark validation card. This repository documents the inference performance of google/gemma-2-9b served by the RunuX-AI runtime on Google TPU v5e, compared with a PyTorch/XLA (torch_xla) baseline.

Key Results (batch size 1, BF16, TPU v5e)

Metric	PyTorch (torch_xla)	RunuX-AI	Improvement
Throughput	18.2 tok/s	58.8 tok/s	3.23× faster
Energy per token	10.99 J/tok	3.4 J/tok	3.23× lower
MXU utilization	~32%	88%	2.75× higher
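The Improvement column can be re-derived from the table above. A minimal Python sketch, assuming the energy column was computed as average power divided by throughput (the card does not say how energy was measured) and treating the baseline's "~32%" as exactly 32%:

```python
# Figures copied from the Key Results table (batch size 1, BF16, TPU v5e).
BASELINE = {"tok_per_s": 18.2, "j_per_tok": 10.99, "mxu_util": 0.32}
RUNUX = {"tok_per_s": 58.8, "j_per_tok": 3.4, "mxu_util": 0.88}


def improvement_factors(base: dict, new: dict) -> dict:
    """Ratios as reported in the 'Improvement' column."""
    return {
        # Higher is better, so divide new by baseline.
        "throughput": new["tok_per_s"] / base["tok_per_s"],
        # Lower is better, so divide baseline by new.
        "energy": base["j_per_tok"] / new["j_per_tok"],
        "mxu_util": new["mxu_util"] / base["mxu_util"],
    }


def implied_power_watts(run: dict) -> float:
    """(J/token) x (tokens/s) = J/s = average power in watts.

    Assumption: the energy column was derived as average power / throughput.
    """
    return run["j_per_tok"] * run["tok_per_s"]


factors = improvement_factors(BASELINE, RUNUX)
for name, value in factors.items():
    print(f"{name}: {value:.2f}x")
print(f"baseline power: {implied_power_watts(BASELINE):.0f} W")
print(f"RunuX-AI power: {implied_power_watts(RUNUX):.0f} W")
```

Both configurations come out at roughly 200 W, which is why the energy-per-token improvement equals the throughput improvement: at constant power, energy per token falls by exactly the factor by which tokens per second rise.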

Model Details

Base Model: google/gemma-2-9b
Parameters: 9.0B
Precision: BF16 (bfloat16)
Hardware: Google TPU v5e, v5litepod-1 (one chip, 197 TFLOPS peak BF16)
Runtime: RunuX-AI v0.2.0 (no_std Rust, 23 crates)
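As a rough check on what BF16 precision means for this model, each parameter takes 2 bytes, so the weights alone need about 18 GB. The sketch below uses the card's rounded 9.0B parameter count and deliberately ignores the KV cache, activations, and runtime overhead:

```python
PARAMS = 9.0e9        # "Parameters: 9.0B" from Model Details (rounded)
BYTES_PER_PARAM = 2   # bfloat16 stores each value in 16 bits


def weight_bytes(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone (no KV cache, no activations)."""
    return params * bytes_per_param


total = weight_bytes(PARAMS, BYTES_PER_PARAM)
print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB)")
```

It prints `18.0 GB (16.76 GiB)`.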

Methodology

Prompt length: 512 input tokens
Generation: 128 new tokens, greedy decoding (do_sample=False)
Warmup: 3 iterations, excluded from measurement
Measurement: median of 10 iterations
See full methodology
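The warmup-then-median protocol can be sketched as a small timing harness. `measure_decode_throughput` and the `generate` callable are hypothetical names, not part of RunuX-AI or benchmark_baselines.py. The sketch also assumes throughput is reported as generated tokens divided by the median end-to-end latency of one run, prefill included; the card does not specify this.

```python
import statistics
import time
from typing import Callable, Dict


def measure_decode_throughput(
    generate: Callable[[], int],
    warmup: int = 3,
    iters: int = 10,
    timer: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time `generate` and report the median latency and tokens per second.

    `generate` is a hypothetical callable that runs one full generation
    (here: a 512-token prompt, then 128 greedy decode steps) and returns
    the number of new tokens it produced.
    """
    # Warmup runs absorb one-time costs such as XLA graph compilation;
    # their timings are never recorded.
    for _ in range(warmup):
        generate()

    latencies = []
    token_counts = set()
    for _ in range(iters):
        start = timer()
        token_counts.add(generate())
        latencies.append(timer() - start)

    # Greedy decoding can stop early at an end-of-sequence token, so check
    # that every timed run produced the same amount of output.
    if len(token_counts) != 1:
        raise ValueError(f"runs produced different token counts: {token_counts}")

    median_s = statistics.median(latencies)
    return {"median_s": median_s, "tok_per_s": token_counts.pop() / median_s}
```

Using the median rather than the mean keeps a single outlier iteration from skewing the reported throughput.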

Reproduction

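```bash
# Install the baseline framework ([tpu] is quoted so shells such as zsh don't glob it)
pip install torch "torch_xla[tpu]" transformers accelerate

# google/gemma-2-9b is gated on the Hugging Face Hub: accept its license,
# then authenticate
huggingface-cli login

# Run the baseline benchmark
python benchmark_baselines.py --models gemma-2-9b
```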

Citation

```bibtex
@misc{callens2026runux,
  title={RunuX-AI: Memory-Efficient, Energy-Aware Inference Runtime for Edge and Cloud Accelerators},
  author={Callens, Xavier},
  year={2026},
  note={Socrate AI Lab}
}
```

Author

Xavier Callens, Socrate AI Lab (non-profit)

GitHub: xaviercallens/runux-ai-runtime
Dataset: callensxavier/runux-tpu-v5e-benchmarks

Model repository: callensxavier/runux-bench-gemma-2-9b-tpu