RunuX-AI Benchmark: google/gemma-2-9b on TPU v5e

Benchmark validation card: this repository documents the inference performance of google/gemma-2-9b when optimized with the RunuX-AI runtime on Google TPU v5e.

Key Results (BS=1, BF16, TPU v5e)

Metric            PyTorch (torch_xla)   RunuX-AI      Improvement
Throughput        18.2 tok/s            58.8 tok/s    3.23× faster
Energy            10.99 J/tok           3.4 J/tok     3.23× lower
MXU utilization   ~32%                  88%           2.75× higher
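The Improvement column is the ratio of the two measured columns. The following minimal Python check uses only the numbers in the table and reproduces those ratios. It also multiplies energy per token by throughput, which gives the average power each run implies (J/tok × tok/s = W). The dictionary keys and the `ratio` helper are illustrative names and are not part of RunuX-AI.

```python
# Figures copied from the Key Results table (BS=1, BF16, TPU v5e).
baseline = {"throughput_tok_s": 18.2, "energy_j_per_tok": 10.99, "mxu_util": 0.32}
runux = {"throughput_tok_s": 58.8, "energy_j_per_tok": 3.4, "mxu_util": 0.88}


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Higher is better for throughput and MXU utilization, lower is better for energy.
speedup = ratio(runux["throughput_tok_s"], baseline["throughput_tok_s"])
energy_reduction = ratio(baseline["energy_j_per_tok"], runux["energy_j_per_tok"])
mxu_gain = ratio(runux["mxu_util"], baseline["mxu_util"])

# Energy per token (J/tok) times throughput (tok/s) is average power in watts.
baseline_power_w = baseline["energy_j_per_tok"] * baseline["throughput_tok_s"]
runux_power_w = runux["energy_j_per_tok"] * runux["throughput_tok_s"]

print(f"speedup {speedup}x, energy {energy_reduction}x lower, MXU {mxu_gain}x higher")
print(f"implied average power: baseline {baseline_power_w:.1f} W, RunuX-AI {runux_power_w:.1f} W")
```

Both products come out near 200 W. As a result, the energy column tracks the throughput column almost exactly.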

Model Details

  • Base Model: google/gemma-2-9b
  • Parameters: 9.0B
  • Precision: BF16 (bfloat16)
  • Hardware: Google TPU v5e (v5litepod-1, single chip, 197 TFLOPS peak BF16)
  • Runtime: RunuX-AI v0.2.0 (no_std Rust, 23 crates)
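The weight footprint follows directly from two of the values above: parameter count times bytes per parameter. This gives a rough sense of scale for the memory-efficiency claim in the title. The following minimal Python sketch covers the weights only and ignores the KV cache, activations and any runtime buffers. The constant and function names are illustrative.

```python
# Values taken from the Model Details list: 9.0B parameters stored in BF16.
PARAMS = 9.0e9
BYTES_PER_BF16 = 2  # bfloat16: 1 sign bit, 8 exponent bits, 7 mantissa bits


def weight_bytes(num_params: float, bytes_per_param: int) -> float:
    """Bytes needed for the weights alone (no KV cache, activations or buffers)."""
    return num_params * bytes_per_param


total = weight_bytes(PARAMS, BYTES_PER_BF16)
print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB) of BF16 weights")
```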

Methodology

  • Input: 512 tokens
  • Decode: 128 tokens (greedy, do_sample=False)
  • Warmup: 3 iterations
  • Measurement: 10 iterations (median)
  • See full methodology
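The measurement protocol can be sketched as a small timing harness. This is a minimal Python sketch under stated assumptions, not the card's benchmark_baselines.py. `generate` is a hypothetical callable that stands in for one greedy generation call. In a transformers baseline, that call would typically be `model.generate(..., max_new_tokens=128, do_sample=False)`.

The sketch makes two further choices:

  • It times the whole call, including the 512-token prefill.
  • It takes the median of wall-clock times rather than the median of per-run throughputs.

The card does not state which of these choices its own script makes. The energy helper also assumes a constant average power draw, and how that power is measured is not specified here.

```python
import statistics
import time
from typing import Callable

# Settings copied from the Methodology list.
PROMPT_TOKENS = 512
DECODE_TOKENS = 128
WARMUP_ITERS = 3
MEASURE_ITERS = 10


def median_throughput(generate: Callable[[int, int], None]) -> float:
    """Return decode throughput in tokens/s from the median wall-clock time.

    `generate(prompt_tokens, new_tokens)` is a hypothetical stand-in for one
    greedy generation call; it must block until all tokens are produced.
    """
    # Warm-up runs absorb one-time costs such as XLA compilation; not timed.
    for _ in range(WARMUP_ITERS):
        generate(PROMPT_TOKENS, DECODE_TOKENS)

    timings = []
    for _ in range(MEASURE_ITERS):
        start = time.perf_counter()
        generate(PROMPT_TOKENS, DECODE_TOKENS)
        timings.append(time.perf_counter() - start)

    # The median is robust to a single slow outlier iteration.
    return DECODE_TOKENS / statistics.median(timings)


def energy_per_token(avg_power_w: float, tokens_per_s: float) -> float:
    """J/token = W / (tokens/s), assuming a constant average power draw."""
    return avg_power_w / tokens_per_s
```

For example, an average draw of 200 W at 58.8 tok/s gives `energy_per_token(200.0, 58.8)` of about 3.4 J/tok.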

Reproduction

# Install baseline framework
pip install torch 'torch_xla[tpu]' transformers accelerate

# Run baseline benchmark
python benchmark_baselines.py --models gemma-2-9b

Citation

@misc{callens2026runux,
  title={RunuX-AI: Memory-Efficient, Energy-Aware Inference Runtime for Edge and Cloud Accelerators},
  author={Callens, Xavier},
  year={2026},
  note={Socrate AI Lab}
}

Author

Xavier Callens, Socrate AI Lab (Non-Profit)
