RunuX-AI Benchmark: google/gemma-2-9b on TPU v5e
Benchmark validation card: this repository documents the inference performance of google/gemma-2-9b when optimized with the RunuX-AI runtime on Google TPU v5e.
Key Results (BS=1, BF16, TPU v5e)
| Metric | PyTorch (torch_xla) | RunuX-AI | Improvement |
|---|---|---|---|
| Throughput | 18.2 tok/s | 58.8 tok/s | 3.23× faster |
| Energy | 10.99 J/tok | 3.4 J/tok | 3.23× lower |
| MXU Util | ~32% | 88% | 2.75× higher |
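The Improvement column can be re-derived from the two measurement columns. The short sketch below does that arithmetic and also multiplies energy per token by throughput to get the implied average power in watts, a figure the card does not report directly. The dictionary layout and the names `results`, `improvement`, and `implied_power_w` are illustrative assumptions, not part of the benchmark code.

```python
# Values copied from the Key Results table (BS=1, BF16, TPU v5e).
# The structure and names below are hypothetical, chosen for illustration.
results = {
    "throughput_tok_s": {"baseline": 18.2, "runux": 58.8},
    "energy_j_per_tok": {"baseline": 10.99, "runux": 3.4},
    "mxu_util_pct": {"baseline": 32.0, "runux": 88.0},
}


def improvement(metric: str, higher_is_better: bool = True) -> float:
    """Return the improvement factor of RunuX-AI over the baseline."""
    row = results[metric]
    if higher_is_better:
        return row["runux"] / row["baseline"]
    return row["baseline"] / row["runux"]


def implied_power_w(system: str) -> float:
    """J/tok multiplied by tok/s gives J/s, i.e. average watts."""
    return (results["energy_j_per_tok"][system]
            * results["throughput_tok_s"][system])


throughput_gain = improvement("throughput_tok_s")
energy_gain = improvement("energy_j_per_tok", higher_is_better=False)
mxu_gain = improvement("mxu_util_pct")

print(f"throughput: {throughput_gain:.2f}x")
print(f"energy:     {energy_gain:.2f}x")
print(f"MXU util:   {mxu_gain:.2f}x")
print(f"implied power, baseline: {implied_power_w('baseline'):.1f} W")
print(f"implied power, RunuX-AI: {implied_power_w('runux'):.1f} W")
```

With the table's numbers, both configurations come out at roughly 200 W of implied average power, so the energy improvement tracks the throughput improvement almost exactly.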
Model Details
- Base Model: google/gemma-2-9b
- Parameters: 9.0B
- Precision: BF16 (bfloat16)
- Hardware: Google TPU v5e (v5litepod-1, 197 TFLOPS)
- Runtime: RunuX-AI v0.2.0 (no_std Rust, 23 crates)
Methodology
- Input: 512 tokens
- Decode: 128 tokens (greedy, do_sample=False)
- Warmup: 3 iterations
- Measurement: 10 iterations (median)
- See full methodology
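The warmup and measurement steps listed above can be sketched as a small timing harness. This is a minimal illustration, not the repository's benchmark script: `benchmark_decode` and `generate_fn` are hypothetical names, and it assumes the whole generation call is timed and that `generate_fn` blocks until the device has finished (on an XLA device this usually requires an explicit synchronization inside the callable).

```python
import statistics
import time


def benchmark_decode(generate_fn, new_tokens=128, warmup=3, iterations=10):
    """Return median decode throughput in tokens per second.

    generate_fn is a hypothetical zero-argument callable that runs one
    full generation (512-token prompt, `new_tokens` greedy decode steps)
    and returns only after the device work is complete.
    """
    # Warmup runs absorb compilation and cache effects; they are not timed.
    for _ in range(warmup):
        generate_fn()

    # Timed runs: record wall-clock duration of each iteration.
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        generate_fn()
        durations.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow iterations than the mean.
    median_s = statistics.median(durations)
    return new_tokens / median_s
```

A caller would wrap the model's generation call in a function and pass it in, for example `benchmark_decode(lambda: run_generation(prompt_ids))`, where `run_generation` and `prompt_ids` are placeholders for whatever the actual benchmark uses.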
Reproduction
# Install baseline framework
pip install torch torch_xla[tpu] transformers accelerate
# Run baseline benchmark
python benchmark_baselines.py --models gemma-2-9b
Citation
@article{callens2026runux,
title={RunuX-AI: Memory-Efficient, Energy-Aware Inference Runtime for Edge and Cloud Accelerators},
author={Callens, Xavier},
year={2026},
note={Socrate AI Lab}
}
Author
Xavier Callens, Socrate AI Lab (Non-Profit)
- GitHub: xaviercallens/runux-ai-runtime
- Dataset: callensxavier/runux-tpu-v5e-benchmarks