Laguna-XS.2: Pre-compiled for AWS Neuron (trn2.3xlarge)

Pre-compiled and pre-sharded model artifacts for serving poolside/Laguna-XS.2 on AWS Trainium2 using NxD Inference.

Configuration

  • Instance: trn2.3xlarge (LNC=2, 4 logical cores)
  • TP degree: 4
  • Batch size: 4 for token generation (TKG), 1 for context encoding (CTE)
  • Max sequence length: 4096
  • Precision: BF16
  • SDK: Neuron SDK 2.29 (neuronx-cc 2.24, NxDI 0.9.17334)

Files

  • Compiled NEFFs (6 CTE + 6 TKG buckets): 4.3 GB
  • NxDI inference configuration: 12 KB
  • Sharded weights, TP rank 0: 16 GB
  • Sharded weights, TP rank 1: 16 GB
  • Sharded weights, TP rank 2: 16 GB
  • Sharded weights, TP rank 3: 16 GB
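As a back-of-envelope sanity check on the shard sizes above (assuming the weights are pure BF16 at 2 bytes per parameter and the four 16 GB shards together hold the full model), the implied parameter count is roughly 34B:

```python
# Rough parameter-count estimate from shard sizes.
# Assumptions: pure BF16 weights (2 bytes/param); the 4 TP shards
# combined contain the whole model with no significant duplication.
shard_gib = 16
tp_degree = 4
total_bytes = shard_gib * tp_degree * 1024**3
params_billion = total_bytes / 2 / 1e9
print(round(params_billion, 1))  # → 34.4
```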

Usage with vLLM
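A minimal serving sketch with vLLM's Neuron backend. The flags match the configuration above (TP=4, TKG batch size 4, 4096 max sequence length); the `NEURON_COMPILED_ARTIFACTS` variable and exact flag spellings are assumptions — check your vLLM and Neuron SDK documentation for the versions you have installed.

```shell
# Sketch only: point the Neuron vLLM integration at the pre-compiled
# artifacts so it skips recompilation (env var name is an assumption).
export NEURON_COMPILED_ARTIFACTS=/path/to/compiled-artifacts

vllm serve jburtoft/Laguna-XS2-neuron-compiled \
    --device neuron \
    --tensor-parallel-size 4 \
    --max-num-seqs 4 \
    --max-model-len 4096
```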

Performance

Metric                   Value
Throughput (BS=1)        ~50 tok/s (via vLLM)
Throughput (BS=4, raw)   223 tok/s
Throughput (BS=8, raw)   310 tok/s
TPOT (BS=1)              11 ms
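To relate the figures above (assuming the "raw" throughput numbers are aggregate across the batch), per-sequence decode latency can be derived from aggregate throughput and batch size:

```python
# Convert aggregate decode throughput to per-sequence time-per-output-token.
# Assumption: the "raw" tok/s figures sum tokens across all batched sequences.
def per_seq_tpot_ms(aggregate_tok_s: float, batch_size: int) -> float:
    per_seq_tok_s = aggregate_tok_s / batch_size
    return 1000.0 / per_seq_tok_s

print(round(per_seq_tpot_ms(223, 4), 1))  # BS=4 → 17.9
print(per_seq_tpot_ms(50, 1))             # BS=1 → 20.0
```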

Requirements
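A hedged install sketch matching the SDK versions listed in Configuration (Neuron SDK 2.29, neuronx-cc 2.24, NxDI 0.9.x). Package pins are assumptions inferred from those version numbers; verify exact versions against the Neuron SDK 2.29 release notes.

```shell
# Sketch only: package names/pins are assumptions drawn from the
# SDK versions stated above, not a verified requirements list.
python -m pip install \
    --extra-index-url https://pip.repos.neuron.amazonaws.com \
    "neuronx-cc==2.24.*" torch-neuronx neuronx-distributed-inference
```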


Model tree for jburtoft/Laguna-XS2-neuron-compiled
