Pearl EAGLE3.1 scratch speculator, 200k seq4096
Draft/speculator head trained from scratch for pearl-ai/Llama-3.1-8B-Instruct-pearl.
Training
- Hidden-state samples: 200,000
- Sequence length: 4096
- Speculator: EAGLE3.1 / eagle3
- Flags:
--norm-before-residual
--norm-before-fc
--draft-vocab-size 32000
--epochs 5
--lr 1e-4
--total-seq-len 4096
Final checkpoint source: /persist/pearl_eagle31_train_200k_seq4096/checkpoints/checkpoints_eagle31_scratch/checkpoint_best
Matrix benchmark
Stable old/materialized-B Pearl path, full requested matrix:
K = 2,3,4,5,6
C = 1,4,8,16,32,64
t ~= 50,180,350,700,1400
Coverage: 30 K/C cells x 5 prompt buckets = 150 rows.
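The coverage arithmetic above can be checked with a minimal sketch that enumerates the matrix. The values are copied from the lists above; K, C, and t are treated as opaque labels because the card does not define them further, and the names `K_VALUES`, `C_VALUES`, `T_BUCKETS`, and `matrix_rows` are illustrative, not part of the benchmark tooling.

```python
from itertools import product

# Values copied from the matrix description; t holds the approximate prompt buckets.
K_VALUES = [2, 3, 4, 5, 6]
C_VALUES = [1, 4, 8, 16, 32, 64]
T_BUCKETS = [50, 180, 350, 700, 1400]


def matrix_rows():
    """Return one dict per (K, C, t) combination in the requested matrix."""
    return [
        {"K": k, "C": c, "t": t}
        for k, c, t in product(K_VALUES, C_VALUES, T_BUCKETS)
    ]


cells = list(product(K_VALUES, C_VALUES))  # one entry per K/C cell
rows = matrix_rows()                       # one entry per benchmark row
print(len(cells), len(rows))               # 30 150
```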
Best mean output throughput across prompt buckets: K=2, C=64, 1039.1 output tok/s.
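One way to reproduce a "best mean across prompt buckets" figure is to average the throughput of each K/C cell over its rows and take the maximum. The sketch below assumes each row is a mapping with `K`, `C`, and `output_tok_s` keys. Those column names are hypothetical and may not match the actual summary files.

```python
from collections import defaultdict


def best_cell(rows, metric="output_tok_s"):
    """Return ((K, C), mean) for the cell with the highest mean metric.

    `rows` is any iterable of mappings; the key names are assumptions,
    not the confirmed schema of the benchmark CSVs.
    """
    per_cell = defaultdict(list)
    for row in rows:
        cell = (int(row["K"]), int(row["C"]))
        per_cell[cell].append(float(row[metric]))
    if not per_cell:
        raise ValueError("no rows to aggregate")
    # Average over every row in the cell (one row per prompt bucket).
    means = {cell: sum(vals) / len(vals) for cell, vals in per_cell.items()}
    best = max(means, key=means.get)
    return best, means[best]
```

Rows read with `csv.DictReader` can be passed in directly, since the function converts the string values to numbers itself.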
See:
- FULL_MATRIX_SUMMARY.md
- summary_compact.json
- full_matrix_summary.jsonl
- benchmark_raw/*/combined_summary.csv
Notes
The B-on-chip fused Pearl R128 path is not included in these final results. It was rebuilt as R128 128x128x128, stages=2, to fit H100 shared memory (SMEM), but requests still crash with a CUDA illegal-instruction error. The stable results here therefore use the old/materialized-B path.