A 400M parameter language model trained on 22 × 400M (~8.8B) tokens from the FineWeb-Edu dataset.
aixsim-400M is a transformer-based language model with approximately 400 million parameters (excluding embedding parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
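RMSNorm normalizes each hidden vector by its root mean square instead of subtracting the mean as LayerNorm does, which saves a small amount of compute per layer. A minimal sketch of the standard formulation (the model's exact implementation may differ):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Standard RMSNorm: scale x by 1/sqrt(mean(x^2) + eps), then by a learned weight."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Root-mean-square over the hidden dimension, with eps for numerical stability
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```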
The experiment setup and training logs are available in the accompanying wandb run.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AICrossSim/clm-400m"

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
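Once loaded, the model can be used through the standard Hugging Face `generate` API. A minimal sketch (the prompt and sampling parameters are illustrative):

```python
import torch

prompt = "The history of science shows that"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation from the model
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```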
| Tasks    | Version | Filter | n-shot | Metric          |   | Value   |   | Stderr |
|----------|---------|--------|--------|-----------------|---|---------|---|--------|
| wikitext | 2       | none   | 0      | bits_per_byte   | ↓ | 0.9886  | ± | N/A    |
|          |         | none   | 0      | byte_perplexity | ↓ | 1.9843  | ± | N/A    |
|          |         | none   | 0      | word_perplexity | ↓ | 39.0317 | ± | N/A    |
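The table follows the output format of EleutherAI's lm-evaluation-harness, whose `wikitext` task reports exactly these three metrics. Assuming that harness was used (an inference from the table format, not stated above), a run like the following should reproduce the numbers:

```python
# Sketch assuming EleutherAI's lm-evaluation-harness (pip install lm-eval);
# the evaluation setup is inferred from the table format, not confirmed above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face causal LM backend
    model_args="pretrained=AICrossSim/clm-400m",
    tasks=["wikitext"],                           # zero-shot by default
)
print(results["results"]["wikitext"])             # word/byte perplexity, bits_per_byte
```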