A 60M parameter language model trained on 22 × 60M (≈1.32B) tokens from the FineWeb-Edu dataset.
aixsim-60M is a transformer-based language model with approximately 60 million parameters (excluding embedding parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
The experiment setup and training logs are available in the corresponding wandb run.
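To inspect the architecture hyperparameters (hidden size, layer count, normalization type, and so on) without downloading the weights, the model config can be loaded on its own; a minimal sketch:

```python
from transformers import AutoConfig

# Fetch and print only the model configuration from the Hugging Face Hub
config = AutoConfig.from_pretrained("AICrossSim/clm-60m")
print(config)
```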
```python
import transformers

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model_name = "AICrossSim/clm-60m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
```
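Once loaded, the model can be exercised with a short generation call. A minimal sketch (the prompt and decoding settings here are illustrative, not from the original card):

```python
import torch

# Tokenize a prompt and greedily generate a short continuation
inputs = tokenizer("The history of education", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```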
WikiText evaluation results:

| Tasks    | Version | Filter | n-shot | Metric          |   | Value    |   | Stderr |
|----------|---------|--------|--------|-----------------|---|----------|---|--------|
| wikitext | 2       | none   | 0      | bits_per_byte   | ↓ | 1.6693   | ± | N/A    |
|          |         | none   | 0      | byte_perplexity | ↓ | 3.1806   | ± | N/A    |
|          |         | none   | 0      | word_perplexity | ↓ | 486.5306 | ± | N/A    |
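As a sanity check on the reported metrics, bits_per_byte is the base-2 logarithm of byte_perplexity (both express the same per-byte negative log-likelihood); a quick verification:

```python
import math

# byte_perplexity = 2 ** bits_per_byte, so the two table entries should agree
byte_perplexity = 3.1806
print(f"{math.log2(byte_perplexity):.4f}")  # ≈ 1.6693, matching the table above
```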