Part of the NewComputeBench-CLM-Digital collection.
A 200M-parameter language model trained on 22 × 200M (≈4.4B) tokens from the FineWeb-Edu dataset.
aixsim-200M is a transformer-based language model with approximately 200 million parameters (excluding embedding layer parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
The experiment setup and training logs can be found in the accompanying wandb run.
```python
import transformers

# Load the pretrained model and tokenizer from the Hugging Face Hub
model_name = "AICrossSim/clm-200m"
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
```
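Continuing from the loading snippet above, the sketch below gives a rough sanity check of the quoted non-embedding parameter count and runs a short greedy generation. The prompt and generation settings are illustrative choices, not values from the model card.

```python
import torch

# Rough parameter count; embedding rows are excluded by skipping nn.Embedding modules.
total = sum(p.numel() for p in model.parameters())
embedding = sum(
    p.numel()
    for m in model.modules()
    if isinstance(m, torch.nn.Embedding)
    for p in m.parameters()
)
print(f"total: {total / 1e6:.1f}M params, non-embedding: {(total - embedding) / 1e6:.1f}M params")

# Illustrative greedy generation (prompt and max_new_tokens are arbitrary).
inputs = tokenizer("The FineWeb-Edu dataset is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```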
| Tasks    | Version | Filter | n-shot | Metric          |   | Value   |   | Stderr |
|----------|--------:|--------|-------:|-----------------|---|--------:|---|--------|
| wikitext |       2 | none   |      0 | bits_per_byte   | ↓ |  1.0994 | ± | N/A    |
|          |         | none   |      0 | byte_perplexity | ↓ |  2.1427 | ± | N/A    |
|          |         | none   |      0 | word_perplexity | ↓ | 58.8531 | ± | N/A    |
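The table above follows the output format of EleutherAI's lm-evaluation-harness. The card does not state the exact evaluation command, but a minimal sketch for reproducing the wikitext numbers, assuming that harness was used, could look like this:

```python
# pip install lm-eval
# CLI equivalent:
#   lm_eval --model hf --model_args pretrained=AICrossSim/clm-200m --tasks wikitext --num_fewshot 0
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AICrossSim/clm-200m",
    tasks=["wikitext"],
    num_fewshot=0,
)

# The wikitext task reports bits_per_byte, byte_perplexity, and word_perplexity;
# note that bits_per_byte = log2(byte_perplexity), e.g. log2(2.1427) ≈ 1.0994.
print(results["results"]["wikitext"])
```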