CLM-BitFlip
A 200M-parameter bitflip-aware language model trained on 22 × 200M (≈ 4.4B) tokens from the FineWeb-Edu dataset.
bitflip-aixsim-200M is a transformer-based language model with approximately 200 million parameters (excluding embedding-layer parameters). It uses RMSNorm for normalization and is trained on the FineWeb-Edu dataset.
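Since the card names RMSNorm but does not show its implementation, here is a minimal sketch of RMSNorm as commonly defined (Zhang & Sennrich, 2019). The epsilon value and where the norm is placed in the block are assumptions for illustration, not details confirmed by this model.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (Zhang & Sennrich, 2019).

    Unlike LayerNorm, RMSNorm skips mean-centering and the bias term:
    it rescales activations by their root mean square and a learned gain.
    """

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed default
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1 / rms(x) = rsqrt(mean(x^2) + eps), computed over the feature dim
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * inv_rms * self.weight
```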
The experiment setup and training logs can be found in the wandb run.