HuggingFaceFW/fineweb-edu
Viewer • Updated • 3.5B • 412k • 1.16k
This repository holds the original AlterEgo checkpoint, in the model's own from-scratch architecture - i.e. the model exactly as it was trained, before any format conversion.
transformers-native version instead: jbomdev/AlterEgo. It's a numerically-lossless conversion of this checkpoint to LlamaForCausalLM (verified, max logit difference ~1e-6), and works out of the box with transformers, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card."model" key. Load and run it with the model definition and inference code in the training repo: github.com/J-bom/AlterEgo.In short: alterego_raw is the original; jbomdev/AlterEgo is the converted, ready-to-use version.
A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the main model card for architecture, training curves, hyperparameters, evaluation, and limitations.