AlterEgo-373M - raw checkpoint

This repository holds the original AlterEgo checkpoint, in the model's own from-scratch architecture - i.e. the model exactly as it was trained, before any format conversion.

Want to just use the model? Use the Hugging Face / transformers-native version instead: jbomdev/AlterEgo. It's a numerically-lossless conversion of this checkpoint to LlamaForCausalLM (verified, max logit difference ~1e-6), and works out of the box with transformers, vLLM, and GGUF tooling. Full architecture, training details, benchmarks, and usage are documented on that model card.
Want the raw weights / the original architecture? That's this repo. The checkpoint is a PyTorch state dict saved under the "model" key. Load and run it with the model definition and inference code in the training repo: github.com/J-bom/AlterEgo.

In short: alterego_raw is the original; jbomdev/AlterEgo is the converted, ready-to-use version.

Model summary

A 373M-parameter, decoder-only transformer (Llama-style: GQA, RoPE, SwiGLU, RMSNorm) pre-trained from scratch on ~10B tokens of FineWeb-Edu and instruction-tuned on UltraChat-200K (ChatML). See the main model card for architecture, training curves, hyperparameters, evaluation, and limitations.

Downloads last month: -; Downloads are not tracked for this model. How to track

jbomdev
/

AlterEgo_raw

AlterEgo-373M - raw checkpoint

Model summary

Datasets used to train jbomdev/AlterEgo_raw