Erebus-487M-base

Erebus-487M-base is a 487M-parameter base language model, trained from scratch, that uses the Llama architecture.

Model Details

  • Architecture: Llama-style transformer
  • Parameters: 487.8M
  • Context Length: 1024 tokens
  • Training Data: FineWeb-Edu (529M tokens)
  • Training Steps: 10,000
  • Final Loss: 3.0528
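
For intuition about the final loss, the snippet below converts it to perplexity. It is a minimal sketch that assumes the reported value is a mean per-token cross-entropy measured in nats, which is the usual convention for causal language model training but is not stated on this card. The variable names are illustrative.

```python
import math

# Final loss from the model card.
# ASSUMPTION: this is the mean per-token cross-entropy in nats.
final_loss = 3.0528

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(final_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the model is, on average, about as uncertain as a uniform choice among roughly 21 tokens at each position.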

Architecture Specifics

  • Layers: 12
  • Hidden Size: 1536
  • Attention Heads: 16 (Query) / 4 (Key/Value)
  • Intermediate Size: 6144
  • Grouped Query Attention (GQA): 4:1 ratio of query heads to key/value heads
  • Position Encoding: Rotary Position Embeddings (RoPE)
  • Activation: SwiGLU
  • Normalization: RMSNorm
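
The figures above can be cross-checked with a little arithmetic. The sketch below counts the weights of a Llama-style model from the listed dimensions. The card does not state the vocabulary size or whether the input and output embeddings are tied, so `vocab_size = 50257` (the GPT-2 tokenizer size) and `tie_embeddings = True` are assumptions chosen for illustration. They happen to land on the reported 487.8M, which is suggestive but does not confirm the actual configuration.

```python
# Values taken from the model card
n_layers = 12
hidden = 1536
n_q_heads = 16
n_kv_heads = 4
intermediate = 6144

# ASSUMPTIONS (not stated on the card): GPT-2-sized vocabulary,
# tied input/output embeddings, and no bias terms anywhere.
vocab_size = 50257
tie_embeddings = True

head_dim = hidden // n_q_heads      # 96 dimensions per head
kv_dim = n_kv_heads * head_dim      # 384: K and V projections are 4x narrower than Q

# Attention: Q and O are hidden x hidden, K and V are hidden x kv_dim (GQA)
attn = 2 * hidden * hidden + 2 * hidden * kv_dim
# SwiGLU MLP: gate, up, and down projections
mlp = 3 * hidden * intermediate
# Two RMSNorm weight vectors per layer
norms = 2 * hidden
per_layer = attn + mlp + norms

embed = vocab_size * hidden
lm_head = 0 if tie_embeddings else embed
# The extra `hidden` term is the final RMSNorm before the output head
total = n_layers * per_layer + hidden + embed + lm_head

print(f"{total:,} parameters ({total / 1e6:.1f}M)")
# prints "487,750,656 parameters (487.8M)"
```

Because the key and value projections use only 4 heads, each is a quarter the size of the query projection, which also shrinks the KV cache by the same 4:1 factor at inference time.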

Usage

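from transformers import AutoModelForCausalLM, AutoTokenizer

# Use the full Hub repo id (namespace/model) so the files resolve from the Hub
repo_id = "soyrsoyr/erebus-487m-base"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens bounds only the continuation; max_length would also count the prompt
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))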

Training

Trained from scratch on FineWeb-Edu using:

  • AdamW optimizer (lr=3e-4)
  • Effective batch size: 128 sequences (micro-batch of 8 × 16 gradient accumulation steps)
  • Context: 1024 tokens
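
The settings above imply a token budget that can be worked out directly. This is a rough sketch with two assumptions the card does not confirm: that the 10,000 training steps are optimizer steps (each covering all 16 accumulated micro-batches), and that every sequence is packed to the full 1024 tokens. The variable names are illustrative.

```python
# Values from the model card
micro_batch = 8
grad_accum_steps = 16
context_len = 1024
train_steps = 10_000          # ASSUMPTION: optimizer steps, not micro-batch steps
dataset_tokens = 529_000_000  # FineWeb-Edu subset size reported on the card

# Sequences contributing to one optimizer update
sequences_per_step = micro_batch * grad_accum_steps
# ASSUMPTION: every sequence is fully packed to context_len tokens
tokens_per_step = sequences_per_step * context_len
total_tokens = tokens_per_step * train_steps
passes_over_data = total_tokens / dataset_tokens

print(f"sequences per step: {sequences_per_step}")
print(f"tokens per step:    {tokens_per_step:,}")
print(f"tokens processed:   {total_tokens:,}")
print(f"passes over data:   {passes_over_data:.2f}")
```

Under these assumptions the run processes about 1.31B tokens, or roughly 2.5 passes over the 529M-token dataset. If the step count instead refers to micro-batches, the total would be 16 times smaller.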

License

Apache 2.0

Developer's Note

This model is mostly a practice project for me, but it may find an application somewhere.
