
FW Small

This is the tiny version of the bilinear transformers trained on FineWeb-edu. The primary purpose of this model is interpretability; most design choices were made with that in mind.

The code to run this custom model can be found here, along with many utility functions for weight-based interpretability.
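
As a rough sketch of how loading might look (assuming the repository exposes its custom architecture through the standard transformers auto classes via trust_remote_code, and that the tokenizer is bundled with the repo; neither is confirmed by this card):

```python
# Hedged usage sketch -- assumes the repo's custom code is compatible with
# the standard transformers auto classes and that the tokenizer files are
# included in the "tdooms/fw-small" repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tdooms/fw-small", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tdooms/fw-small")

inputs = tokenizer("The moon is made of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```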

Model Details

  • 162 million parameters
  • 12 layers
  • 12 attention heads
  • model dimension 768
  • bilinear MLP with expansion factor 4 (see the sketch after this list)
  • context length of 512
  • trained for 16B tokens
  • rotary positional embedding
  • Mixtral tokenizer
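
A bilinear MLP replaces the usual nonlinear activation with an elementwise product of two linear projections, which keeps the layer expressible purely in terms of its weights and is what makes it attractive for weight-based interpretability. A minimal illustrative sketch of the idea follows; the exact layer used in this model may differ in details such as biases or normalization:

```python
import torch
from torch import nn

class BilinearMLP(nn.Module):
    """Illustrative bilinear MLP: out = W_out((W x) * (V x)).

    A sketch of the general technique, not the exact layer definition
    used in fw-small (bias and normalization details may differ).
    """
    def __init__(self, d_model: int = 768, expansion: int = 4):
        super().__init__()
        d_hidden = expansion * d_model
        self.w = nn.Linear(d_model, d_hidden, bias=False)    # left projection
        self.v = nn.Linear(d_model, d_hidden, bias=False)    # right projection
        self.out = nn.Linear(d_hidden, d_model, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The elementwise product of the two projections replaces the nonlinearity.
        return self.out(self.w(x) * self.v(x))
```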