tdooms/fw-tiny-old

FW Tiny

This is the tiny version of the bilinear transformers trained on FineWeb-Edu. Its primary purpose is interpretability, and most design choices were made with that in mind.

The code to run this custom model can be found here, along with many utility functions for weight-based interpretability.

Model Details

125 million parameters
8 layers
12 attention heads
model dimension 768
bilinear MLP with expansion factor 4
context length of 512
trained for 16B tokens
rotary positional embedding
Mixtral tokenizer
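
As a rough sanity check on the numbers above, the sketch below recomputes the parameter count from the listed hyperparameters and shows what a bilinear MLP computes. It is not the repository's code. It assumes a vocabulary of 32,000 tokens for the Mixtral tokenizer, untied input and output embeddings, no bias terms, and no parameters counted for normalization layers. The names `Config`, `count_params`, `matvec`, and `bilinear_mlp` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Config:
    n_layer: int = 8
    n_head: int = 12
    d_model: int = 768
    expansion: int = 4
    n_ctx: int = 512
    vocab_size: int = 32_000  # assumption: Mixtral tokenizer vocabulary size


def count_params(cfg: Config) -> int:
    """Approximate weight count (assumes no biases, untied embeddings)."""
    d_hidden = cfg.expansion * cfg.d_model
    attn = 4 * cfg.d_model ** 2               # Q, K, V and output projections
    mlp = 3 * cfg.d_model * d_hidden          # two input projections, one output
    embed = 2 * cfg.vocab_size * cfg.d_model  # embedding plus unembedding
    return cfg.n_layer * (attn + mlp) + embed


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def bilinear_mlp(x, w_left, w_right, w_out):
    """Bilinear MLP: the elementwise product of two linear maps of x,
    projected back down. There is no activation function, so the output
    is a quadratic function of the input."""
    left = matvec(w_left, x)
    right = matvec(w_right, x)
    hidden = [l * r for l, r in zip(left, right)]
    return matvec(w_out, hidden)


if __name__ == "__main__":
    print(count_params(Config()))  # 124649472, i.e. roughly 125 million
```

Under these assumptions the total comes to 124,649,472 weights, consistent with the stated 125 million. Because the bilinear layer has no elementwise nonlinearity, doubling the input scales its output by four, which is part of what makes such layers amenable to weight-based analysis.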

Safetensors: 125M params, tensor type F32

Collection including tdooms/fw-tiny-old

Bilinear Transformers (FineWeb)

A small collection of Transformers with bilinear MLPs, trained on the FineWeb-Edu dataset. • 5 items • Updated Oct 15, 2024