
FW Small

This is the tiny version of the bilinear transformers trained on FineWeb-edu. The primary purpose of this model is interpretability; most design choices were made with that in mind.

The code to run this custom model can be found here, along with many utility functions for weight-based interpretability.
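
As a rough sketch of how loading might look (assuming the repository exposes its custom architecture through the standard transformers auto classes via trust_remote_code, and that the tokenizer is bundled with the repo; neither is confirmed by this card):

```python
# Hedged usage sketch -- assumes the repo's custom code is compatible with
# the standard transformers auto classes and that the tokenizer files are
# included in the "tdooms/fw-small" repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tdooms/fw-small", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tdooms/fw-small")

inputs = tokenizer("The moon is made of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```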

Model Details

  • 162 million parameters
  • 12 layers
  • 12 attention heads
  • model dimension 768
  • bilinear MLP with expansion factor 4 (see the sketch after this list)
  • context length of 512
  • trained for 16B tokens
  • rotary positional embedding
  • Mixtral tokenizer
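
A bilinear MLP replaces the usual nonlinear activation with an elementwise product of two linear projections, which keeps the layer expressible purely in terms of its weights and is what makes it attractive for weight-based interpretability. A minimal illustrative sketch of the idea follows; the exact layer used in this model may differ in details such as biases or normalization:

```python
import torch
from torch import nn

class BilinearMLP(nn.Module):
    """Illustrative bilinear MLP: out = W_out((W x) * (V x)).

    A sketch of the general technique, not the exact layer definition
    used in fw-small (bias and normalization details may differ).
    """
    def __init__(self, d_model: int = 768, expansion: int = 4):
        super().__init__()
        d_hidden = expansion * d_model
        self.w = nn.Linear(d_model, d_hidden, bias=False)    # left projection
        self.v = nn.Linear(d_model, d_hidden, bias=False)    # right projection
        self.out = nn.Linear(d_hidden, d_model, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The elementwise product of the two projections replaces the nonlinearity.
        return self.out(self.w(x) * self.v(x))
```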