TS Large

This is the large version of the bilinear transformers trained on TinyStories. Its primary purpose is interpretability, and most design choices were made with that in mind.

The code to run this custom model can be found here, along with many utility functions for weight-based interpretability.

Model Details

82 million parameters
8 layers
12 attention heads
model dimension 768
bilinear MLP with expansion factor 4
context length of 256
trained for 1 epoch (~500M tokens)
rotary positional embedding
custom TinyStories tokenizer
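
As a rough illustration of the details listed above, the sketch below shows one common form of a bilinear MLP (an elementwise product of two linear projections, followed by an output projection) and a back-of-the-envelope parameter count. It is not the code from the model's repository. The function names are hypothetical, and the bias-free layers, the untied input and output embeddings, and the vocabulary size of 4096 are assumptions that this card does not state.

```python
# Minimal pure-Python sketch; all names here are illustrative, not the repo's API.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def bilinear_mlp(x, w, v, p):
    """Bilinear MLP: project x twice, multiply elementwise, project back.

    Assumed form: out = P((W x) * (V x)), with no biases and no nonlinearity.
    """
    left = matvec(w, x)
    right = matvec(v, x)
    hidden = [a * b for a, b in zip(left, right)]
    return matvec(p, hidden)


def count_params(d_model=768, n_layer=8, expansion=4, vocab_size=4096):
    """Estimate the parameter count from the listed hyperparameters.

    ASSUMPTIONS (not stated on the card): vocab_size=4096, untied
    embedding and unembedding matrices, no biases, no norm parameters.
    """
    d_hidden = expansion * d_model
    attention = 4 * d_model * d_model  # query, key, value, output projections
    mlp = 3 * d_model * d_hidden       # two input projections, one output projection
    embeddings = 2 * vocab_size * d_model
    return n_layer * (attention + mlp) + embeddings


if __name__ == "__main__":
    print(count_params())  # 81788928 under the assumptions above
```

Under these assumptions the estimate comes to about 81.8M parameters, which is in line with the 82 million listed above. Because the layer is a product of two linear maps, negating the input leaves the output unchanged, which is one reason such layers lend themselves to weight-based analysis.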

Safetensors

Model size: 81.8M params
Tensor type: F32

Collection including tdooms/ts-large

Bilinear Transformers (TinyStories)

A small collection of Transformers with bilinear MLPs, trained on the TinyStories dataset. • 3 items • Updated Oct 15, 2024