Bilinear Transformers (FineWeb)
Collection
A small collection of Transformers with bilinear MLPs, trained on the FineWeb-Edu dataset.
5 items
This is the tiny version of the bilinear transformers trained on FineWeb-Edu. The primary purpose of this model is interpretability, and most design choices were made with that in mind.
The code to run this custom model can be found here, along with many utility functions for weight-based interpretability.
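To make "bilinear MLP" and "weight-based interpretability" concrete, here is a minimal sketch in plain Python. It is not the code from the repository mentioned above. It assumes the common bias-free form of a bilinear layer, out = P((Wx) ⊙ (Vx)), and the names `W`, `V`, `P`, `bilinear_mlp`, and `interaction_matrix` are illustrative only; the actual model may differ in details such as biases or normalization. The point is that, with no elementwise nonlinearity, each output coordinate is a quadratic form in the input whose matrix can be read directly from the weights.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def bilinear_mlp(W, V, P, x):
    """Assumed bilinear MLP: out = P @ ((W @ x) * (V @ x)), no biases."""
    hidden = [a * b for a, b in zip(matvec(W, x), matvec(V, x))]
    return matvec(P, hidden)


def interaction_matrix(W, V, P, k):
    """B_k[i][j] = sum_h P[k][h] * W[h][i] * V[h][j], so out[k] = x^T B_k x."""
    d_hidden, d_model = len(W), len(W[0])
    return [
        [sum(P[k][h] * W[h][i] * V[h][j] for h in range(d_hidden))
         for j in range(d_model)]
        for i in range(d_model)
    ]


def quadratic_form(B, x):
    """Evaluate x^T B x."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration; they are not the model's dimensions.
rng = random.Random(0)
d_model, d_hidden = 4, 6
W = random_matrix(d_hidden, d_model, rng)
V = random_matrix(d_hidden, d_model, rng)
P = random_matrix(d_model, d_hidden, rng)
x = [rng.uniform(-1, 1) for _ in range(d_model)]

# Forward pass through the layer.
out = bilinear_mlp(W, V, P, x)

# The same outputs, computed from weight-derived interaction matrices.
from_weights = [
    quadratic_form(interaction_matrix(W, V, P, k), x) for k in range(d_model)
]
```

Under these assumptions, `out` and `from_weights` agree up to floating-point error, which is why such a layer can be analyzed from its weights alone, without running it on data.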