Basemodel: GPT-Neo

Configs: Vocab size: 10,000 Hidden size: 512 Max position embeddings: 512 Number of layers: 2 Number of heads: 4 Window size: 256 Intermediate-size: 256

Results:

Task: glue Score: 55.15 Confidence Interval: [52.54, 56.73]
Task: blimp Score: 55.38 Confidence Interval: [53.68, 56.47]

Downloads last month: 16

Inference Examples

Text Generation

This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train AISE-TUDelft/Custom-Activations-GPT-KAN

Collection including AISE-TUDelft/Custom-Activations-GPT-KAN

BRP Tiny-Transformers

Collection

Models for the 2024-Q4 BSc. Research Project: "Architectural Decisions for Language Modelling with Small Transformers". • 14 items • Updated Jun 25