
# Lovelace Medium Alpha1

A 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/memory-transformer-pt4
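
Below is a minimal loading sketch using the Hugging Face `transformers` API. Whether this checkpoint loads directly through `AutoModelForCausalLM` with `trust_remote_code=True` (rather than through the training repository linked above) is an assumption, not something stated on this card.

```python
# Minimal loading sketch (assumption: the checkpoint is loadable via
# AutoModelForCausalLM with trust_remote_code=True; the custom
# Transformer-XL style architecture may instead require the GitHub repo above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Avelina/lovelace-medium-alpha1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,   # weights are stored in FP16
    trust_remote_code=True,      # custom architecture, if exposed this way
)

prompt = "The history of computing begins with"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```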

## Model Architecture

| Name | Value |
|------|-------|
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
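
As a toy illustration of the Transformer-XL style memory scheme implied by the table (2048-token segments with a 2048-token memory, giving an effective 4096-token inference window), the sketch below shows segment-level recurrence in plain Python. The function and variable names are hypothetical; the real implementation lives in the linked repository.

```python
# Toy sketch of Transformer-XL style segment recurrence (hypothetical names;
# the actual implementation is in the memory-transformer-pt4 repository).
from typing import List, Optional

SEGMENT_LEN = 2048   # "Trained Context" from the table above
MEMORY_LEN = 2048    # "Trained Memory" from the table above

def process_segment(segment: List[int], memory: Optional[List[int]]) -> List[int]:
    """Stand-in for a forward pass: each segment attends to its own tokens
    plus the cached states from the previous segment (the memory)."""
    attended = (memory or []) + segment          # effective window <= 4096 positions
    # ... real model: attention over `attended`, producing hidden states ...
    return segment                               # cache this segment as the next memory

def run(tokens: List[int]) -> None:
    memory: Optional[List[int]] = None
    for start in range(0, len(tokens), SEGMENT_LEN):
        segment = tokens[start:start + SEGMENT_LEN]
        new_states = process_segment(segment, memory)
        memory = new_states[-MEMORY_LEN:]        # keep only the last MEMORY_LEN states

run(list(range(10_000)))
```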

## Model Collection

| Model | Link |
|-------|------|
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-instruct |
| DPH Aligned Model | lovelace-medium-alpha1-instruct-hf |
| DPH Aligned Model (Multiple Heads) | lovelace-medium-alpha1-instruct-hf-multihead |