
# Lovelace Medium Alpha1

A 550M parameter Transformer-XL style model trained on 100B tokens of The Pile!

This model was originally trained for the "Direct Preference Heads" paper, but it will also serve as the basis for much of my future research. All code used to train and run these models is available here: https://github.com/Avelina9X/memory-transformer-pt4
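
Below is a minimal loading sketch using the Hugging Face `transformers` API. Whether this checkpoint loads directly through `AutoModelForCausalLM` with `trust_remote_code=True` (rather than through the training repository linked above) is an assumption, not something stated on this card.

```python
# Minimal loading sketch (assumption: the checkpoint is loadable via
# AutoModelForCausalLM with trust_remote_code=True; the custom
# Transformer-XL style architecture may instead require the GitHub repo above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Avelina/lovelace-medium-alpha1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,   # weights are stored in FP16
    trust_remote_code=True,      # custom architecture, if exposed this way
)

prompt = "The history of computing begins with"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```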

## Model Architecture

| Name | Value |
|------|-------|
| Total Parameters | 551M |
| Non-Embedding Parameters | 512M |
| Vocab Size | 50272 |
| $d_\text{vocab}$ | 768 |
| $d_\text{model}$ | 1536 |
| $n_\text{layers}$ | 18 |
| FFN Activation | SwiGLU |
| $d_\text{ffn}$ | 4096 |
| Attention Type | Full |
| Position Embedding | Reversed RoPE with ABF |
| $n_\text{heads}$ | 24 |
| $d_\text{key}$ | 64 |
| Trained Context | 2048 |
| Trained Memory | 2048 |
| Max Inference Context | 4096 |
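
As a toy illustration of the Transformer-XL style memory scheme implied by the table (2048-token segments with a 2048-token memory, giving an effective 4096-token inference window), the sketch below shows segment-level recurrence in plain Python. The function and variable names are hypothetical; the real implementation lives in the linked repository.

```python
# Toy sketch of Transformer-XL style segment recurrence (hypothetical names;
# the actual implementation is in the memory-transformer-pt4 repository).
from typing import List, Optional

SEGMENT_LEN = 2048   # "Trained Context" from the table above
MEMORY_LEN = 2048    # "Trained Memory" from the table above

def process_segment(segment: List[int], memory: Optional[List[int]]) -> List[int]:
    """Stand-in for a forward pass: each segment attends to its own tokens
    plus the cached states from the previous segment (the memory)."""
    attended = (memory or []) + segment          # effective window <= 4096 positions
    # ... real model: attention over `attended`, producing hidden states ...
    return segment                               # cache this segment as the next memory

def run(tokens: List[int]) -> None:
    memory: Optional[List[int]] = None
    for start in range(0, len(tokens), SEGMENT_LEN):
        segment = tokens[start:start + SEGMENT_LEN]
        new_states = process_segment(segment, memory)
        memory = new_states[-MEMORY_LEN:]        # keep only the last MEMORY_LEN states

run(list(range(10_000)))
```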

## Model Collection

| Model | Link |
|-------|------|
| Pre-Trained Model | lovelace-medium-alpha1 |
| Fine-Tuned Model | lovelace-medium-alpha1-instruct |
| DPH Aligned Model | lovelace-medium-alpha1-instruct-hf |
| DPH Aligned Model (Multiple Heads) | lovelace-medium-alpha1-instruct-hf-multihead |