It might be better to try knowledge distillation

#1
by Alignment-Lab-AI - opened

from the MPT model into Pythia over a corpus of very long text, then initializing the StoryWriter architecture with the Pythia weights.
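For reference, here is a minimal sketch of the distillation objective. It assumes the standard soft-target formulation (Hinton et al., 2015): a temperature-scaled KL divergence between the teacher's and the student's next-token distributions. It also assumes that the teacher and the student share a vocabulary, so their logits line up position by position. The function names and the default temperature are hypothetical. In practice, the logits would come from the MPT teacher and the Pythia student run over the same long-text batch.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled log-softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits: Sequence[Sequence[float]],
                      student_logits: Sequence[Sequence[float]],
                      temperature: float = 2.0) -> float:
    """Mean per-position KL(teacher || student) on softened distributions.

    Scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    Hypothetical helper: each row holds one token position's logits over the
    (assumed shared) vocabulary.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        if len(t_row) != len(s_row):
            raise ValueError("teacher and student must share a vocabulary")
        log_p = log_softmax(t_row, temperature)  # teacher soft targets
        log_q = log_softmax(s_row, temperature)  # student predictions
        # KL(p || q) = sum_i p_i * (log p_i - log q_i)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return (temperature ** 2) * total / len(teacher_logits)


if __name__ == "__main__":
    # Toy example: two token positions over a 3-token vocabulary.
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.5, 0.7, -0.5], [0.0, 0.0, 2.0]]
    print(distillation_loss(teacher, student))
```

In a full training loop, this term is usually combined with the ordinary next-token cross-entropy on the ground-truth text. How to weight the two terms is a tuning choice.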

Alignment-Lab-AI changed discussion status to closed
