Experience

#18
by viktor-ferenczi - opened

After ~2k tokens it starts to ramble and loses context (see the attached screenshot).

There are two issues here: meta-conversation (self-referential messages like "do you still remember the original task?", which are not present in the training data) and the fact that the model does not extrapolate well beyond its training sequence length.

For meta-conversation: this is a data issue; I am thinking of ways to generate conversations that will help with tasks like this.

For length extrapolation: yes, unlike some models it doesn't error out when you go above the training sequence length, but quality degrades. I recommend deleting previous messages to make space, as in the sketch below.
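A minimal sketch of that truncation, assuming the EleutherAI/gpt-neox-20b tokenizer that the MPT models use and a 2048-token training context; the `role: content` formatting, the reply reserve, and the helper name are illustrative placeholders, not the model's actual chat template or an llm-foundry utility:

```python
from transformers import AutoTokenizer

# MPT models use the EleutherAI/gpt-neox-20b tokenizer; the 2048-token budget
# matches the training sequence length, and the reply reserve is an assumption.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
MAX_CONTEXT = 2048
RESERVED_FOR_REPLY = 512

def truncate_history(messages, system_prompt):
    """Keep the system prompt plus as many of the most recent messages as fit."""
    budget = MAX_CONTEXT - RESERVED_FOR_REPLY - len(tokenizer.encode(system_prompt))
    kept = []
    # Walk backwards from the newest message and stop once the budget is spent.
    for msg in reversed(messages):
        cost = len(tokenizer.encode(f"{msg['role']}: {msg['content']}\n"))
        if cost > budget:
            break
        budget -= cost
        kept.append(msg)
    return list(reversed(kept))
```

Dropping from the oldest end keeps the most recent turns intact; restating the original task in the system prompt helps it survive the truncation.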

sam-mosaic changed discussion status to closed
