
Difference from MK2 #3
by mrfakename - opened

Hi, how is this different from the mk2 version?

This version was trained on longer sequences (16,384 tokens vs. 4,192 tokens). In addition, I processed the individual stories in the datasets into 16k-token sequences, whereas for mk1 they were left unprocessed, which resulted in them being trimmed.
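For anyone curious what "processing into 16k-token sequences" can look like in practice, here is a minimal sketch of greedy story packing: whole stories are concatenated into sequences up to the length limit, starting a fresh sequence instead of cutting a story mid-way. The tokenizer choice and function name are assumptions for illustration, not the actual pipeline.

```python
from transformers import AutoTokenizer

# Tokenizer is an assumption; any tokenizer matching the base model works here.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

def pack_stories(stories, tokenizer, max_len=16384):
    """Greedily pack whole tokenized stories into sequences of at most
    max_len tokens, starting a new sequence rather than trimming a story."""
    sequences, current = [], []
    for story in stories:
        tokens = tokenizer.encode(story)
        if len(tokens) > max_len:
            # A single story longer than the limit still has to be cut.
            tokens = tokens[:max_len]
        if current and len(current) + len(tokens) > max_len:
            # Adding this story would overflow: flush and start fresh.
            sequences.append(current)
            current = []
        current = current + tokens
    if current:
        sequences.append(current)
    return sequences
```

Leaving stories "plain" and relying on the trainer's truncation, by contrast, simply drops everything past the context limit, which is what caused the trimming in mk1.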

PocketDoc changed discussion status to closed
PocketDoc changed discussion status to open
