Tokenizer `merges.txt` files

#5
by pietrolesci - opened
EleutherAI org

Hi @hails , @stellaathena ,
While searching for the merges.txt files for the Pythia models, I stumbled upon this conversation, which discusses a somewhat related issue. Is the merges.txt file still unavailable for the Pythia models? Alternatively, is it exactly the same as the one for GPT-NeoX-20B?

(I picked this specific Pythia size at random, as the tokenizer should be the same across scales.)

EleutherAI org

Yes it's exactly the same as the GPT-NeoX-20B tokenizer.

stellaathena changed discussion status to closed
EleutherAI org

+1

EleutherAI org

Thank you so much, @stellaathena and @hails !

Posting the link to the GPT-NeoX-20B merges.txt file here for people landing on this page.
https://huggingface.co/EleutherAI/gpt-neox-20b/blob/main/merges.txt
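
For anyone who wants to load the file programmatically, here is a minimal sketch, assuming the `huggingface_hub` package is installed and that the file follows the usual BPE layout of one space-separated merge pair per line with an optional `#version` header. The helper names `parse_merges` and `fetch_neox_merges` are made up for illustration; only `hf_hub_download` comes from the library.

```python
from pathlib import Path


def parse_merges(text):
    """Parse the contents of a BPE merges.txt into an ordered list of (left, right) pairs."""
    merges = []
    for raw in text.split("\n"):
        line = raw.rstrip("\r")
        # Skip blank lines and the optional "#version: ..." header.
        if not line or line.startswith("#version"):
            continue
        parts = line.split(" ")
        if len(parts) != 2:
            raise ValueError(f"unexpected merges line: {line!r}")
        merges.append((parts[0], parts[1]))
    return merges


def fetch_neox_merges():
    """Download merges.txt from the GPT-NeoX-20B repo and parse it (needs network access)."""
    # Imported lazily so that parse_merges works without the third-party package.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id="EleutherAI/gpt-neox-20b", filename="merges.txt")
    return parse_merges(Path(path).read_text(encoding="utf-8"))
```

Calling `fetch_neox_merges()` should return the merge rules in priority order. Since the Pythia tokenizer is stated above to be the same as the GPT-NeoX-20B one, the same list should apply to Pythia models.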
