Tokenizer `merges.txt` files
#5, opened by pietrolesci
Hi @hails, @stellaathena,
In search of the `merges.txt` files for the Pythia models, I stumbled upon this conversation, which discusses a somewhat related issue. Is the `merges.txt` still unavailable for the Pythia models? Alternatively, is it exactly the same as the GPT-NeoX-20B one?
(I picked this specific Pythia size randomly as it should be the same across scales)
Yes, it's exactly the same as the GPT-NeoX-20B tokenizer.
stellaathena changed discussion status to closed
+1
Thank you so much, @stellaathena and @hails !
Posting the link to the GPT-NeoX-20B `merges.txt` file here for people landing on this page:
https://huggingface.co/EleutherAI/gpt-neox-20b/blob/main/merges.txt
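For anyone who wants to load the file programmatically, here is a minimal sketch using only the Python standard library. It assumes the raw file can be fetched by swapping `blob` for `resolve` in the link above, and that the file follows the usual BPE layout (an optional `#version` header, then one space-separated token pair per line). The names `MERGES_URL`, `parse_merges` and `fetch_merges` are just illustrative helpers, not part of any library.

```python
import urllib.request

# Assumption: raw-download form of the link above ("blob" replaced by "resolve").
MERGES_URL = "https://huggingface.co/EleutherAI/gpt-neox-20b/resolve/main/merges.txt"


def parse_merges(text):
    """Parse the contents of a merges.txt file into a list of (left, right) pairs.

    Assumes the common BPE layout: an optional "#version: ..." header on the
    first line, followed by one merge rule per line, written as two tokens
    separated by a single space.
    """
    # Split on "\n" only, so unusual Unicode characters inside tokens
    # are never treated as line breaks.
    lines = text.split("\n")
    # Drop only the header line; later lines starting with "#" can be real merges.
    if lines and lines[0].startswith("#version"):
        lines = lines[1:]
    merges = []
    for line in lines:
        parts = line.split(" ")
        # Skip blank or malformed lines (e.g. the trailing empty line).
        if len(parts) == 2 and all(parts):
            merges.append((parts[0], parts[1]))
    return merges


def fetch_merges(url=MERGES_URL):
    """Download merges.txt from `url` (needs network access) and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_merges(response.read().decode("utf-8"))
```

Calling `fetch_merges()` should return the merge rules in priority order. Since the answer above says the Pythia tokenizer is exactly the same as the GPT-NeoX-20B one, the same list should apply to every Pythia size.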