Does anyone know which specific Python library contains the tokenizer that was used to train Llama-3-70b?

#11
by BigDeeper - opened

For GPT-4, for example, it is tiktoken.

Does this help?
https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py

Interesting. I had been wondering why OpenAI's tokenizer library "tiktoken" works with Llama 3. Now it is obvious: they simply built on top of tiktoken. One problem remains: after installing the llama3 repo with "pip install -e .", I am unable to import "llama3" from some folders. Loading through the HF functions still works.
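For anyone landing here later, a rough sketch of the piece that ties the two together, as far as I can tell from the linked tokenizer.py: it reads the tokenizer.model file as tiktoken-style BPE ranks (one base64-encoded token and its integer rank per line) and passes them to tiktoken. The function name parse_tiktoken_ranks and the three-token sample below are made up for illustration and use only the standard library; a real tokenizer.model is far larger.

```python
import base64


def parse_tiktoken_ranks(text: str) -> dict[bytes, int]:
    """Parse tiktoken-style BPE rank data: one '<base64 token> <rank>' pair per line."""
    ranks: dict[bytes, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


# Hypothetical three-token sample in the same layout as a real rank file.
SAMPLE = "\n".join(
    f"{base64.b64encode(token).decode()} {rank}"
    for rank, token in enumerate([b"a", b"b", b"ab"])
)

sample_ranks = parse_tiktoken_ranks(SAMPLE)
print(sample_ranks)
```

With tiktoken installed, a dict like this is what goes into the mergeable_ranks argument of tiktoken.Encoding, alongside a name, a pat_str regex and a special_tokens dict. The linked file supplies its own split pattern and special tokens, so this sketch alone will not reproduce Llama 3 token IDs.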
