Upload tokenizer

#4
by ArthurZ (HF staff) - opened
No description provided.

hello can this be merged?

HPC-AI Technology org

Hey @ArthurZ , thank you for uploading the tokenizer! That will definitely simplify the loading process and improve the user experience. We're reviewing it and checking with @Xenova about this PR (possibly waiting for their approval as well).

HPC-AI Technology org

hello can this be merged?

Yes, it will. We're checking with the author who provided a transformers-compatible tokenizer in the discussions several days ago.

Thank you!


Yes, you can merge! As mentioned in another post, this tokenizer matches the original on the entire xnli dataset (all languages)! This PR also adds the slow tokenizer in case a user wants to fall back on it.
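If you do want to fall back, here is a minimal sketch using the standard transformers pattern (not code from this PR; the sample sentence and the equality check are purely illustrative):

from transformers import AutoTokenizer

# Sketch: request the slow (Python) tokenizer added by this PR via use_fast=False.
slow_tokenizer = AutoTokenizer.from_pretrained(
    "hpcai-tech/grok-1", trust_remote_code=True, use_fast=False
)

# Optional sanity check that the fast and slow tokenizers agree on a sample sentence
# (illustrative only; not a substitute for the xnli comparison mentioned above).
fast_tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/grok-1", trust_remote_code=True)
sample = "The quick brown fox jumps over the lazy dog."
assert fast_tokenizer(sample)["input_ids"] == slow_tokenizer(sample)["input_ids"]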

Jonathan1909 changed pull request status to merged
HPC-AI Technology org

Thank you @ArthurZ . I've merged the PR and tested it. It works pretty well!

HPC-AI Technology org

hello can this be merged?

Hey @ehartford , the PR has been merged, and you can now load the tokenizer directly with:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/grok-1", trust_remote_code=True)
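As a quick sanity check after loading (a minimal sketch; the sample text below is only illustrative):

# Encode a sample sentence and decode the token IDs back to text.
text = "Hello from the Grok-1 tokenizer."
token_ids = tokenizer(text)["input_ids"]
print(token_ids)
print(tokenizer.decode(token_ids, skip_special_tokens=True))  # should recover the original text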

If you have downloaded the model as a repository, you might want to run git pull to get the updated tokenizer.

We have also updated the usage example in both the model card and our example in the ColossalAI GitHub repository.
