Tokenizer class does not exist

#6
by teknium - opened

I am trying to use LM-Eval-Harness to benchmark the model. It uses Hugging Face's AutoTokenizer class to load the tokenizer, but it gives this error:

Traceback (most recent call last):
  File "/home/teknium/dakota/lm-evaluation-harness/main.py", line 89, in <module>
    main()
  File "/home/teknium/dakota/lm-evaluation-harness/main.py", line 57, in main
    results = evaluator.simple_evaluate(
  File "/home/teknium/dakota/lm-evaluation-harness/lm_eval/utils.py", line 242, in _wrapper
    return fn(*args, **kwargs)
  File "/home/teknium/dakota/lm-evaluation-harness/lm_eval/evaluator.py", line 69, in simple_evaluate
    lm = lm_eval.models.get_model(model).create_from_arg_string(
  File "/home/teknium/dakota/lm-evaluation-harness/lm_eval/base.py", line 115, in create_from_arg_string
    return cls(**args, **args2)
  File "/home/teknium/dakota/lm-evaluation-harness/lm_eval/models/huggingface.py", line 189, in __init__
    self.tokenizer = self._create_auto_tokenizer(
  File "/home/teknium/dakota/lm-evaluation-harness/lm_eval/models/huggingface.py", line 492, in _create_auto_tokenizer
    tokenizer = super()._create_auto_tokenizer(
  File "/home/teknium/dakota/lm-evaluation-harness/lm_eval/models/huggingface.py", line 313, in _create_auto_tokenizer
    tokenizer = self.AUTO_TOKENIZER_CLASS.from_pretrained(
  File "/home/teknium/lm-evaluation-harness/venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 748, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class YiTokenizer does not exist or is not currently imported.
Running tasks: openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq with batch size: 14 and output path: ./benchmark_logs/01-ai/Yi-6B_float16_GPT4All.json

Yi models use custom model code now, so you should add trust_remote_code=True in from_pretrained.
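
For reference, a minimal sketch of what that looks like when loading the tokenizer and model directly with transformers (the model ID and dtype are taken from this thread; the exact kwargs are an assumption, not the harness's code):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets transformers import the custom YiTokenizer class
# shipped in the model repository instead of looking it up in the library itself.
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-6B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)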

I already have trust_remote_code set, but it still doesn't accept it.

01-ai org

Can you prepare a minimal reproducible example, so we can check what the problem is?

Yes:

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

then

python3 main.py --model hf-causal-experimental --model_args pretrained="01-ai/Yi-6B",dtype="bfloat16",trust_remote_code=True,use_accelerate=True --tasks truthfulqa_mc --batch_size 1

It works just fine in my environment; can you reproduce the problem in a new virtual env?

Fresh venv still crashes it for me :/
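
As a side check (not from this thread, just a sketch to narrow things down): in the failing venv, try loading the tokenizer with plain transformers and print the installed version. If this also raises the YiTokenizer error, the problem lies in the environment or transformers rather than in lm-evaluation-harness.

import transformers
from transformers import AutoTokenizer

# Print the installed transformers version so the environments can be compared.
print(transformers.__version__)

# Load the tokenizer directly, bypassing the harness entirely.
tok = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
print(type(tok))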

Maybe you can try our Docker image (it will be released soon: https://github.com/01-ai/Yi/issues/3).

cc @clefourrier might have some ideas here

Could you try our latest instructions at https://github.com/01-ai/Yi#1-prepare-development-environment ?

Same issue here. Any news on this?
