Value Error: Unrecognized configuration class <> to build an AutoTokenizer

#4
by jasonskku - opened

Hello. I'm a graduate student of Sungkyunkwan University.
Thank you for distributing this model.
I have a problem when I run this code.

Please help with this problem. I want to run this code.

Env:

WSL (python 3.10.12)
transformer v4.41

Code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
"LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

Choose your prompt

prompt = "Explain who you are" # English example
prompt = "λ„ˆμ˜ μ†Œμ›μ„ 말해봐" # Korean example

messages = [
{"role": "system",
"content": "You are EXAONE model from LG AI Research, a helpful assistant."},
{"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt"
)

output = model.generate(
input_ids.to("cuda"),
eos_token_id=tokenizer.eos_token_id,
max_new_tokens=128
)
print(tokenizer.decode(output[0]))

Error:

ValueError Traceback (most recent call last)
Cell In[20], line 15
6 model = AutoModelForCausalLM.from_pretrained(
7 "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
8 token = access_token,
(...)
11 device_map="auto"
12 )
14 # Use the custom tokenizer if available
---> 15 tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
17 # Choose your prompt
18 prompt = "Explain who you are" # English example

File ~/.venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:926, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
920 else:
921 raise ValueError(
922 "This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed "
923 "in order to use this tokenizer."
924 )
--> 926 raise ValueError(
927 f"Unrecognized configuration class {config.class} to build an AutoTokenizer.\n"
928 f"Model type should be one of {', '.join(c.name for c in TOKENIZER_MAPPING.keys())}."
929 )

ValueError: Unrecognized configuration class <class 'transformers_modules.LGAI-EXAONE.EXAONE-3.0-7.8B-Instruct.7f15baedd46858153d817445aff032f4d6cf4939.configuration_exaone.ExaoneConfig'> to build an AutoTokenizer.
Model type should be one of AlbertConfig, AlignConfig, BarkConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChameleonConfig, ChineseCLIPConfig, ClapConfig, CLIPConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, CohereConfig, ConvBertConfig, CpmAntConfig, CTRLConfig, Data2VecAudioConfig, Data2VecTextConfig, DbrxConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, DPRConfig, ElectraConfig, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FastSpeech2ConformerConfig, FlaubertConfig, FNetConfig, FSMTConfig, FunnelConfig, GemmaConfig, Gemma2Config, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GroundingDinoConfig, GroupViTConfig, HubertConfig, IBertConfig, IdeficsConfig, Idefics2Config, InstructBlipConfig, InstructBlipVideoConfig, JambaConfig, JetMoeConfig, JukeboxConfig, Kosmos2Config, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LlavaConfig, LlavaNextVideoConfig, LlavaNextConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MgpstrConfig, MistralConfig, MixtralConfig, MobileBertConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NezhaConfig, NllbMoeConfig, NystromformerConfig, OlmoConfig, OneFormerConfig, OpenAIGPTConfig, OPTConfig, Owlv2Config, OwlViTConfig, PaliGemmaConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, PersimmonConfig, PhiConfig, Phi3Config, Pix2StructConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RagConfig, RealmConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SiglipConfig, Speech2TextConfig, Speech2Text2Config, SpeechT5Config, SplinterConfig, SqueezeBertConfig, StableLmConfig, Starcoder2Config, SwitchTransformersConfig, T5Config, TapasConfig, TransfoXLConfig, TvpConfig, UdopConfig, UMT5Config, VideoLlavaConfig, ViltConfig, VipLlavaConfig, VisualBertConfig, VitsConfig, Wav2Vec2Config, Wav2Vec2BertConfig, Wav2Vec2ConformerConfig, WhisperConfig, XCLIPConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig.

In your error, it says This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed.
Have you checked if sentencepiece is installed in your environment?

I also encountered the same Value Error, but I resolved it by adding trust_remote_code=True to the AutoTokenizer part. Try using it as shown below.

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

I also encountered the same Value Error, but I resolved it by adding trust_remote_code=True to the AutoTokenizer part. Try using it as shown below.

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

Oh thank you. But, this problem is continued...
tokenizer problem is critical to run this code :(

In your error, it says This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed.
Have you checked if sentencepiece is installed in your environment?

Yes I already installed 'sentencepiece 0.2.0' in my environment.

jasonskku changed discussion status to closed
jasonskku changed discussion status to open

Hmm, if the problem persists even after upgrading the transformer version or installing sentencepiece, adding an auth token when loading the model and tokenizer might help.

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    token={youre_huggingface_token}
)

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",     trust_remote_code=True,    token={youre_huggingface_token})

I hope it will help : )

LG AI Research org

It seems that you passed the access_token as a parameter to AutoModelForCausalLM.from_pretrained(). If so, you have to pass it to AutoTokenizer.from_pretrained() as well.

It seems that you passed the access_token as a parameter to AutoModelForCausalLM.from_pretrained(). If so, you have to pass it to AutoTokenizer.from_pretrained() as well.

Hello. yireun.

My code is here. Please check my code?

Thank you for your help.

Code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
"LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
token={"my token"}
)
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
trust_remote_code=True,
token={"my token"})

prompt = "Explain who you are" # English example
prompt = "λ„ˆμ˜ μ†Œμ›μ„ 말해봐" # Korean example

messages = [
{"role": "system",
"content": "You are EXAONE model from LG AI Research, a helpful assistant."},
{"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt"
)

output = model.generate(
input_ids.to("cuda"),
eos_token_id=tokenizer.eos_token_id,
max_new_tokens=128
)
print(tokenizer.decode(output[0]))

LG AI Research org
β€’
edited 3 days ago

Hi, jasonskku. Your code works fine in the following environments:
- ubuntu 22.04 with GPU A100
- python 3.10.8
- torch==2.4.0, transformers==4.44.0, accelerate==0.33.0
If you still have problems, it may be a problem with your WSL configuration.

LG AI Research org

When removing the access_token in AutoTokenizer.from_pretrained(), the following errors similar to yours occur.

  • tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

warnings.warn(
Traceback (most recent call last):
File "/.../sample_test.py", line 13, in
tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
File "/.../python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 909, in from_pretrained
raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.LGAI-EXAONE.EXAONE-3.0-7.8B-Instruct.7f15baedd46858153d817445aff032f4d6cf4939.configuration_exaone.ExaoneConfig'> to build an AutoTokenizer.
Model type should be one of AlbertConfig, AlignConfig, BarkConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChineseCLIPConfig, ClapConfig, CLIPConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, CohereConfig, ConvBertConfig, CpmAntConfig, CTRLConfig, Data2VecAudioConfig, Data2VecTextConfig, DbrxConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, DPRConfig, ElectraConfig, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FastSpeech2ConformerConfig, FlaubertConfig, FNetConfig, FSMTConfig, FunnelConfig, GemmaConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GroundingDinoConfig, GroupViTConfig, HubertConfig, IBertConfig, IdeficsConfig, Idefics2Config, InstructBlipConfig, JambaConfig, JetMoeConfig, JukeboxConfig, Kosmos2Config, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LlavaConfig, LlavaNextConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MambaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MgpstrConfig, MistralConfig, MixtralConfig, MobileBertConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NezhaConfig, NllbMoeConfig, NystromformerConfig, OlmoConfig, OneFormerConfig, OpenAIGPTConfig, OPTConfig, Owlv2Config, OwlViTConfig, PaliGemmaConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, PersimmonConfig, PhiConfig, Phi3Config, Pix2StructConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RagConfig, RealmConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SiglipConfig, Speech2TextConfig, Speech2Text2Config, SpeechT5Config, SplinterConfig, SqueezeBertConfig, StableLmConfig, Starcoder2Config, SwitchTransformersConfig, T5Config, TapasConfig, TransfoXLConfig, TvpConfig, UdopConfig, UMT5Config, VideoLlavaConfig, ViltConfig, VipLlavaConfig, VisualBertConfig, VitsConfig, Wav2Vec2Config, Wav2Vec2BertConfig, Wav2Vec2ConformerConfig, WhisperConfig, XCLIPConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig.

Sign up or log in to comment