ValueError: Tokenizer class Arcade100kTokenizer does not exist or is not currently imported.

#7
by interstellarninja - opened

I have trained a QLoRA adapter on stablelm-2-zephyr-1_6b and I'm trying to run inference with the merged model. I have also downloaded tokenization_arcade100k.py into the merged folder, but I still get the error with the code below:

        self.bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            trust_remote_code=True,
            return_dict=True,
            quantization_config=self.bnb_config,
            torch_dtype=torch.bfloat16,
            device_map="auto",
        )

        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.tokenizer.pad_token = self.tokenizer.eos_token
        self.tokenizer.padding_side = "left"

Try this:

self.tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

Thanks g-ronimo, but I'm using the local merged QLoRA model.

Btw, importing Arcade100kTokenizer directly into the inference code worked:

from tokenization_arcade100k import Arcade100kTokenizer
self.tokenizer = Arcade100kTokenizer.from_pretrained(model_path)
Stability AI org

Hi, @interstellarninja 👋 You still need to pass trust_remote_code=True to the AutoTokenizer.from_pretrained method even if files are local because of the custom tokenizer implementation. See relevant code here.
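
For anyone landing here later, a minimal sketch of the suggestion above applied to a local merged folder. The helper names (has_custom_tokenizer_code, load_local_tokenizer) are hypothetical and only for illustration; the file name is assumed from the import used earlier in this thread, and the pad-token and padding-side settings are copied from the original snippet.

```python
from pathlib import Path

# Assumed file name, taken from the import shown earlier in this thread.
CUSTOM_TOKENIZER_FILE = "tokenization_arcade100k.py"


def has_custom_tokenizer_code(model_path: str) -> bool:
    """Return True if the custom tokenizer module sits in the model folder."""
    return (Path(model_path) / CUSTOM_TOKENIZER_FILE).is_file()


def load_local_tokenizer(model_path: str):
    """Load the tokenizer from a local folder with custom code enabled."""
    # Imported lazily so the file check above works without transformers installed.
    from transformers import AutoTokenizer

    # trust_remote_code=True is needed even for local files, because the
    # tokenizer class is a custom implementation rather than a built-in one.
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Same settings as in the original post.
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"
    return tokenizer
```

Calling has_custom_tokenizer_code(model_path) first is just an optional sanity check that the custom tokenizer file was copied into the merged folder; load_local_tokenizer(model_path) then replaces the plain AutoTokenizer.from_pretrained(model_path) call.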

jon-tow changed discussion status to closed
