Tokenizer class Midm_bitext_Tokenizer does not exist or is not currently imported

#11
by supark - opened

Hello.

I'm trying to fine-tune (PEFT) the KT-AI/midm-bitext-S-7B-inst-v1 model.
When I try to train using the AutoTokenizer and AutoModelForCausalLM classes provided by the Transformers package, I get the following error message:

  • Tokenizer class Midm_bitext_Tokenizer does not exist or is not currently imported

I also tried instantiating the tokenizer directly with the Midm_bitext_Tokenizer class included in midm_bitext_tokenization.py, but I still received the same error message. Is there a way to address this issue?
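For context, this error string refers to the class name that AutoTokenizer reads from the repo's tokenizer_config.json. The sketch below is illustrative only: the JSON is an assumption consistent with the error message and the midm_bitext_tokenization.py file mentioned above, not copied verbatim from the KT-AI repo.

```python
import json

# Hypothetical tokenizer_config.json content (assumption, for illustration only).
config = json.loads("""
{
  "tokenizer_class": "Midm_bitext_Tokenizer",
  "auto_map": {"AutoTokenizer": ["midm_bitext_tokenization.Midm_bitext_Tokenizer", null]}
}
""")

# AutoTokenizer looks up this class name; since Midm_bitext_Tokenizer is not a
# built-in transformers class, loading fails unless remote code is allowed.
print(config["tokenizer_class"])
```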

@supark
Could you share your test code and the transformers version?

@ktthkim

Below are the specific versions for testing configuration:

Python 3.10
Transformers==4.33.2

The test code itself is lengthy, but it is a standard codebase that you can easily find on the web.

@supark
I tested the sample in the README with transformers==4.33.2, and it worked for me.
Please make sure to set trust_remote_code=True when calling from_pretrained.
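To illustrate why that flag matters, here is a minimal sketch of the class-resolution logic; this is an assumption, heavily simplified, and not the actual transformers source. A custom class like Midm_bitext_Tokenizer is not in the built-in registry, so it can only be resolved through the repo's own code, which requires trust_remote_code=True.

```python
from typing import Any, Dict

# Stand-in for the built-in tokenizer registry (illustrative, not exhaustive).
BUILTIN_TOKENIZERS: Dict[str, Any] = {"LlamaTokenizer": object, "GPT2Tokenizer": object}

def resolve_tokenizer_class(config: Dict[str, Any], trust_remote_code: bool = False):
    """Simplified sketch (assumption) of how a tokenizer_class name is resolved."""
    name = config["tokenizer_class"]
    if name in BUILTIN_TOKENIZERS:
        return BUILTIN_TOKENIZERS[name]
    if trust_remote_code and "auto_map" in config:
        # In real transformers, the class would now be imported from the
        # repo's own .py file (here: midm_bitext_tokenization.py).
        return config["auto_map"]["AutoTokenizer"]
    raise ValueError(
        f"Tokenizer class {name} does not exist or is not currently imported."
    )

config = {
    "tokenizer_class": "Midm_bitext_Tokenizer",
    "auto_map": {"AutoTokenizer": "midm_bitext_tokenization.Midm_bitext_Tokenizer"},
}
```

Without trust_remote_code the call reproduces the reported error; with it, the custom class reference from the repo is returned.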

@ktthkim
There is no issue with loading the model and tokenizer as you described.
However, when I try to fine-tune (QLoRA) using the Trainer.train() method provided by Transformers, the error above still occurs.
