Tokenizer gives an error

#1
by zzman - opened

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("timdettmers/guanaco-65b-merged")

gives the following error:
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a tokenizers library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

I made sure that sentencepiece is installed.
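For anyone hitting the same error: check that sentencepiece is importable from the same interpreter or kernel that runs AutoTokenizer; in a notebook, a stale kernel after a pip install is a common culprit (this check is my suggestion, not something from the thread):

import sentencepiece
print(sentencepiece.__version__)  # should print a version; restart the kernel after installing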

I do have a workaround, which is to use the tokenizer from TheBloke's model:
tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-HF")
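For completeness, a minimal sketch of this workaround in context (the bf16 dtype and device_map="auto" are my illustrative choices, not from the post):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Merged weights from timdettmers, tokenizer from TheBloke's re-upload
model = AutoModelForCausalLM.from_pretrained(
    "timdettmers/guanaco-65b-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-HF")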

He didn't need to post a tokenizer, since this model is just a fine-tune of llama-65b-hf, with the exact same tokenizer.
Just use:
model_name = "decapoda-research/llama-65b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_name)

P.S. You will need to import LlamaTokenizer:
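from transformers import LlamaTokenizer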

Why can't we use guanaco-65b-merged? Otherwise we have to merge it ourselves, as given below. How much GPU power is needed to run this model?

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from peft import PeftModel

model_name = "decapoda-research/llama-65b-hf"
adapters_name = "timdettmers/guanaco-65b"

# Load the base model in bf16; 4-bit loading stays commented out because
# LoRA adapters cannot be merged into 4-bit quantized weights.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
)

# Apply the Guanaco adapters, then merge them into the base weights.
model = PeftModel.from_pretrained(model, adapters_name)
model = model.merge_and_unload()

tokenizer = LlamaTokenizer.from_pretrained(model_name)
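On the GPU question, a rough back-of-envelope (my estimate, not from the thread), counting the weights only and ignoring activations and the KV cache:

params = 65e9
print(f"bf16:  {params * 2 / 2**30:.0f} GiB")    # ~121 GiB
print(f"4-bit: {params * 0.5 / 2**30:.0f} GiB")  # ~30 GiB

So performing the merge in bf16 as above needs well over 130 GB of combined GPU/CPU memory, while running the already-merged model with load_in_4bit should fit in roughly 35-48 GB of VRAM, depending on sequence length and overhead.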
