How to run this model?

#2
by jankke - opened

Hi!

I recently found a repository with CharacterBERT and realized that this model would be perfect for one of the projects I am currently working on. However, while trying to figure out how to use it, I ran into an error:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("helboukkouri/character-bert", trust_remote_code=True)

NotImplementedError: CharacterBERT does not have a token vocabulary.

I downloaded the files from "Files and versions", tinkered with tokenization_character_bert.py, and managed to load the tokenizer. However, I still cannot figure out how to pass the tokenizer output to the model. More precisely, I cannot return the tokenizer outputs as PyTorch tensors, because they have mismatched lengths (due to special symbols). Do you have any idea how to use this model within the Hugging Face framework?
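For readers hitting the same mismatched-length problem, here is a minimal, generic sketch of padding a batch to a rectangular shape before converting it to tensors. It is not the CharacterBERT tokenizer's own method. It assumes each token is already encoded as a fixed-width list of character ids and that sequences differ only in their number of tokens; the function name pad_batch, the padding id 0, and the toy ids are all hypothetical and may not match what the real tokenizer or model expects.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding id; the real tokenizer may use another value


def pad_batch(
    batch: List[List[List[int]]],
    chars_per_token: int,
    pad_id: int = PAD_ID,
) -> Tuple[List[List[List[int]]], List[List[int]]]:
    """Pad sequences of per-token character ids to a common token count.

    Returns the padded batch and an attention mask (1 = real token, 0 = padding).
    """
    max_tokens = max(len(seq) for seq in batch)
    padded, mask = [], []
    for seq in batch:
        # Every token must already have the same character width.
        for tok in seq:
            if len(tok) != chars_per_token:
                raise ValueError("token does not have the expected character width")
        n_missing = max_tokens - len(seq)
        # Append all-padding "tokens" until the sequence reaches max_tokens.
        padding = [[pad_id] * chars_per_token for _ in range(n_missing)]
        padded.append([list(tok) for tok in seq] + padding)
        mask.append([1] * len(seq) + [0] * n_missing)
    return padded, mask


# Two toy "sentences" with 3 tokens and 1 token, 4 character ids per token.
batch = [
    [[5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    [[5, 6, 7, 8]],
]
padded, mask = pad_batch(batch, chars_per_token=4)
```

Because the padded batch is rectangular, it should convert cleanly with torch.tensor(padded) and torch.tensor(mask) if PyTorch is installed. Whether the model accepts inputs laid out this way is an assumption to check against the repository code.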

Hey @jankke. This model is not really maintained right now.
I think that your best bet for now is to rely on this: https://github.com/helboukkouri/character-bert

Once the model is merged into the transformers library, you'll be able to use it through AutoModel/AutoTokenizer.

You might also check this: https://huggingface.co/helboukkouri/character-bert-base-uncased
And try to load it with the code from here: https://github.com/helboukkouri/transformers/tree/add_character_bert_model

But that's more risky ;)

helboukkouri changed discussion status to closed
