Tokenizer issue

by tomaarsen HF staff - opened Jan 9, 2024

Hello!

This is very cool - I'd love to run it myself too, but I run into an issue: word_ids() is not accessible on the non-fast LukeTokenizer. Did you get around this somehow?

Tom Aarsen

lambdavi

Owner Jan 9, 2024

Hello!

First of all, thanks for your library, it works great. Yes, I ran into this problem as well, and I am going to update the model card. As a workaround, I used the RobertaTokenizer as an alternative. I still have to figure out how to use the LukeTokenizer, but I am working on it and will release a v2 soon.

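from span_marker.tokenizer import SpanMarkerTokenizer
# `model` is the already-loaded SpanMarkerModel; replace its LUKE tokenizer with a RoBERTa one
tokenizer = SpanMarkerTokenizer.from_pretrained("roberta-base", config=model.tokenizer.config)
model.set_tokenizer(tokenizer)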
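
If it helps, here is a rough sketch of how one might check up front whether a tokenizer can provide word_ids() at all. It relies on the `is_fast` property that transformers tokenizers expose; the helper name `supports_word_ids` and the stand-in objects are made up for illustration, so treat this as an assumption-laden example rather than part of the library.

```python
from types import SimpleNamespace


def supports_word_ids(tokenizer) -> bool:
    """Return True if the tokenizer reports itself as a fast tokenizer.

    Only fast tokenizers produce encodings on which word_ids() can be
    called, so `is_fast` is used here as a proxy for that capability.
    """
    return bool(getattr(tokenizer, "is_fast", False))


# Hypothetical stand-ins so the sketch runs without downloading anything;
# in practice you would pass a real tokenizer loaded from transformers.
slow_tokenizer = SimpleNamespace(is_fast=False)
fast_tokenizer = SimpleNamespace(is_fast=True)

print(supports_word_ids(slow_tokenizer))
print(supports_word_ids(fast_tokenizer))
```

With a real tokenizer object, a False result would suggest switching to a different tokenizer, as in the workaround above.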

Let me know if this solved the problem.

lambdavi changed discussion status to closed Jan 9, 2024
