Tokenizer issue

#1
by tomaarsen HF staff - opened

Hello!

This is very cool! I'd love to run it myself too, but I run into an issue: word_ids() is not accessible on the non-fast (slow) LukeTokenizer. Did you get around this issue somehow?
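For context: on fast (Rust-backed) tokenizers, word_ids() returns, for every token, the index of the word it came from, with None for special tokens. The sketch below is a hypothetical pure-Python helper (emulate_word_ids is not part of transformers or span_marker) that approximates that mapping for a slow tokenizer by tokenizing pre-split words one at a time. It assumes RoBERTa/LUKE-style <s> and </s> special tokens and uses a toy tokenizer in place of LukeTokenizer.

```python
from typing import Callable, List, Optional, Tuple


def emulate_word_ids(
    words: List[str],
    tokenize: Callable[[str], List[str]],
    add_special_tokens: bool = True,
) -> Tuple[List[str], List[Optional[int]]]:
    """Hypothetical helper: tokenize pre-split words one at a time and record,
    for every token, the index of the word it came from (None for special
    tokens), mimicking what word_ids() returns on a fast tokenizer."""
    tokens: List[str] = []
    word_ids: List[Optional[int]] = []
    if add_special_tokens:
        # Assumed RoBERTa/LUKE-style start token; it belongs to no word.
        tokens.append("<s>")
        word_ids.append(None)
    for idx, word in enumerate(words):
        pieces = tokenize(word)
        tokens.extend(pieces)
        word_ids.extend([idx] * len(pieces))
    if add_special_tokens:
        # Assumed RoBERTa/LUKE-style end token; it belongs to no word.
        tokens.append("</s>")
        word_ids.append(None)
    return tokens, word_ids


def toy_tokenize(word: str) -> List[str]:
    """Toy stand-in for a subword tokenizer: chunks of at most 4 characters."""
    return [word[i : i + 4] for i in range(0, len(word), 4)]


tokens, ids = emulate_word_ids(["Hugging", "Face", "rocks"], toy_tokenize)
print(tokens)  # ['<s>', 'Hugg', 'ing', 'Face', 'rock', 's', '</s>']
print(ids)     # [None, 0, 0, 1, 2, 2, None]
```

With a real slow tokenizer you could pass something like its tokenize method as the callable, but BPE tokenizers such as RoBERTa's and LUKE's encode a leading space into word-initial tokens, so per-word results may differ from tokenizing the whole sentence. Treat this as an approximation, not as what SpanMarker does internally.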

  • Tom Aarsen
Owner

Hello!

First of all, thanks for your library; it works great. Yeah, I am going to update the model card. I encountered this problem as well and used the RobertaTokenizer as an alternative. I still have to figure out how to use the LukeTokenizer, but I am working on it and will release a v2 soon.

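```python
from span_marker.tokenizer import SpanMarkerTokenizer
# Swap in a roberta-base tokenizer, reusing the already-loaded `model`'s tokenizer config.
tokenizer = SpanMarkerTokenizer.from_pretrained("roberta-base", config=model.tokenizer.config)
model.set_tokenizer(tokenizer)
```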

Let me know if this solves the problem.

lambdavi changed discussion status to closed
