Spiece.Model

#1
by flexudy - opened

Howdy @yhavinga

I'm trying to load the tokenizer on AWS Lambda, but I get this error:
module initialization error: Internal: /sentencepiece/python/bundled/sentencepiece/src/sentencepiece_processor.cc(848) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Any idea?

Locally it works, but for some reason not on Lambda.
When I put a spiece.model file into the model folder (taken from another model, just to see if it works), loading succeeds, but the predictions are garbage.

Hey @flexudy

Are you loading the tokenizer with AutoTokenizer.from_pretrained()? And is the tokenizers package recent?

The (sentencepiece) tokenizer of t5-base-dutch was created with HF tools instead of the 'official' sentencepiece tokenizer. One difference is that the latter creates a spiece.model file, which is absent from tokenizers created with the HF tools; those only produce tokenizer.json. A while ago I also got cryptic errors when loading HF-created tokenizers that had worked without issues a few months earlier. In the end I could solve those problems by either upgrading the tokenizers package or, if I was already on the latest version, downgrading. Lately I haven't had any issues, so I suspect recent releases of tokenizers are subjected to more rigorous integration tests.
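
For reference, a minimal sketch of the fast-tokenizer loading path, which only needs tokenizer.json (assuming the hub id yhavinga/t5-base-dutch):

```python
from transformers import AutoTokenizer

# Assumed hub id for the model discussed here; adjust if yours differs.
# AutoTokenizer with use_fast=True picks the Rust-backed tokenizer, which
# reads tokenizer.json and does not need a spiece.model file.
tokenizer = AutoTokenizer.from_pretrained("yhavinga/t5-base-dutch", use_fast=True)
print(tokenizer.tokenize("Dit is een test."))
```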

hey @yhavinga

Thanks for the quick response.
I am loading the tokenizer with T5TokenizerFast. I currently use transformers 4.18.0, and I've also tried every version between 4.9 and 4.23.
On macOS everything is fine, but not on AWS Lambda.

I thought you might have some clues about why this error would happen.
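
For context, the loading code on my side is roughly like this (the path is illustrative; the model files are bundled into the Lambda deployment package):

```python
from transformers import T5TokenizerFast

# Illustrative path: where the bundled model files end up inside the
# Lambda container in this setup. Adjust to your own packaging layout.
MODEL_DIR = "/var/task/model"

tokenizer = T5TokenizerFast.from_pretrained(MODEL_DIR)
```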

What does pip freeze | grep tokenizers say? I just checked in two environments and it works with 0.12.1 and 0.13.1.
Also, are there perhaps lingering tokenizer files in the working directory of the script? I once had a bug where the tokenizer was loaded from the current directory instead of from the model id passed for the HF hub.
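
Something like this, run inside the Lambda environment, would show both (just a diagnostic sketch, nothing model-specific):

```python
import os
import tokenizers

# Print the installed tokenizers version (0.12.1 and 0.13.1 work for me).
print("tokenizers", tokenizers.__version__)

# Look for tokenizer files lingering in the current working directory,
# which from_pretrained could pick up instead of the intended model.
for name in ("tokenizer.json", "spiece.model", "tokenizer_config.json"):
    if os.path.exists(name):
        print("found local file:", name, os.path.getsize(name), "bytes")
```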

flexudy changed discussion status to closed

@yhavinga I found the error. The tokenizer.json file was not packaged properly.
Thank you very much.
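
In case it helps anyone else, a sanity check along these lines would have caught it (the model directory name is just an example):

```python
from pathlib import Path

MODEL_DIR = Path("model")  # example path to the bundled model files

# The fast T5 tokenizer needs tokenizer.json; if the file is missing or
# truncated in the deployment package, loading can fail in non-obvious
# ways, like the sentencepiece parse error above.
tokenizer_json = MODEL_DIR / "tokenizer.json"
assert tokenizer_json.is_file() and tokenizer_json.stat().st_size > 0, (
    f"{tokenizer_json} is missing or empty -- check the packaging step"
)
```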
