long texts are not labelled to the end

#5
by valentinbeuze - opened

If I copy and paste your default text ten times ("Apple est créée le 1er avril..."), something goes wrong:
the last paragraphs are not labelled.
Any idea? Is it related to a predefined maximum number of words for inference?
Do I have to cut my text into blocks to use your model?
Thanks

Hello Valentin,

There is indeed a predefined maximum number of tokens for each model. For CamemBERT models this is around 500 tokens (512 in the standard configuration, special tokens included). Since each word can be split into several tokens, the number of words you can process depends on how your text is tokenized (my guess is roughly 100 to 200 words).
You can find models that handle more tokens, but there will always be a limit.
So yes, I would recommend splitting your text into blocks before running the model.
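
In case it helps, here is a minimal sketch of one way to do the split, not the only one. It packs whole sentences into blocks that stay under a token budget and keeps each block's character offset so entity positions can be mapped back to the original text. The names `split_into_chunks` and `approx_token_count` are hypothetical, the budget of 400 is an arbitrary safety margin, and the default counter only counts whitespace-separated words; for a real limit you would pass a counter based on the model's own tokenizer.

```python
import re
from typing import Callable, List, Tuple


def approx_token_count(text: str) -> int:
    """Assumption: crude stand-in that counts whitespace-separated words.

    A real tokenizer usually produces more tokens than words, so pass a
    tokenizer-based counter for an accurate budget.
    """
    return len(text.split())


def split_into_chunks(
    text: str,
    max_tokens: int = 400,  # assumed margin below the model limit
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[Tuple[int, str]]:
    """Greedily pack sentences into chunks of at most max_tokens.

    Returns (character offset, chunk text) pairs. A single sentence that
    exceeds max_tokens on its own is kept whole, so it can still be too long.
    """
    # Naive sentence split: cut after ., ! or ? followed by whitespace.
    # The pieces concatenate back to the original text exactly.
    sentences = re.findall(r".+?(?:[.!?]+\s+|\Z)", text, flags=re.S)

    chunks: List[Tuple[int, str]] = []
    current = ""   # text of the chunk being built
    offset = 0     # where the current chunk starts in the original text
    position = 0   # where the next sentence starts in the original text
    for sentence in sentences:
        if current and count_tokens(current + sentence) > max_tokens:
            chunks.append((offset, current))
            offset = position
            current = ""
        current += sentence
        position += len(sentence)
    if current:
        chunks.append((offset, current))
    return chunks
```

Each block can then be sent to the model separately, and the offset of the block added to the start and end positions of every entity it returns.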

Thanks,
Jean-Baptiste

Jean-Baptiste changed discussion status to closed
