Model has no tokenizer included in the download files

#2
by Bourhano - opened

Where can we find the tokenizer for this version of CamemBERT, and do all CamemBERT models published by the 'camembert' account use the same tokenizer? I already have a version of tokenizer.json, but I do not recall where I got it from.

Edit:

It seems that the tokenizer differs between 'camembert-base' and 'camembert-large', according to the paper that introduces CamemBERT.
It mentions:
'The second and the third models, camembert-base and camembert-large, respectively, are based on the RoBERTa architecture (Liu et al., 2019), a BERT-based model with some changes (tokenizer, training task, optimization, etc.)'
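One way to check whether a tokenizer.json I already have matches another one is to compare their vocabularies directly. Below is a minimal sketch, assuming the Hugging Face `tokenizers` JSON layout, where the vocabulary sits under `model.vocab`: a list of `[piece, score]` pairs for Unigram (SentencePiece-style) models, or a `{piece: id}` mapping for BPE/WordPiece models. The helper names `load_tokenizer_json`, `vocab_by_id` and `same_vocab` are hypothetical, and only the model vocabulary is compared (not `added_tokens`, the normalizer or the pre-tokenizer).

```python
import json
from typing import Any, Dict, List


def load_tokenizer_json(path: str) -> Dict[str, Any]:
    """Parse a tokenizer.json file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def vocab_by_id(tok: Dict[str, Any]) -> List[str]:
    """Return vocabulary pieces ordered by token id.

    Assumes the `tokenizers` layout: for Unigram models, `model.vocab` is a
    list of [piece, score] pairs whose position is the id; for BPE/WordPiece,
    it is a {piece: id} dict.
    """
    vocab = tok["model"]["vocab"]
    if isinstance(vocab, dict):
        # Sort pieces by their id so the result is comparable position by position.
        return [piece for piece, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
    # Unigram: list order already defines the ids; scores are ignored here.
    return [entry[0] for entry in vocab]


def same_vocab(tok_a: Dict[str, Any], tok_b: Dict[str, Any]) -> bool:
    """True if both tokenizers map the same pieces to the same ids."""
    return vocab_by_id(tok_a) == vocab_by_id(tok_b)
```

For example, `same_vocab(load_tokenizer_json("my/tokenizer.json"), load_tokenizer_json("other/tokenizer.json"))`, with placeholder paths. Comparing pieces in id order, rather than as unordered sets, matters because two tokenizers with the same pieces but different ids would not be interchangeable with a given model's embedding matrix.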
