This model is an "extension" of the [CamemBERT-base model](https://camembert-model.fr/) that supports up to 4096 tokens. The architecture was extended using AllenAI's code for the procedure described in the Longformer paper; their code is Apache-2.0 licensed. Longformer is an "extension" of the RoBERTa model, and CamemBERT uses the same architecture as RoBERTa (they do not share the same tokenizer, and CamemBERT uses Whole Word Masking). Hence, once extended, this "long CamemBERT" can be pretrained or directly fine-tuned with AllenAI's code without substantial modifications.
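The core of the extension procedure is enlarging the learned position-embedding matrix: the pretrained 512-position embeddings are copied repeatedly into a larger matrix covering 4096 positions, so the model starts from sensible positional weights instead of random ones. Below is a minimal NumPy sketch of that copying step, with assumed RoBERTa-base dimensions (512 positions, hidden size 768); it is an illustration of the idea, not AllenAI's actual conversion script, which also handles RoBERTa's position-index offset.

```python
import numpy as np

# Assumed RoBERTa-base dimensions (illustrative, not taken from the script).
old_max_pos, hidden = 512, 768
new_max_pos = 4096

# Stand-in for the pretrained position-embedding matrix.
old_embed = np.random.randn(old_max_pos, hidden).astype(np.float32)

# Copy the pretrained embeddings block by block until 4096 positions are filled.
new_embed = np.empty((new_max_pos, hidden), dtype=np.float32)
for start in range(0, new_max_pos, old_max_pos):
    new_embed[start:start + old_max_pos] = old_embed
```

After this step, the extended matrix replaces the original one in the model, and the attention layers are swapped for Longformer's sliding-window attention.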
I created [a notebook](https://colab.research.google.com/drive/1B7Gj32yqM2NETdnOlqt997RArYJwV24D#scrollTo=5VKUZnBb4vvD) to see the model in action with a HuggingFace fill-mask pipeline. I will soon share a modification of AllenAI's code to pretrain this model using FP16, and add TPU support using TensorFlow or PyTorch-XLA.