CamemBERTa-L8

This model is a pruned version of the pre-trained CamemBERTa checkpoint, obtained by dropping the top 4 encoder layers from the original model and keeping the bottom 8.
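As a rough illustration of what dropping the top layers means in practice, the sketch below truncates the layer list of an encoder and updates its config. It is not the procedure actually used to build this checkpoint: `drop_top_layers` is a hypothetical helper, and a tiny randomly initialised BERT-style encoder stands in for the real model so that nothing is downloaded. That CamemBERTa exposes its layers under the same `encoder.layer` attribute is an assumption.

```python
import copy

import torch
from transformers import BertConfig, BertModel


def drop_top_layers(model, num_layers_to_keep):
    """Return a copy of `model` that keeps only its bottom encoder layers.

    Hypothetical helper: assumes the layers live in `model.encoder.layer`.
    """
    pruned = copy.deepcopy(model)
    # Slicing an nn.ModuleList yields a new ModuleList with the first N layers.
    pruned.encoder.layer = pruned.encoder.layer[:num_layers_to_keep]
    # Keep the config in sync so the pruned model can be saved and reloaded.
    pruned.config.num_hidden_layers = num_layers_to_keep
    return pruned


# Tiny stand-in encoder with 12 layers (random weights, not CamemBERTa).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=2,
    intermediate_size=64,
)
base = BertModel(config).eval()
pruned = drop_top_layers(base, 8)

# The pruned encoder still runs end to end and keeps the same hidden size.
input_ids = torch.tensor([[1, 2, 3, 4]])
with torch.no_grad():
    hidden = pruned(input_ids=input_ids).last_hidden_state

print(len(base.encoder.layer), len(pruned.encoder.layer))
print(tuple(hidden.shape))
```

The embeddings and the remaining bottom layers keep their pretrained weights, which is why a model pruned this way is typically fine-tuned afterwards rather than used as is.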

Usage

You can use the raw model for masked language modeling (MLM), but it is mostly intended to be fine-tuned on a downstream task, especially one that uses the whole sentence to make decisions, such as text classification, extractive question answering, or semantic search. For text generation, look at autoregressive models like BelGPT-2 instead.

You can use this model directly with a pipeline for masked language modeling:

from transformers import pipeline

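# Load a fill-mask pipeline backed by the pruned checkpoint.
unmasker = pipeline('fill-mask', model='antoinelouis/camemberta-L8')
# Returns candidate tokens for the masked position, each with a score.
unmasker("Bonjour, je suis un [MASK] modèle.")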

You can also use this model to extract the features of a given text:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('antoinelouis/camemberta-L8')
model = AutoModel.from_pretrained('antoinelouis/camemberta-L8')

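text = "Remplacez-moi par le texte de votre choix."
# Tokenize the text and return PyTorch tensors.
encoded_input = tokenizer(text, return_tensors='pt')
# output.last_hidden_state holds one feature vector per input token.
output = model(**encoded_input)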
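The extracted features are one vector per token. For sentence-level uses such as semantic search, they are usually pooled into a single vector. The sketch below shows masked mean pooling, which is one common choice and not something this model card prescribes. `mean_pool` is a hypothetical helper, and the tensors are dummy values so the example runs without downloading the model.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, ignoring padded positions (hypothetical helper)."""
    # (batch, seq_len) -> (batch, seq_len, 1), cast to the features' dtype.
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for fully masked rows.
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


# Dummy batch: one sequence of three tokens, the last one being padding.
hidden = torch.tensor([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

sentence_embedding = mean_pool(hidden, mask)
print(sentence_embedding)  # tensor([[2., 2.]])
```

With the snippet above, the same helper would be called as `mean_pool(output.last_hidden_state, encoded_input['attention_mask'])`.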

Variations

CamemBERTa was originally released in a base (112M) version. The following checkpoints prune the base model by dropping its top 2, 4, 6, 8, and 10 pretrained encoder layers, respectively.

| Model | #Params | Size | Pruning |
|---|---|---|---|
| CamemBERTa-base | 111.8M | 447MB | - |
| CamemBERTa-L10 | 97.6M | 386MB | -14% |
| CamemBERTa-L8 | 83.5M | 334MB | -25% |
| CamemBERTa-L6 | 69.3M | 277MB | -38% |
| CamemBERTa-L4 | 55.1M | 220MB | -51% |
| CamemBERTa-L2 | 40.9M | 164MB | -63% |
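The percentages in the Pruning column appear to be the reduction in checkpoint size (MB) relative to the base model, rounded to the nearest percent. The short check below recomputes them from the table values. The variable and function names are illustrative, and the 12-layer depth of the base model is inferred from the variant names rather than stated in the table.

```python
# Checkpoint sizes in MB and parameter counts in millions, copied from the table.
sizes_mb = {
    "CamemBERTa-base": 447,
    "CamemBERTa-L10": 386,
    "CamemBERTa-L8": 334,
    "CamemBERTa-L6": 277,
    "CamemBERTa-L4": 220,
    "CamemBERTa-L2": 164,
}
# Keyed by number of encoder layers; 12 for the base model is an inference.
params_m = {12: 111.8, 10: 97.6, 8: 83.5, 6: 69.3, 4: 55.1, 2: 40.9}


def size_reduction_pct(size_mb, base_mb):
    """Size reduction relative to the base checkpoint, as a rounded percentage."""
    return round(100 * (1 - size_mb / base_mb))


base_mb = sizes_mb["CamemBERTa-base"]
reductions = {name: size_reduction_pct(mb, base_mb) for name, mb in sizes_mb.items()}

# Average number of parameters (in millions) removed per dropped layer.
params_per_layer_m = (params_m[12] - params_m[2]) / (12 - 2)

for name, pct in reductions.items():
    print(f"{name}: -{pct}%")
print(f"about {params_per_layer_m:.2f}M parameters per encoder layer")
```

The parameter count shrinks more slowly than the layer count because the embeddings are shared by all variants and are not affected by dropping layers.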