	---
	license: mit
	language:
	- fr
	library_name: transformers
	inference: false
	pipeline_tag: feature-extraction
	---
	# CamemBERT-L10

	This model is a pruned version of the pre-trained [CamemBERT](https://huggingface.co/camembert-base) checkpoint, obtained by [dropping the top layers](https://doi.org/10.48550/arXiv.2004.03844) of the original model: it keeps the bottom 10 of the 12 encoder layers.

	![](illustration.jpeg)
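As a rough illustration of what dropping the top layers means in practice, here is a minimal sketch. The helper `drop_top_layers` is a hypothetical name, not part of `transformers`, and this is not necessarily the exact script used to build this checkpoint. To run without downloading any weights, it uses a tiny, randomly initialised `CamembertModel` as a stand-in for `camembert-base`.

```python
import torch
from transformers import CamembertConfig, CamembertModel


def drop_top_layers(model: CamembertModel, keep: int) -> CamembertModel:
    """Keep only the `keep` bottom encoder layers of `model` (modified in place)."""
    layers = model.encoder.layer
    if not 0 < keep <= len(layers):
        raise ValueError(f"keep must be in [1, {len(layers)}], got {keep}")
    model.encoder.layer = layers[:keep]     # slicing an nn.ModuleList yields an nn.ModuleList
    model.config.num_hidden_layers = keep   # keep the config in sync for saving and reloading
    return model


# Tiny random stand-in (assumed sizes, NOT the real checkpoint) with 12 layers.
config = CamembertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=2,
    intermediate_size=64,
)
model = drop_top_layers(CamembertModel(config), keep=10)
model.eval()

# The truncated encoder still runs end to end on a dummy batch of token ids.
with torch.no_grad():
    hidden = model(input_ids=torch.tensor([[5, 6, 7, 8]])).last_hidden_state

print(len(model.encoder.layer), tuple(hidden.shape))
```

With the real weights, the same helper would presumably be applied as `drop_top_layers(AutoModel.from_pretrained('camembert-base'), keep=10)` followed by `save_pretrained`. Whether that reproduces the published weights exactly is an assumption.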

	## Usage

	You can use the raw model for masked language modeling (MLM), but it is mostly intended to be fine-tuned on a downstream task, especially one that uses the whole sentence to make decisions, such as text classification, extractive question answering, or semantic search. For text generation, look instead at autoregressive models like [BelGPT-2](https://huggingface.co/antoinelouis/belgpt2).

	You can use this model directly with a pipeline for [masked language modeling](https://huggingface.co/tasks/fill-mask):

	```python
	from transformers import pipeline

	unmasker = pipeline('fill-mask', model='antoinelouis/camembert-L10')
	# CamemBERT's tokenizer uses "<mask>" as its mask token, not BERT's "[MASK]"
	unmasker("Bonjour, je suis un <mask> modèle.")
	```

	You can also use this model to [extract the features](https://huggingface.co/tasks/feature-extraction) of a given text:

	```python
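	import torch
	from transformers import AutoTokenizer, AutoModel

	tokenizer = AutoTokenizer.from_pretrained('antoinelouis/camembert-L10')
	model = AutoModel.from_pretrained('antoinelouis/camembert-L10')

	text = "Remplacez-moi par le texte de votre choix."
	encoded_input = tokenizer(text, return_tensors='pt')
	with torch.no_grad():  # gradients are not needed for feature extraction
	    output = model(**encoded_input)

	# Token-level features of shape (batch_size, sequence_length, hidden_size)
	features = output.last_hidden_state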
	```
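The model returns one vector per token. Sentence-level uses such as semantic search usually need a single vector per text. This card does not prescribe a pooling strategy, so the sketch below shows one common option, attention-masked mean pooling, as an assumption. The helper `mean_pool` is a hypothetical name, and the tensors are synthetic stand-ins for `output.last_hidden_state` and `encoded_input['attention_mask']`.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # avoid division by zero
    return summed / counts


# Synthetic example: one sequence of three tokens, the last one being padding.
hidden_states = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])

sentence_embedding = mean_pool(hidden_states, attention_mask)
print(sentence_embedding)  # the padded token does not contribute to the mean
```

Because the raw checkpoint is not fine-tuned for sentence similarity, pooled vectors taken directly from it are only a starting point for downstream training.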

	## Variations

	CamemBERT was originally released in base (110M parameters) and large (335M parameters) variations. The following checkpoints prune the base variation by dropping its top 2, 4, 6, 8, and 10 pretrained encoder layers, respectively.

	| Model | #Params | Size | Pruning |
	|--------------------------------------------------------------------|:-------:|:-----:|:-------:|
	| [CamemBERT-base](https://huggingface.co/camembert-base) | 110.6M | 445MB | - |
	| | | | |
	| CamemBERT-L10 | 96.4M | 386MB | -13% |
	| [CamemBERT-L8](https://huggingface.co/antoinelouis/camembert-L8) | 82.3M | 329MB | -26% |
	| [CamemBERT-L6](https://huggingface.co/antoinelouis/camembert-L6) | 68.1M | 272MB | -38% |
	| [CamemBERT-L4](https://huggingface.co/antoinelouis/camembert-L4) | 53.9M | 216MB | -51% |
	| [CamemBERT-L2](https://huggingface.co/antoinelouis/camembert-L2) | 39.7M | 159MB | -64% |
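The figures in the table can be sanity-checked with a little arithmetic. The sketch below assumes the standard BERT-base layer dimensions (hidden size 768, feed-forward size 3072), which are not stated in this card. It estimates the parameter count of one encoder layer, then compares "base minus dropped layers" with the reported counts and recomputes the pruning percentages.

```python
HIDDEN = 768          # assumed hidden size of the base architecture
INTERMEDIATE = 3072   # assumed feed-forward (intermediate) size


def encoder_layer_params(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Parameter count of one Transformer encoder layer (weights and biases)."""
    attention = 4 * (hidden * hidden + hidden)  # query, key, value and output projections
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)              # two LayerNorms, each with weight and bias
    return attention + feed_forward + layer_norms


BASE_PARAMS_M = 110.6  # CamemBERT-base, in millions (from the table)
TOTAL_LAYERS = 12
reported_m = {10: 96.4, 8: 82.3, 6: 68.1, 4: 53.9, 2: 39.7}  # kept layers -> #Params (M)

per_layer_m = encoder_layer_params() / 1e6
estimates = {}
pruning = {}
for kept, params_m in reported_m.items():
    dropped = TOTAL_LAYERS - kept
    estimates[kept] = BASE_PARAMS_M - dropped * per_layer_m
    pruning[kept] = round((1 - params_m / BASE_PARAMS_M) * 100)
    print(f"L{kept}: estimated {estimates[kept]:.1f}M, reported {params_m}M, pruning -{pruning[kept]}%")
```

Under these assumptions each layer accounts for roughly 7.1M parameters. The remaining parameters are mostly in the embeddings, which is why even the 2-layer variant keeps about 36% of the original parameter count.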