antoinelouis committed on
Commit
d15358c
1 Parent(s): f428795

Update README.md

Files changed (1)
  1. README.md +23 -2
README.md CHANGED
@@ -13,18 +13,37 @@ This model is a pruned version of the pre-trained [CamemBERT](https://huggingfac

  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained('antoinelouis/camembert-L4')
  model = AutoModel.from_pretrained('antoinelouis/camembert-L4')
  ```

- ## Comparison

  | Model | #Params | Size | Pruning |
  |--------------------------------------------------------------------|:-------:|:-----:|:-------:|
- | [CamemBERT](https://huggingface.co/camembert-base) | 110.6M | 445MB | - |
  | | | | |
  | [CamemBERT-L10](https://huggingface.co/antoinelouis/camembert-L10) | 96.4M | 386MB | -13% |
  | [CamemBERT-L8](https://huggingface.co/antoinelouis/camembert-L8) | 82.3M | 329MB | -26% |
@@ -34,6 +53,8 @@ model = AutoModel.from_pretrained('antoinelouis/camembert-L4')

  ## Citation

  ```bibtex
  @online{louis2023,
  author = 'Antoine Louis',
 

  ## Usage

+ You can use the raw model for masked language modeling (MLM), but it's mostly intended to be fine-tuned on a downstream task, especially one that uses the whole sentence to make decisions such as text classification, extractive question answering, or semantic search. For tasks such as text generation, you should look at autoregressive models like [BelGPT-2](https://huggingface.co/antoinelouis/belgpt2).
+
+ You can use this model directly with a pipeline for [masked language modeling](https://huggingface.co/tasks/fill-mask):
+
+ ```python
+ from transformers import pipeline
+
+ unmasker = pipeline('fill-mask', model='antoinelouis/camembert-L4')
+ unmasker("Bonjour, je suis un <mask> modèle.")
+ ```
+
+ You can also use this model to [extract the features](https://huggingface.co/tasks/feature-extraction) of a given text:
+
  ```python
  from transformers import AutoTokenizer, AutoModel

  tokenizer = AutoTokenizer.from_pretrained('antoinelouis/camembert-L4')
  model = AutoModel.from_pretrained('antoinelouis/camembert-L4')
+
+ text = "Remplacez-moi par le texte de votre choix."
+ encoded_input = tokenizer(text, return_tensors='pt')
+ output = model(**encoded_input)
  ```

+ ## Variations
+
+ CamemBERT was originally released in base (110M) and large (335M) variations. The following checkpoints prune the base variation by dropping the top 2, 4, 6, 8, and 10 pretrained encoding layers, respectively.

  | Model | #Params | Size | Pruning |
  |--------------------------------------------------------------------|:-------:|:-----:|:-------:|
+ | [CamemBERT-base](https://huggingface.co/camembert-base) | 110.6M | 445MB | - |
  | | | | |
  | [CamemBERT-L10](https://huggingface.co/antoinelouis/camembert-L10) | 96.4M | 386MB | -13% |
  | [CamemBERT-L8](https://huggingface.co/antoinelouis/camembert-L8) | 82.3M | 329MB | -26% |
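
As a rough sanity check on the `#Params` and `Pruning` columns in the table above, the sketch below recomputes the parameter count of a BERT-style encoder when its top layers are dropped. The hyperparameters (vocabulary of 32005 tokens, 514 positions, hidden size 768, feed-forward size 3072, 12 layers, plus a pooler) are assumptions taken from the public camembert-base configuration, not from this commit, and the function names are purely illustrative.

```python
# Assumed hyperparameters (public camembert-base config, not stated in this README).
VOCAB_SIZE = 32005
MAX_POSITIONS = 514
TYPE_VOCAB = 1
HIDDEN = 768
INTERMEDIATE = 3072
BASE_LAYERS = 12


def layer_params(hidden: int = HIDDEN, intermediate: int = INTERMEDIATE) -> int:
    """Weights and biases in one Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)      # Q, K, V and output projections
    ffn_in = hidden * intermediate + intermediate   # first feed-forward projection
    ffn_out = intermediate * hidden + hidden        # second feed-forward projection
    layer_norms = 2 * (2 * hidden)                  # two LayerNorms (scale + shift each)
    return attention + ffn_in + ffn_out + layer_norms


def model_params(num_layers: int) -> int:
    """Embeddings + encoder layers + pooler (pooler assumed to be present)."""
    embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + 2 * HIDDEN
    pooler = HIDDEN * HIDDEN + HIDDEN
    return embeddings + num_layers * layer_params() + pooler


def summarize(num_layers: int) -> tuple:
    """Return (parameters in millions, pruning in percent relative to 12 layers)."""
    total = model_params(num_layers)
    base = model_params(BASE_LAYERS)
    return round(total / 1e6, 1), round(100 * (total - base) / base)


if __name__ == "__main__":
    for kept in (10, 8):
        millions, pruning = summarize(kept)
        print(f"L{kept}: {millions}M params, {pruning}%")
    # L10: 96.4M params, -13%
    # L8: 82.3M params, -26%
```

Under these assumptions each encoder layer holds about 7.1M parameters, so every pair of dropped layers removes roughly 13% of the base model, which is consistent with the rows shown.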
 

  ## Citation

+ For attribution in academic contexts, please cite this work as:
+
  ```bibtex
  @online{louis2023,
  author = 'Antoine Louis',