rohitdavas committed
Commit 5384813 • 1 Parent(s): e8091d7
revert back of grammarly fix on readme.md
The last contribution introduced changes to README.md that were not supposed to be there. This commit reverts the README file to its state in the previous commit.
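For readers who want to reproduce this kind of single-file revert locally, here is a minimal sketch. It assumes plain `git` on the command line; the commit page does not say how the author actually performed the revert, and the repository, file contents, and commit messages below are invented for illustration only.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# First commit: the README as it should be.
printf 'number of parameters\n' > README.md
git add README.md
git commit -q -m "original readme"
good=$(git rev-parse HEAD)

# Second commit: an unintended edit to the README.
printf 'numbers of parameters\n' > README.md
git commit -q -am "unintended grammar fix"

# Restore only README.md from the earlier commit, then record that as a new commit.
git checkout "$good" -- README.md
git commit -q -m "revert README.md to previous commit"

# Show the restored file contents.
cat README.md
```

Restoring the file and committing the result keeps the unwanted change in history while bringing the file back to its earlier content, which matches the shape of the diff shown on this page.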
README.md
CHANGED
@@ -10,7 +10,7 @@ datasets:
 ## Table of Contents
 - [Model Details](#model-details)
 - [Uses](#uses)
-- [Risks, Limitations
+- [Risks, Limitations and Biases](#risks-limitations-and-biases)
 - [Training](#training)
 - [Evaluation](#evaluation)
 - [Citation Information](#citation-information)
@@ -20,7 +20,7 @@ datasets:
 ## Model Details
 - **Model Description:**
 CamemBERT is a state-of-the-art language model for French based on the RoBERTa model.
-It is now available on Hugging Face in 6 different versions with varying
+It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains.
 - **Developed by:** Louis Martin\*, Benjamin Muller\*, Pedro Javier Ortiz Suárez\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
 - **Model Type:** Fill-Mask
 - **Language(s):** French
@@ -38,7 +38,7 @@ It is now available on Hugging Face in 6 different versions with varying numbers
 This model can be used for Fill-Mask tasks.
 
 
-## Risks, Limitations
+## Risks, Limitations and Biases
 **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
 
 Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
@@ -72,7 +72,7 @@ OSCAR or Open Super-large Crawled Aggregated coRpus is a multilingual corpus obt
 ## Evaluation
 
 
-The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER)
+The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER) and natural language inference (NLI).
 
 
 
@@ -81,7 +81,7 @@ The model developers evaluated CamemBERT using four different downstream tasks f
 ```bibtex
 @inproceedings{martin2020camembert,
 title={CamemBERT: a Tasty French Language Model},
-author={Martin, Louis and Muller, Benjamin
+author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
 booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
 year={2020}
 }
@@ -126,7 +126,7 @@ tokenized_sentence = tokenizer.tokenize("J'aime le camembert !")
 # 1-hot encode and add special starting and end tokens
 encoded_sentence = tokenizer.encode(tokenized_sentence)
 # [5, 121, 11, 660, 16, 730, 25543, 110, 83, 6]
-# NB: Can be done in one step: tokenize.encode("J'aime le camembert !")
+# NB: Can be done in one step : tokenize.encode("J'aime le camembert !")
 
 # Feed tokens to Camembert as a torch tensor (batch dim 1)
 encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
@@ -155,3 +155,4 @@ all_layer_embeddings[5]
 # [ 0.0557, -0.0588, 0.0547, ..., -0.0726, -0.0867, 0.0699],
 # ...,
 ```
+