rohitdavas committed on
Commit
5384813
1 Parent(s): e8091d7

Revert Grammarly fix on README.md

The last contribution introduced changes to README.md that were not supposed to be there. This commit reverts the README file to its state in the previous commit.

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -10,7 +10,7 @@ datasets:
  ## Table of Contents
  - [Model Details](#model-details)
  - [Uses](#uses)
- - [Risks, Limitations, and Biases](#risks-limitations-and-biases)
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
  - [Training](#training)
  - [Evaluation](#evaluation)
  - [Citation Information](#citation-information)
@@ -20,7 +20,7 @@ datasets:
  ## Model Details
  - **Model Description:**
  CamemBERT is a state-of-the-art language model for French based on the RoBERTa model.
- It is now available on Hugging Face in 6 different versions with varying numbers of parameters, amount of pretraining data, and pretraining data source domains.
+ It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains.
  - **Developed by:** Louis Martin\*, Benjamin Muller\*, Pedro Javier Ortiz Suárez\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  - **Model Type:** Fill-Mask
  - **Language(s):** French
@@ -38,7 +38,7 @@ It is now available on Hugging Face in 6 different versions with varying numbers
  This model can be used for Fill-Mask tasks.


- ## Risks, Limitations, and Biases
+ ## Risks, Limitations and Biases
  **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**

  Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
@@ -72,7 +72,7 @@ OSCAR or Open Super-large Crawled Aggregated coRpus is a multilingual corpus obt
  ## Evaluation


- The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER), and natural language inference (NLI).
+ The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER) and natural language inference (NLI).



@@ -81,7 +81,7 @@ The model developers evaluated CamemBERT using four different downstream tasks f
  ```bibtex
  @inproceedings{martin2020camembert,
  title={CamemBERT: a Tasty French Language Model},
- author={Martin, Louis and Muller, Benjamin, and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
+ author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
  }
@@ -126,7 +126,7 @@ tokenized_sentence = tokenizer.tokenize("J'aime le camembert !")
  # 1-hot encode and add special starting and end tokens
  encoded_sentence = tokenizer.encode(tokenized_sentence)
  # [5, 121, 11, 660, 16, 730, 25543, 110, 83, 6]
- # NB: Can be done in one step: tokenize.encode("J'aime le camembert !")
+ # NB: Can be done in one step : tokenize.encode("J'aime le camembert !")

  # Feed tokens to Camembert as a torch tensor (batch dim 1)
  encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
@@ -155,3 +155,4 @@ all_layer_embeddings[5]
  # [ 0.0557, -0.0588, 0.0547, ..., -0.0726, -0.0867, 0.0699],
  # ...,
  ```
+