maddes8cht committed
Commit ee5b0fe · 1 Parent(s): 96fca6c

"Update README.md"

Files changed (1):
  1. README.md +12 -8
README.md CHANGED

@@ -198,18 +198,20 @@ prompt = """\
 
 ### Training Data
 
+The training dataset will be made available soon.
+
 Claire-7B-0.1 was tuned from Falcon-7b on the following data distribution:
 
 | **Data type**                 | **Words** | **Training Sampling Weight** | **Sources**                                          |
 |-------------------------------|-----------|------------------------------|------------------------------------------------------|
-| Parliamentary Proceedings     | 135M      | 35%                          | assemblee-nationale.fr                               |
-| Theatre                       | 16M       | 18%                          | dracor.org/fre, theatregratuit.com                   |
-| Interviews                    | 6.4M      | 29%                          | TCOF, CFPP, CFPB, ACSYNT, PFC, Valibel (ORFEO), ESLO |
-| Free Conversations            | 2.2M      | 10%                          | CRFP, OFROM, CID, Rhapsodie, ParisStories, PFC, CLAPI, C-ORAL-ROM (ORFEO), LinTO, ESLO |
-| Meetings                      | 1.2M      | 5%                           | SUMM-RE, LinTO, Réunions de travail (ORFEO)          |
-| Debates                       | 402k      | <2%                          | FreD, ESLO                                           |
-| Assistance                    | 159k      | <1%                          | Fleuron (ORFEO), Accueil UBS, OTG, ESLO              |
-| Presentation, Formal Address  | 86k       | <0.5%                        | Valibel (ORFEO), LinTO, ESLO                         |
+| Parliamentary Proceedings     | 135M      | 35%                          | Assemblée Nationale                                  |
+| Theatre                       | 16M       | 18%                          | Théâtre Classique, Théâtre Gratuit                   |
+| Interviews                    | 6.4M      | 29%                          | TCOF, CFPP, CFPB, ACSYNT, PFC, Valibel (ORFEO), ESLO |
+| Free Conversations            | 2.2M      | 10%                          | CRFP (ORFEO), OFROM (ORFEO), CID, Rhapsodie, ParisStories, PFC, CLAPI, C-ORAL-ROM (ORFEO), LinTO, ESLO |
+| Meetings                      | 1.2M      | 5%                           | SUMM-RE, LinTO, Réunions de travail (ORFEO)          |
+| Debates                       | 402k      | <2%                          | FreDSum, ESLO                                        |
+| Assistance                    | 159k      | <1%                          | Fleuron (ORFEO), Accueil UBS, OTG, ESLO              |
+| Presentation, Formal Address  | 86k       | <0.5%                        | Valibel (ORFEO), LinTO, ESLO                         |
 
 Training data was augmented with the following techniques:
 * varying the format used to indicate speech turns (dashes or [XXX:])
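Note that the sampling weights in the table are set independently of raw corpus size: Interviews contribute 6.4M words yet 29% of sampling, while Parliamentary Proceedings contribute 135M words for 35%. As a hedged illustration of what drawing training examples under such per-category weights can look like, here is a minimal sketch; the weights are copied from the table (the "<" bounds are approximated by their upper values), but the names and draw loop are hypothetical, not the unreleased Claire training code:

```python
import random

# (data type, sampling weight) pairs from the README table;
# "<2%", "<1%" and "<0.5%" are approximated with their upper bounds.
sources = [
    ("Parliamentary Proceedings",    0.35),
    ("Interviews",                   0.29),
    ("Theatre",                      0.18),
    ("Free Conversations",           0.10),
    ("Meetings",                     0.05),
    ("Debates",                      0.02),
    ("Assistance",                   0.01),
    ("Presentation, Formal Address", 0.005),
]
names, weights = zip(*sources)

# Each training example is drawn from a data type in proportion to its
# sampling weight, independent of how many words that data type contains.
draws = random.choices(names, weights=weights, k=10)
print(draws)
```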
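The speech-turn augmentation mentioned in the last context line of the hunk above can be pictured as follows. This is a minimal sketch assuming dialogues stored as (speaker, utterance) pairs; the helper name and the sample dialogue are illustrative, since the README does not specify the implementation:

```python
import random

def vary_turn_format(turns):
    """Render the same dialogue with a randomly chosen turn marker:
    either a leading dash or a '[SPEAKER:]' prefix."""
    if random.random() < 0.5:
        # Dash style: speaker names are dropped, each turn starts with "- "
        return "\n".join(f"- {text}" for _, text in turns)
    # Bracket style: each turn is prefixed with "[speaker:]"
    return "\n".join(f"[{speaker}:] {text}" for speaker, text in turns)

# Illustrative dialogue as (speaker, utterance) pairs
dialogue = [
    ("Camille", "Bonjour, vous allez bien ?"),
    ("Dominique", "Oui, très bien, merci."),
]
print(vary_turn_format(dialogue))
```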
 
@@ -223,6 +225,8 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
+The training code will be made available soon.
+
 Claire-7B-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Falcon-7b](https://huggingface.co/tiiuae/falcon-7b) for more details.
 
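The causal language modeling objective referenced in this hunk is standard next-token prediction: the loss at each position is the cross-entropy of the model's prediction against the token that actually follows. Below is a generic PyTorch sketch of the usual shift-and-cross-entropy computation; it is not the Claire training code (which the diff says is still to be released), just the textbook form of the objective:

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    """Next-token prediction loss.
    Shapes: logits [batch, seq_len, vocab], input_ids [batch, seq_len]."""
    shift_logits = logits[:, :-1, :]   # predictions at positions 0..T-2
    shift_labels = input_ids[:, 1:]    # targets are the next tokens 1..T-1
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Toy check with random logits in place of a real model's output
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)
input_ids = torch.randint(0, vocab, (batch, seq_len))
print(causal_lm_loss(logits, input_ids))
```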