Jeronymous committed on
Commit
82c9573
1 Parent(s): 1d10e82

Add links to dataset and code

Files changed (1): README.md (+3 −3)
README.md CHANGED
@@ -142,7 +142,7 @@ prompt = """\
 
 ### Training Data
 
-The training dataset will be made available soon.
+The training dataset is available at [OpenLLM-France/Claire-Dialogue-French-0.1](https://huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1).
 
 Claire-7B-Apache-0.1 was tuned from Falcon-7b on the following data distribution:
 
@@ -151,7 +151,7 @@ Claire-7B-Apache-0.1 was tuned from Falcon-7b on the following data distribution
 | Parliamentary Proceedings | 135M | 54% | Assemblée Nationale |
 | Theatre | 2.7M | 23% | Théâtre Gratuit |
 | Meetings | 1.0M | 16.6% | SUMM-RE, LinTO |
-| Debates | 326k | 5.4% | FreDSum |
+| Debates | 326k | 5.4% | FREDSum |
 | Presentations, Conversations | 58k | 1% | LinTO |
 
 Training data was augmented with the following techniques:
@@ -165,7 +165,7 @@ While the model has been trained and evaluated only on French dialogues, it may
 
 ### Training Procedure
 
-The training code will be made available soon.
+The training code is available at [https://github.com/OpenLLM-France/Lit-Claire](https://github.com/OpenLLM-France/Lit-Claire).
 
 Claire-7B-Apache-0.1 is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).
 See [Falcon-7b](https://huggingface.co/tiiuae/falcon-7b) for more details.
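The causal language modeling objective mentioned in the README ("predict the next token") can be illustrated with a small self-contained sketch. This is not code from the Lit-Claire training repository; the tiny vocabulary, logits, and the `next_token_nll` helper are made up here purely to show what the per-token loss computes:

```python
import math

def next_token_nll(logits, targets):
    """Average negative log-likelihood of each target token given the
    tokens before it -- the causal LM training objective."""
    total = 0.0
    for row, target in zip(logits, targets):
        # Softmax over the vocabulary (shifted by the max for stability),
        # then accumulate -log P(target token).
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        total += -math.log(exps[target] / z)
    return total / len(targets)

# Toy vocabulary of 3 tokens; logits[i] scores the token at position i+1.
logits = [[2.0, 0.5, 0.1],
          [0.2, 3.0, 0.3]]
targets = [0, 1]  # the "next tokens" the model should have predicted
loss = next_token_nll(logits, targets)
```

In a real decoder-only model like Claire-7B-Apache-0.1 this loss is averaged over every position of every training sequence, with attention masked so each position only sees earlier tokens.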