Update README.md
README.md CHANGED
@@ -161,7 +161,7 @@ You can further fine-tune this model to use it for any classification or multiple-choice task.
161     This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI).
162     The untuned model CLS embedding also has strong linear probing performance (90% on MNLI), due to the multitask training.
163
164  -  This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets including bigbench, Anthropic
164  +  This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets, including BIG-bench, Anthropic RLHF, and ANLI, among others, alongside many NLI and classification tasks with SequenceClassification heads, while using only one shared encoder.
165     Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice models used the same classification layers. For classification tasks, models shared weights if their labels matched.
166     The number of examples per task was capped to 64k. The model was trained for 20k steps with a batch size of 384 and a peak learning rate of 2e-5.
167
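The setup described in lines 164-165 (one shared encoder, a per-task CLS embedding that is dropped 10% of the time so the model also works without it, and heads shared between tasks with matching labels) can be pictured with the minimal sketch below. It is only an illustration under assumptions, not the actual tasksource training code: the class name, `n_tasks`, and `head_sizes` are hypothetical placeholders.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SharedEncoderMultiTask(nn.Module):
    """Illustrative only: one shared encoder, a learned CLS embedding per task,
    and one classification head per distinct label set (tasks whose labels
    match share a head)."""

    def __init__(self, encoder_name="microsoft/deberta-v3-base", n_tasks=4, head_sizes=(3, 2)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.task_cls = nn.Embedding(n_tasks, hidden)                    # per-task CLS vectors
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in head_sizes)

    def forward(self, input_ids, attention_mask, task_id, head_id):
        embeds = self.encoder.get_input_embeddings()(input_ids)
        # During training, skip the task embedding 10% of the time so the
        # encoder stays usable when no task embedding is supplied.
        keep_task_token = (not self.training) or torch.rand(()).item() >= 0.1
        if keep_task_token:
            task_vec = self.task_cls(task_id).unsqueeze(1)               # (B, 1, H)
            embeds = torch.cat([task_vec, embeds], dim=1)
            attention_mask = torch.cat(
                [torch.ones_like(attention_mask[:, :1]), attention_mask], dim=1)
        states = self.encoder(inputs_embeds=embeds,
                              attention_mask=attention_mask).last_hidden_state
        return self.heads[head_id](states[:, 0])                         # logits from first token
```

A per-task head list like `self.heads` is presumably what the `model.classifiers` attribute in the snippet below exposes.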
@@ -220,7 +220,7 @@ class MultiTask(transformers.DebertaV2ForMultipleChoice):
220
221     model = MultiTask.from_pretrained("sileod/deberta-v3-base-tasksource-nli", ignore_mismatched_sizes=True)
222     task_index = {k: v for v, k in dict(enumerate(model.config.tasks)).items()}[TASK_NAME]
223  -  model.classifier = model.classifiers[task_index] # model is ready for $TASK_NAME !
223  +  model.classifier = model.classifiers[task_index] # model is ready for $TASK_NAME (RLHF)!
224     ```
225
226
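As a side note, the `task_index` one-liner above just inverts an enumeration of `model.config.tasks`. Assuming that attribute is a plain list of task names (the list below is only a stand-in), it is equivalent to a `.index()` lookup:

```python
tasks = ["glue/mnli", "anli/a1", "hellaswag"]   # stand-in for model.config.tasks
TASK_NAME = "anli/a1"

# The dict-inversion one-liner from the README...
task_index = {k: v for v, k in dict(enumerate(tasks)).items()}[TASK_NAME]
# ...is equivalent to a plain list index lookup.
assert task_index == list(tasks).index(TASK_NAME) == 1
```

Note that `DebertaV2ForMultipleChoice`, which `MultiTask` subclasses, expects inputs shaped `(batch_size, num_choices, seq_len)`, so the usual multiple-choice input layout should still apply once the task-specific head has been attached.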