Update README.md
README.md
CHANGED
@@ -223,14 +223,14 @@ library_name: transformers
 
 # Model Card for DeBERTa-v3-base-tasksource-nli
 
-DeBERTa-v3-base fine-tuned with multi-task learning on
+DeBERTa-v3-base fine-tuned with multi-task learning on 520 tasks of the [tasksource collection](https://github.com/sileod/tasksource/)
 You can further fine-tune this model to use it for any classification or multiple-choice task.
 This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI).
 The untuned model CLS embedding also has strong linear probing performance (90% on MNLI), due to the multitask training.
 
 This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets including bigbench, Anthropic rlhf, anli... alongside many NLI and classification tasks with a SequenceClassification heads while using only one shared encoder.
 Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
-The number of examples per task was capped to 64k. The model was trained for
+The number of examples per task was capped to 64k. The model was trained for 45k steps with a batch size of 384, and a peak learning rate of 2e-5.
 
 The list of tasks is available in tasks.md
 
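
Note on the figures added in this change: the new lines give enough numbers to estimate the training budget, so a rough back-of-the-envelope check may help readers. The sketch below only does arithmetic on the values quoted in the README. It assumes "64k" means 64,000 and "45k" means 45,000, and that every one of the 520 tasks reaches the cap, which is almost certainly an overestimate since many tasks are smaller. The variable names are illustrative and do not come from the tasksource code.

```python
# Figures quoted in the README change (interpretation of "k" is an assumption).
NUM_TASKS = 520          # "520 tasks of the tasksource collection"
CAP_PER_TASK = 64_000    # "capped to 64k" read as 64,000 examples
TRAIN_STEPS = 45_000     # "45k steps"
BATCH_SIZE = 384         # "batch size of 384"

# Total number of examples processed during training.
examples_seen = TRAIN_STEPS * BATCH_SIZE

# Upper bound on the training pool if every task hit the cap
# (hypothetical worst case; smaller tasks shrink the real pool).
pool_upper_bound = NUM_TASKS * CAP_PER_TASK

# Average number of examples drawn per task, assuming uniform sampling
# across tasks (the README does not state the sampling scheme).
avg_examples_per_task = examples_seen / NUM_TASKS

# Fraction of the worst-case pool covered by the training run.
coverage_lower_bound = examples_seen / pool_upper_bound

print(f"examples seen:          {examples_seen:,}")
print(f"pool upper bound:       {pool_upper_bound:,}")
print(f"avg examples per task:  {avg_examples_per_task:,.0f}")
print(f"coverage (lower bound): {coverage_lower_bound:.2f}")
```

Under these assumptions the run processes about 17.3M examples against a pool of at most 33.3M, so the coverage figure is a lower bound: the real pool is smaller whenever a task has fewer than 64k examples.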