Update README.md
# Model Card for DeBERTa-v3-base-tasksource-nli

DeBERTa-v3-base jointly fine-tuned on 444 tasks of the [tasksource collection](https://github.com/sileod/tasksource/).
You can fine-tune this model to use it for any classification or multiple-choice task.
This checkpoint has strong zero-shot validation performance on many tasks (e.g. 70% on WNLI).
The untuned model's CLS embedding also has strong linear-probing performance (90% on MNLI), due to the multitask training.
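As a minimal sketch of the zero-shot usage described above, the checkpoint can be driven through the standard `transformers` zero-shot-classification pipeline (the Hub id below is assumed to match this card; candidate labels are illustrative):

```python
# Zero-shot classification via the model's NLI head, using the
# standard transformers pipeline API.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="sileod/deberta-v3-base-tasksource-nli",
)

result = classifier(
    "The team released a new open-source NLP library today.",
    candidate_labels=["technology", "sports", "cooking"],
)
# result["labels"] is sorted by score; result["scores"] holds the probabilities.
print(result["labels"][0])
```

The pipeline converts each candidate label into an NLI hypothesis and ranks labels by the entailment probability, which is why an NLI-tuned checkpoint like this one works off the shelf.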

This is the shared model with the MNLI classifier on top. Its encoder was trained on many datasets, including bigbench and Anthropic/hh-rlhf, alongside many NLI and classification tasks, with a SequenceClassification head per task and a single shared encoder.
The number of examples per task was capped to 64k. The model was trained for 20k
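The linear-probing setup mentioned above (training a linear classifier on frozen CLS embeddings) can be sketched as follows; the Hub id and the use of `AutoModel` to load just the shared encoder are assumptions based on the standard `transformers` API:

```python
# Extracting frozen CLS embeddings from the shared encoder, e.g. as
# features for a linear probe (logistic regression, etc.).
import torch
from transformers import AutoModel, AutoTokenizer

name = "sileod/deberta-v3-base-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)  # drops the classification head

batch = tokenizer(
    ["A man is playing guitar.", "Someone is making music."],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
cls_embeddings = hidden[:, 0]  # first token = CLS, one vector per input
print(cls_embeddings.shape)  # torch.Size([2, 768])
```

These frozen vectors can then be fed to any off-the-shelf linear classifier to reproduce the probing setup.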

The list of tasks is available in tasks.md.

tasksource training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
### Software

https://github.com/sileod/tasknet/

Training took 7 days on an RTX6000 24GB GPU.

## Model Recycling

An earlier (weaker) version of this model is ranked 1st among all models with the microsoft/deberta-v3-base architecture as of 10/01/2023.