Update README.md
README.md
CHANGED
@@ -351,14 +351,15 @@ This model ranked 1st among all models with the microsoft/deberta-v3-base archit
 https://ibm.github.io/model-recycling/
 
 ### Software and training details
+
+The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on Nvidia A30 24GB gpu.
+This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
+
+
 https://github.com/sileod/tasksource/ \
 https://github.com/sileod/tasknet/ \
 Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
 
-
-This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
-The number of examples per task was capped to 64k. The model was trained for 200k steps with a batch size of 384, and a peak learning rate of 2e-5. Training took 12 days on Nvidia A30 24GB gpu.
-
 # Citation
 
 More details on this [article:](https://arxiv.org/abs/2301.05948)
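
The training details in this README change (task-specific CLS embedding dropped 10% of the time, classification layers shared across tasks, 200k steps at batch size 384) can be illustrated with a small Python sketch. This is not the tasknet implementation. The function names, the `default_embedding` fallback used when the task embedding is dropped, and the exact-order label match are all assumptions made for illustration.

```python
import random

# From the README text: the task-specific CLS embedding is dropped 10% of the time.
TASK_EMBEDDING_DROP_RATE = 0.10


def pick_cls_embedding(task_embeddings, default_embedding, task, rng):
    """Return the task-specific CLS embedding, or a generic one 10% of the time.

    Assumption: when the task embedding is dropped, a shared default embedding
    is used instead, so the model also learns to work without task information.
    """
    if rng.random() < TASK_EMBEDDING_DROP_RATE:
        return default_embedding
    return task_embeddings[task]


def head_key(task_type, labels):
    """Hypothetical key deciding which tasks share a classification head.

    All multiple-choice tasks map to one key; classification tasks share a
    head only when their label lists match (assumed here: same labels, same order).
    """
    if task_type == "multiple_choice":
        return ("multiple_choice",)
    return ("classification", tuple(labels))


def examples_seen(steps, batch_size):
    """Total number of training examples processed, counting repeats."""
    return steps * batch_size


if __name__ == "__main__":
    rng = random.Random(0)
    embeddings = {"mnli": "E_mnli"}  # placeholder values instead of real tensors
    picks = [pick_cls_embedding(embeddings, "E_default", "mnli", rng) for _ in range(10_000)]
    print("observed drop rate:", picks.count("E_default") / len(picks))
    print("examples seen:", examples_seen(200_000, 384))
```

With the figures stated in the README, `examples_seen(200_000, 384)` gives 76,800,000 examples processed over the whole run.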