Commit cefc640 by migtissera (1 parent: 5f94720)
Update README.md

README.md CHANGED
@@ -19,7 +19,7 @@ When evaluated on a subset of AGIEval (Nous), this model compares very well with
 
 # Training Process
 
-The Tess-v2.5 model was initialized with the base weights of Qwen2-72B.
+The Tess-v2.5 model was initialized with the base weights of Qwen2-72B. It was then fine-tuned on the Tess-v2.5 dataset, using Axolotl as the training framework. Most Tess models follow a common fine-tuning methodology: low learning rates, a low number of epochs, and very high-quality, diverse data. This model was fine-tuned on a 4xA100 VM on Microsoft Azure for 4 days. The model has not been aligned with RLHF or DPO.
 
 The author believes that the model's capabilities come primarily from the pre-training process. This is the foundation for every fine-tune of Tess models.
 
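As a rough sense of scale, the compute quoted in the added paragraph (a 4xA100 VM for 4 days) can be converted into GPU-hours. This is a minimal back-of-the-envelope sketch, assuming all four GPUs were busy for the full wall-clock period, which the README does not state. `gpu_hours` is a hypothetical helper name used only for illustration.

```python
# Back-of-the-envelope compute estimate for the fine-tuning run described
# in the README: a 4xA100 VM for 4 days.
# Assumption (not stated in the README): all GPUs were in use for the
# entire wall-clock period, so this is an upper bound on GPU time.

NUM_GPUS = 4        # "4xA100 VM"
DAYS = 4            # "for 4 days"
HOURS_PER_DAY = 24


def gpu_hours(num_gpus: int, days: float, hours_per_day: int = HOURS_PER_DAY) -> float:
    """Return total GPU-hours for `days` of wall-clock time on `num_gpus` GPUs."""
    return float(num_gpus * days * hours_per_day)


if __name__ == "__main__":
    total = gpu_hours(NUM_GPUS, DAYS)
    print(f"{NUM_GPUS} GPUs x {DAYS} days = {total:g} A100 GPU-hours")
```

Under that assumption the run comes to at most 4 × 4 × 24 = 384 A100 GPU-hours; actual utilization could be lower.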