migtissera committed • Commit 5f94720 • Parent(s): b86bf2c
Update README.md

README.md CHANGED
@@ -13,9 +13,17 @@ We've created Tess-v2.5, the latest state-of-the-art model in the Tess series of

Tess-v2.5 (Qwen2-72B) was fine-tuned over the newly released Qwen2-72B base, using the Tess-v2.5 dataset, which contains 300K samples spanning multiple topics, including business and management, marketing, history, social sciences, arts, STEM subjects, and computer programming. This dataset was synthetically generated with the [Sensei](https://github.com/migtissera/Sensei) framework, using multiple frontier models such as GPT-4-Turbo, Claude-Opus and Mistral-Large.

-The compute for this model was generously sponsored by [KindoAI](kindo.ai).
+The compute for this model was generously sponsored by [KindoAI](https://kindo.ai).

-
+When evaluated on a subset of AGIEval (Nous), this model also compares very well with the godfather GPT-4-0314 model.
+
+# Training Process
+
+The Tess-v2.5 model was initialized with the base weights of Qwen2-72B and then fine-tuned on the Tess-v2.5 dataset, using Axolotl as the training framework. Most Tess models follow a common fine-tuning methodology: low learning rates, a small number of epochs, and very high-quality, diverse data. This model was fine-tuned on a 4xA100 VM on Microsoft Azure for 4 days. The model has not been aligned with RLHF or DPO.
+
+The author believes that a model's capabilities come primarily from the pre-training process. This is the foundation for every Tess fine-tune.
+
+# Evaluation Results

## MMLU (Massive Multitask Language Understanding)
![MMLU_open](https://huggingface.co/migtissera/Tess-v2.5-Qwen2-72B/resolve/main/Figures/mmlu_open_models.png)
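The dataset paragraph in the diff above mentions synthetic generation with the Sensei framework and frontier models, but does not show what such a generation step looks like. As a loose illustration only (this is not Sensei's actual API, and the model name, prompts, and output format are assumptions), one instruction/response training sample for a given topic could be drafted roughly like this:

```python
import json

from openai import OpenAI  # any frontier-model client would do; OpenAI shown for illustration

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def generate_sample(topic: str) -> dict:
    """Ask a frontier model to draft one instruction/response pair about `topic`."""
    # First call: have the model write a self-contained question on the topic.
    question = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "You write challenging, self-contained questions for LLM training data."},
            {"role": "user", "content": f"Write one detailed question about: {topic}"},
        ],
    ).choices[0].message.content

    # Second call: have the model answer its own question to form the response.
    answer = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "You answer questions thoroughly and accurately."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    return {"instruction": question, "response": answer}


if __name__ == "__main__":
    # Append one hypothetical sample to a JSONL file.
    with open("synthetic_samples.jsonl", "a") as f:
        f.write(json.dumps(generate_sample("computer programming")) + "\n")
```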
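The Training Process section states only the high-level recipe (Axolotl, low learning rate, few epochs, no RLHF or DPO) and does not include the actual Axolotl configuration. The sketch below is a hypothetical, minimal approximation of that recipe using the Hugging Face `transformers` Trainer instead of Axolotl; the hyperparameters, dataset file, and sequence length are illustrative guesses, and a real 72B-parameter run would additionally need DeepSpeed or FSDP sharding across GPUs.

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "Qwen/Qwen2-72B"  # base weights; in practice this requires multi-GPU sharding
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Hypothetical JSONL file where each record has a pre-formatted "text" field.
dataset = load_dataset("json", data_files="tess-v2.5.jsonl", split="train")


def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=4096)


tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="tess-v2.5-qwen2-72b",
    learning_rate=1e-5,              # low learning rate, per the stated methodology
    num_train_epochs=1,              # small number of epochs
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Axolotl expresses the same ideas declaratively in a YAML config; the actual hyperparameters used for Tess-v2.5 are not stated in this card.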