dtype: bfloat16
---

# Edit/Disclaimer:
Currently the #1 ranked 7B LLM on the LLM Leaderboards, woah!
I did not expect that result at all, and I am in no way a professional when it comes to LLMs or computer science in general;
just a guy who likes to nerd out and tinker around.

For those wondering how I achieved this: I simply tried to apply the techniques outlined in this amazing article myself: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
Therefore, all credit basically goes to its author.
He offers the exact Colab notebook I used to train this model for free, as well as a really nice GitHub page I hope he doesn't mind me sharing: https://github.com/mlabonne/llm-course/
So a huge thank you to him for sharing his knowledge and teaching me a thing or two in the process!

# GGUF
I attempted to quantize the model myself, which, again, I pretty much have no clue about, but the quants seem to run fine for me when I test them:
https://huggingface.co/CultriX/MistralTrix-v1-GGUF

I'll say it one more time, though:
"I am a complete beginner at all of this, so if these quants do end up sucking, don't be surprised."

You have been warned :)
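For intuition about what quantization does, here is a rough sketch of symmetric 4-bit block quantization, the idea behind the Q4 family of GGUF formats. This is only an illustration of the concept, not the exact on-disk GGUF layout (which also packs two 4-bit values per byte and stores fp16 scales):

```python
import numpy as np

def quantize_block_q4(x, block=32):
    """Symmetric 4-bit block quantization sketch.

    Each block of 32 weights shares one fp32 scale; every weight is
    rounded to an integer in [-8, 7]. Not the real GGUF byte layout.
    """
    x = np.asarray(x, dtype=np.float32).reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block_q4(q, scale):
    # Reconstruct approximate fp32 weights from 4-bit codes.
    return q.astype(np.float32) * scale

# Quantize 64 random weights (two blocks) and measure the error.
rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, s = quantize_block_q4(w)
w_hat = dequantize_block_q4(q, s).reshape(-1)
max_err = float(np.abs(w - w_hat).max())  # bounded by half a scale step
```

Because each block is rounded to the nearest of 16 levels, the worst-case error per weight is half a quantization step, which is why 4-bit models stay usable at a quarter of the fp16 size.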

# Description:
(trained on a single Colab GPU in just a few hours)

MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
It surpasses the original model on several benchmarks (see results).
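Since the fine-tuning method is DPO, a minimal sketch of the per-pair DPO loss may help illustrate what the training optimizes. This is a pure-Python sketch of the published formula, not the actual training code from the article, and the log-probability values below are made up for illustration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model; beta
    controls how far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # loss = -log(sigmoid(logits)), written in a numerically stable form
    return math.log1p(math.exp(-logits))

# The loss is small when the policy favors the chosen response more
# strongly than the reference does, and large when it is reversed:
low = dpo_loss(-5.0, -12.0, -8.0, -10.0)   # margin in the right direction
high = dpo_loss(-12.0, -5.0, -10.0, -8.0)  # margin reversed
```

Minimizing this pushes up the policy's relative likelihood of the preferred (chosen) answer over the rejected one, which is why a good preference dataset, like the Intel one mentioned above, matters so much.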