CultriX committed
Commit 86dcaaa
1 Parent(s): e090456

After #1 on LLM for 7B

Files changed (1)
README.md +22 -1
README.md CHANGED
@@ -6,7 +6,28 @@ pipeline_tag: text-generation
  dtype: bfloat16
  ---
 
- # DESCRIPTION
+ # Edit/Disclaimer:
+ Currently the #1 ranked 7B LLM on the LLM Leaderboards, woah!
+ I did not expect that result at all, and I am in no way a professional when it comes to LLMs or computer science in general,
+ just a guy who likes to nerd out and tinker around.
+
+ For those wondering how I achieved this: I simply attempted to apply the techniques outlined in this amazing article myself: https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac
+ Therefore, all credit basically goes to the author of that article.
+ He offers the exact Colab notebook I used to train this model for free, as well as a really nice GitHub page I hope he doesn't mind me sharing: https://github.com/mlabonne/llm-course/
+ So a huge thank you to him for sharing his knowledge and teaching me a thing or two in the process!
+
+ # GGUF
+ I attempted to quantize the model myself, which again I have pretty much no clue about, but the quants seem to run fine for me when I test them:
+ https://huggingface.co/CultriX/MistralTrix-v1-GGUF
+
+ I'll say it one more time though:
+ "I am a complete beginner to all of this, so if these do end up sucking, don't be surprised."
+
+ You have been warned :)
+
+ # Description:
+ (trained on a single Colab GPU in less than a few hours)
+
  MistralTrix-v1 is a zyh3826/GML-Mistral-merged-v1 model that has been further fine-tuned with Direct Preference Optimization (DPO) using Intel's dataset for neural-chat-7b-v3-1.
  It surpasses the original model on several benchmarks (see results).
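
For readers who want a rough idea of what the DPO step in the new Description looks like in code, here is a minimal sketch using Hugging Face's `trl` `DPOTrainer` on Intel's preference dataset (`Intel/orca_dpo_pairs`). The hyperparameters, LoRA settings, and column mapping below are illustrative assumptions, not the exact configuration used for MistralTrix-v1; the linked article and Colab notebook contain the real recipe.

```python
# Minimal DPO fine-tuning sketch (illustrative settings, not the exact MistralTrix-v1 run).
# Written against the trl 0.7-era API used in the linked article; newer trl releases
# move beta/max_length into a DPOConfig object.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

model_name = "zyh3826/GML-Mistral-merged-v1"  # base model named in the README

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Intel's preference dataset; map it to the prompt/chosen/rejected columns DPO expects.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

def to_dpo_format(example):
    return {
        "prompt": example["question"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

# LoRA keeps the run small enough for a single Colab GPU.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

training_args = TrainingArguments(
    output_dir="mistraltrix-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    max_steps=200,
    bf16=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,            # with a PEFT adapter, trl uses the frozen base weights as the reference
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,                  # strength of the KL penalty toward the reference model
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
trainer.save_model("mistraltrix-dpo")
```

Passing `ref_model=None` together with a LoRA config relies on trl treating the frozen base weights under the adapter as the implicit reference model, which is what keeps the memory footprint small enough for a single Colab GPU.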
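The GGUF section links self-made quants that "seem to run fine" when tested. One quick way to sanity-check such a file locally is llama-cpp-python; the quant filename below is a guess, so substitute whichever file actually exists in the linked repo.

```python
# Quick sanity check of a GGUF quant with llama-cpp-python
# (pip install llama-cpp-python huggingface_hub).
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="CultriX/MistralTrix-v1-GGUF",
    filename="mistraltrix-v1.Q4_K_M.gguf",  # hypothetical filename; check the repo file listing
)

llm = Llama(model_path=gguf_path, n_ctx=2048)

output = llm(
    "Explain Direct Preference Optimization in one sentence.",
    max_tokens=128,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```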