dvilasuero committed
Commit a954f23
1 Parent(s): 86c89d1

Update README.md

Files changed (1)
  1. README.md +42 -23
README.md CHANGED
@@ -1,41 +1,60 @@
  ---
- license: apache-2.0
  base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
  tags:
- - generated_from_trainer
  model-index:
- - name: notux-8x7b-v1-alt
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # notux-8x7b-v1-alt

- This model is a fine-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4217
- - Rewards/chosen: -0.1933
- - Rewards/rejected: -2.2968
- - Rewards/accuracies: 0.8135
- - Rewards/margins: 2.1035
- - Logps/rejected: -409.3196
- - Logps/chosen: -396.5202
- - Logits/rejected: -1.2925
- - Logits/chosen: -1.2132

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure
 
  ---
+ datasets:
+ - argilla/ultrafeedback-binarized-preferences-cleaned
+ language:
+ - en
  base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
+ library_name: transformers
+ pipeline_tag: text-generation
  tags:
+ - dpo
+ - rlaif
+ - preference
+ - ultrafeedback
+ license: apache-2.0
  model-index:
+ - name: notux-8x7b-v1
  results: []
  ---

+ <div align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/dj-spsk9eXMMXVGxK6jRz.png" alt="A banner representing Notus, the wind god of the south, in a mythical and artistic style. The banner features a strong, swirling breeze, embodying the warm, wet character of the southern wind. Gracefully flowing across the scene are several paper planes, caught in the gentle yet powerful gusts of Notus. The background is a blend of warm colors, symbolizing the heat of the south, with hints of blue and green to represent the moisture carried by this wind. The overall atmosphere is one of dynamic movement and warmth."/>
+ </div>
+
+ # Model Card for Notux 8x7B-v1
+
+ This model is a preference-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) dataset using DPO (Direct Preference Optimization).
+
+ This model is part of the Notus family of models and experiments, where the Argilla team investigates data-first and preference-tuning methods such as dDPO (distilled DPO). It is the result of our first experiment in tuning a MoE model that has already been fine-tuned with DPO (i.e., Mixtral-8x7B-Instruct-v0.1).
+
+ As of 26th Dec, it outperforms its base model `Mixtral-8x7B-Instruct-v0.1` and is the top-ranked MoE (Mixture of Experts) model on the Hugging Face Open LLM Leaderboard.
+
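+ For context, DPO tunes the policy directly on preference pairs (a chosen vs. a rejected response) by maximizing the margin between their implicit rewards. The standard DPO objective is shown below for reference only (not a restatement of this run's exact configuration), where $\pi_\theta$ is the model being tuned, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ is a temperature-like hyperparameter:
+
+ $$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$
+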
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Argilla (building on previous efforts by HuggingFace H4 and MistralAI)
+ - **Shared by:** Argilla
+ - **Model type:** Pretrained generative Sparse Mixture of Experts
+ - **Language(s) (NLP):** Mainly English
+ - **License:** MIT
+ - **Finetuned from model:** [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/argilla-io/notus
+ - **Paper:** N/A
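+
+ As a quick usage reference, the sketch below loads the model for text generation with the `transformers` library declared in the metadata above (`library_name: transformers`, `pipeline_tag: text-generation`). It assumes the Hub repo id `argilla/notux-8x7b-v1`, and the generation settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative defaults rather than tuned recommendations.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ # Load the model as a text-generation pipeline.
+ # device_map="auto" shards the MoE across available GPUs; bfloat16 keeps memory usage manageable.
+ pipe = pipeline(
+     "text-generation",
+     model="argilla/notux-8x7b-v1",  # assumed repo id for this model
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Mixtral-Instruct-style chat messages; the tokenizer's chat template handles the formatting.
+ messages = [{"role": "user", "content": "Explain Direct Preference Optimization in two sentences."}]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```
+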
+ ## Training Details
+
+ ### Training Hardware
+
+ We used a VM with 8 x H100 40GB GPUs hosted on runpod.io, training for 1 epoch (~10 hours).
+
+ ### Training Data
+
+ We used a new iteration of the Argilla UltraFeedback preferences dataset named [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned).
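+
+ The preference data can be pulled straight from the Hub with the `datasets` library. In this minimal sketch, the `train` split name and the column inspection are assumptions about the dataset layout rather than details stated in this card.
+
+ ```python
+ from datasets import load_dataset
+
+ # Each row pairs a prompt with a chosen and a rejected response,
+ # the pair format expected for DPO-style preference tuning.
+ ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
+
+ print(ds)            # dataset size and features
+ print(ds[0].keys())  # column names of a single preference pair
+ ```
+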
  ## Training procedure