dvilasuero committed
Commit a954f23
1 Parent(s): 86c89d1

Update README.md

Files changed (1)
  1. README.md +42 -23
README.md CHANGED
@@ -1,41 +1,60 @@
  ---
- license: apache-2.0
  base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
  tags:
- - generated_from_trainer
  model-index:
- - name: notux-8x7b-v1-alt
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # notux-8x7b-v1-alt

- This model is a fine-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.4217
- - Rewards/chosen: -0.1933
- - Rewards/rejected: -2.2968
- - Rewards/accuracies: 0.8135
- - Rewards/margins: 2.1035
- - Logps/rejected: -409.3196
- - Logps/chosen: -396.5202
- - Logits/rejected: -1.2925
- - Logits/chosen: -1.2132

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure
 
  ---
+ datasets:
+ - argilla/ultrafeedback-binarized-preferences-cleaned
+ language:
+ - en
  base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
+ library_name: transformers
+ pipeline_tag: text-generation
  tags:
+ - dpo
+ - rlaif
+ - preference
+ - ultrafeedback
+ license: apache-2.0
  model-index:
+ - name: notux-8x7b-v1
  results: []
  ---

+ <div align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/dj-spsk9eXMMXVGxK6jRz.png" alt="A banner representing Notus, the wind god of the south, in a mythical and artistic style. The banner features a strong, swirling breeze, embodying the warm, wet character of the southern wind. Gracefully flowing across the scene are several paper planes, caught in the gentle yet powerful gusts of Notus. The background is a blend of warm colors, symbolizing the heat of the south, with hints of blue and green to represent the moisture carried by this wind. The overall atmosphere is one of dynamic movement and warmth."/>
+ </div>
+
+ # Model Card for Notux 8x7B-v1
+
+ This model is a preference-tuned version of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) on the [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned) dataset using DPO (Direct Preference Optimization).
+
+ This model is part of the Notus family of models and experiments, where the Argilla team investigates data-first and preference-tuning methods such as dDPO (distilled DPO). It is the result of our first experiment in tuning a MoE model that has already been fine-tuned with DPO (i.e., Mixtral-8x7B-Instruct-v0.1).
+
+ As of 26th Dec, it outperforms its base model `Mixtral-8x7B-Instruct-v0.1` and is the top-ranked MoE (Mixture of Experts) model on the Hugging Face Open LLM Leaderboard.
+
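+ For context, DPO tunes the policy directly on preference pairs (a chosen vs. a rejected response) by maximizing the margin between their implicit rewards. The standard DPO objective is shown below for reference only (not a restatement of this run's exact configuration), where $\pi_\theta$ is the model being tuned, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ is a temperature-like hyperparameter:
+
+ $$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$
+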
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Argilla (building on previous efforts by HuggingFace H4 and MistralAI)
+ - **Shared by:** Argilla
+ - **Model type:** Pretrained generative Sparse Mixture of Experts
+ - **Language(s) (NLP):** Mainly English
+ - **License:** MIT
+ - **Finetuned from model:** [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/argilla-io/notus
+ - **Paper:** N/A
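+
+ As a quick usage reference, the sketch below loads the model for text generation with the `transformers` library declared in the metadata above (`library_name: transformers`, `pipeline_tag: text-generation`). It assumes the Hub repo id `argilla/notux-8x7b-v1`, and the generation settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative defaults rather than tuned recommendations.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ # Load the model as a text-generation pipeline.
+ # device_map="auto" shards the MoE across available GPUs; bfloat16 keeps memory usage manageable.
+ pipe = pipeline(
+     "text-generation",
+     model="argilla/notux-8x7b-v1",  # assumed repo id for this model
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Mixtral-Instruct-style chat messages; the tokenizer's chat template handles the formatting.
+ messages = [{"role": "user", "content": "Explain Direct Preference Optimization in two sentences."}]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```
+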
+ ## Training Details
+
+ ### Training Hardware
+
+ We used a VM with 8 x H100 40GB GPUs hosted on runpod.io, training for 1 epoch (~10 hours).
+
+ ### Training Data
+
+ We used a new iteration of the Argilla UltraFeedback preferences dataset named [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned).
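+
+ The preference data can be pulled straight from the Hub with the `datasets` library. In this minimal sketch, the `train` split name and the column inspection are assumptions about the dataset layout rather than details stated in this card.
+
+ ```python
+ from datasets import load_dataset
+
+ # Each row pairs a prompt with a chosen and a rejected response,
+ # the pair format expected for DPO-style preference tuning.
+ ds = load_dataset("argilla/ultrafeedback-binarized-preferences-cleaned", split="train")
+
+ print(ds)            # dataset size and features
+ print(ds[0].keys())  # column names of a single preference pair
+ ```
+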
  ## Training procedure