alvarobartt posted an update Dec 27, 2023

Also, thanks to @osanseviero for granting me access to the Posts private beta! 🦙

Very cool! So this is DPO^2 🔥

Yes, indeed! For anyone wondering, here's a bit more detail: Mixtral-8x7B-Instruct-v0.1 was fine-tuned using SFT + DPO (read more about it at https://mistral.ai/news/mixtral-of-experts/), and we ran another round of DPO fine-tuning on top of it using data from UltraFeedback, in particular argilla/ultrafeedback-binarized-preferences-cleaned, with a different binarization approach and some additional data cleaning, essentially following the same approach @HuggingFaceH4 used for zephyr-7b-beta (a rough sketch of the setup is below).

So DPO^2 is fair! 😄
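
For anyone curious, a minimal sketch of what that second DPO round could look like with TRL's DPOTrainer is below. This is not the actual training script: the hyperparameters and the dataset-flattening step are assumptions.

```python
# A rough sketch, not the actual training script: hyperparameters, the
# dataset-flattening step, and the exact TRL version are all assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

def to_pairs(example):
    # DPOTrainer expects plain "prompt"/"chosen"/"rejected" strings, while the
    # dataset stores chat-format message lists, so keep only the assistant
    # turns (an assumption about the schema -- adjust to the real layout)
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

dataset = dataset.map(to_pairs)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL clones the model as the frozen DPO reference
    beta=0.1,        # strength of the implicit KL penalty (illustrative value)
    args=TrainingArguments(
        output_dir="mixtral-dpo",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-7,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```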

Spaces demo?

We may need to discuss that internally, but it could be something we consider for early 2024 🤗

Very cool, thanks for sharing.

How does it perform on MT-Bench / Alpaca-eval?

MT-Bench is on par with mistralai/Mixtral-8x7B-Instruct-v0.1, which means ~8.3; we haven't run AlpacaEval yet.

This is great! Unfortunately it doesn't work on Inference Endpoints, so I'll have to find some other way to host the model.

Hey, sorry to hear that! Is it due to the resources required to run it? Maybe you can ask Hugging Face for a quota increase so that you can allocate the required resources 🤗 Let me know if there's anything I can help you with!
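
In the meantime, one possible alternative is self-hosting with 4-bit quantization via transformers + bitsandbytes. A minimal sketch, assuming enough GPU memory for the 4-bit weights; the model id and prompt are illustrative, not an official deployment recipe:

```python
# A minimal self-hosting sketch with 4-bit quantization; the model id and
# prompt are illustrative, and VRAM requirements are an assumption
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # swap in the DPO fine-tune
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard the layers across whatever GPUs are available
)

# Mixtral-Instruct uses the [INST] ... [/INST] prompt format
prompt = "[INST] What is Direct Preference Optimization? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```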
