Alvaro Bartolome

alvarobartt

AI & ML interests

ml @argilla / open-source and machine learning

alvarobartt's activity

posted an update 17 days ago
🦫 We have just released argilla/Capybara-Preferences in collaboration with KAIST AI (@JW17, @nlee-208) and Hugging Face (@lewtun)

A new synthetic preference dataset built using distilabel on top of the awesome LDJnr/Capybara from @LDJnr

The current dataset combines the already generated alternative completions from argilla/distilabel-capybara-dpo-7k-binarized, while also adding the remaining ones using the same approach!

Here are some key features on how we built it:

- 🧹 Duplicate removal, keeping each conversation except for the last assistant response, plus some slight pre-processing

- 🤖 Generation of alternative completions for the existing conversations (last turn only) with: mlabonne/NeuralBeagle14-7B, argilla/notus-7b-v1, and teknium/OpenHermes-2.5-Mistral-7B

- 👨🏻‍🏫 Running UltraFeedback via GPT-4 to generate the critique, i.e. ratings and rationales, for the last assistant responses

- 🎉 Finally, we selected the chosen and rejected responses based on their UltraFeedback score, and applied some slight post-processing!
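
The first and last steps above can be sketched in plain Python. This is only an illustration under assumptions, not the actual distilabel pipeline: the function names `deduplicate` and `binarize`, the record fields (`role`, `content`, `text`, `rating`), the "highest rating vs. lowest rating" selection rule, and the handling of ties are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def deduplicate(conversations: List[List[Message]]) -> List[List[Message]]:
    """Drop the last assistant response and keep one copy of each remaining conversation."""
    seen = set()
    unique = []
    for conversation in conversations:
        # Everything except the final (assistant) turn identifies the conversation.
        context = conversation[:-1]
        key = tuple((m["role"], m["content"]) for m in context)
        if key not in seen:
            seen.add(key)
            unique.append(context)
    return unique


def binarize(completions: List[dict]) -> Optional[Dict[str, str]]:
    """Pick a chosen/rejected pair from rated completions (assumed rule: best vs. worst)."""
    rated = [c for c in completions if c.get("rating") is not None]
    if len(rated) < 2:
        return None  # not enough rated completions to form a pair
    ranked = sorted(rated, key=lambda c: c["rating"], reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    if chosen["rating"] == rejected["rating"]:
        return None  # assumption: ties carry no preference signal, so skip them
    return {"chosen": chosen["text"], "rejected": rejected["text"]}


if __name__ == "__main__":
    completions = [
        {"text": "answer A", "rating": 3},
        {"text": "answer B", "rating": 5},
        {"text": "answer C", "rating": 1},
    ]
    print(binarize(completions))
```

The real dataset was built with distilabel and GPT-4 ratings; the sketch only shows the shape of the data flow.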

Sounds simple, right? Start building your own synthetic datasets with https://github.com/argilla-io/distilabel today!
replied to their post 4 months ago
Fair. Maybe you can go ahead and merge the PEFT adapter into the model, as otherwise you may keep getting those OOM errors. Anyway, feel free to report that in the community tab.

Also, w.r.t. prompting: the original model was fine-tuned for instruction following, but concatenating the messages works for chat applications. So yes, the assistant's response goes after the system and user prompt, and the upcoming turns are then appended to that string using the same [INST] notation.
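
As a rough illustration of that concatenation, here is a minimal sketch. The helper `build_prompt` is hypothetical, and the exact layout (system prompt prepended to the first user message, single spaces between turns, no BOS/EOS tokens) is an assumption; the model's own tokenizer chat template should be treated as the source of truth.

```python
from typing import Dict, List, Optional


def build_prompt(messages: List[Dict[str, str]], system: Optional[str] = None) -> str:
    """Concatenate chat turns into one string using [INST] ... [/INST] blocks.

    Assumed layout (not confirmed by the model card): the system prompt is
    folded into the first user turn, and special tokens are left out.
    """
    parts = []
    pending_system = system
    for message in messages:
        if message["role"] == "user":
            content = message["content"]
            if pending_system:
                # Assumption: the system prompt is prepended to the first user message.
                content = f"{pending_system}\n\n{content}"
                pending_system = None
            parts.append(f"[INST] {content} [/INST]")
        elif message["role"] == "assistant":
            # The assistant response follows the closing [/INST] tag.
            parts.append(message["content"])
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return " ".join(parts)


if __name__ == "__main__":
    chat = [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "How are you?"},
    ]
    print(build_prompt(chat, system="Be brief."))
```

The returned string ends with an open `[/INST]`, so the model generates the next assistant turn from there.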

replied to their post 4 months ago
Indeed, we're serving this via TGI, so that doesn't seem to be the issue. By Inference Endpoint, do you mean Hugging Face's service? Could you let me know which GPUs you are using?

replied to their post 4 months ago
Hey, sorry to hear that. Is it due to the resources required to run it? Maybe you can ask Hugging Face for a quota increase so that you can allocate the required resources 🤗 Let me know if there's anything I can help you with!

replied to KnutJaegersberg's post 4 months ago
In fact, to add more context, the authors mentioned that they will release more content in the upcoming revision of the paper. That's nice, because it would mean that anyone could run a faithful reproduction of their synthetic data generation process. See the reply from the authors at https://huggingface.co/papers/2401.00368#65978d195f689f3f0b2caeb9.

Also worth mentioning that @andersonbcdefg ran both stages:

(I'm unsure whether the reproduction of the second stage is faithful to the original, but I asked them at https://twitter.com/alvarobartt/status/1742839431881490717. In any case, I think we may need to wait for the authors to share the full details of the prompting strategies used for the generation.)

replied to their post 4 months ago
We may need to discuss that internally, but it could be something we consider for the start of 2024 🤗

replied to their post 4 months ago
Yes, indeed. For anyone wondering, in a bit more detail: Mixtral-8x7B-Instruct-v0.1 was fine-tuned using SFT + DPO (read more about it at https://mistral.ai/news/mixtral-of-experts/), and we ran DPO fine-tuning again on top of it. We used data from UltraFeedback, in particular argilla/ultrafeedback-binarized-preferences-cleaned, which uses a different binarization approach and applies some data cleaning, but we essentially followed the same approach as @HuggingFaceH4 did with zephyr-7b-beta.

So DPO ^ 2 is fair! 😄
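
For readers unfamiliar with DPO, the per-example objective can be written in a few lines. This is a minimal sketch of the standard DPO loss for a single preference pair, not the training code used for the model above; the function name `dpo_loss` and the value `beta=0.1` are assumptions chosen for illustration, and the inputs are assumed to be summed log-probabilities of each response under the policy and the frozen reference model.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: illustrative value, not taken from the post
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    # How much more the policy likes each response than the reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy prefers the chosen response more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When DPO is run on top of a model that was itself trained with DPO, the reference model is simply that earlier DPO-tuned checkpoint.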
