---
license: apache-2.0
language:
  - en
datasets:
  - jondurbin/truthy-dpo-v0.1
  - snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
---

Gazelle v0.2 is Tincans' mid-March release of a joint speech-language model.

This repo contains an experimental DPO finetune. To our knowledge, this is the first multimodal DPO finetune of a speech-language model (audio in, text out).
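The card does not spell out the training objective beyond "DPO". For orientation, here is a minimal sketch of the standard per-pair DPO loss, -log σ(β · (chosen log-ratio − rejected log-ratio)), in plain Python. It assumes each log-probability is the summed log-likelihood of a text response given the prompt (for this model, presumably the audio input) under either the trainable policy or the frozen reference model. The helper names and the value `beta=0.1` are illustrative assumptions; the card does not state the β used.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each *_logp is the summed log-probability of a response given the prompt,
    under the policy being trained or the frozen reference model.
    beta=0.1 is an assumed default, not a value taken from this model card.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-logits)

def batch_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)
```

When the policy still equals the reference model, every pair scores log 2 ≈ 0.693. The loss falls below that only as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference.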

We used two preference datasets: snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset (first iteration) and jondurbin/truthy-dpo-v0.1. We trained for 2 epochs with a peak learning rate of max_lr=3e-4, batch size 32, 10 warmup steps, and cosine decay.
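The schedule above can be sketched as follows. This is an illustration under stated assumptions, not the training code: it assumes linear warmup over the 10 warmup steps, cosine decay to `min_lr=0.0` over the remaining steps, and a total step count derived from the number of preference pairs, batch size 32, and 2 epochs. The function names and `num_pairs` are hypothetical; the card does not give the total step count or the final learning rate.

```python
import math

def total_training_steps(num_pairs: int, batch_size: int = 32, epochs: int = 2) -> int:
    """Optimizer steps for the run (hypothetical helper; assumes no gradient accumulation)."""
    return epochs * math.ceil(num_pairs / batch_size)

def lr_at(step: int, total_steps: int, max_lr: float = 3e-4,
          warmup_steps: int = 10, min_lr: float = 0.0) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumed shape: linear warmup to max_lr, then cosine decay to min_lr.
    """
    if step < warmup_steps:
        # Linear warmup: max_lr / warmup_steps at step 0, max_lr at the last warmup step
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr (progress 0) to min_lr (progress 1)
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, with 1,000 preference pairs the run would take `total_training_steps(1000) == 64` steps, and `lr_at(37, 64)` lands halfway down the cosine at 1.5e-4. In practice, `get_cosine_schedule_with_warmup` from Hugging Face `transformers` produces a similar shape (its warmup starts from 0); which implementation was actually used is not stated.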

We see some telltale signs of preference modeling at play, particularly longer replies, which the base instruction-tuned model does not exhibit. Overall, we consider the quality mixed: we welcome experimentation but do not recommend production use.

Please see this notebook for an inference example.