
Gazelle v0.2 is the mid-March release from Tincans of a joint speech-language model.

This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model (audio in, text out).
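For readers unfamiliar with DPO, here is a minimal sketch of the standard per-example DPO objective. It is not taken from our training code: the function name, argument names, and the `beta=0.1` default are illustrative assumptions, and the inputs are assumed to be summed token log-probabilities of the chosen and rejected text replies under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    beta is a hypothetical default; the value used for this model is not stated.
    """
    # How much more (or less) likely each reply is under the policy than the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) computed as softplus(-x), branch chosen for numerical stability.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Example pair where the policy already prefers the chosen reply more than the reference does.
print(dpo_loss(-5.0, -9.0, -6.0, -6.0))
```

The loss equals log(2) when the policy and reference agree, and falls as the policy shifts probability toward the chosen reply relative to the reference.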

The datasets used were snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset (first iteration) and jondurbin/truthy-dpo-v0.1. We trained for 2 epochs with max_lr=3e-4, a batch size of 32, 10 warmup steps, and cosine decay.
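The learning-rate schedule implied by these settings can be sketched as follows. This is an approximation under stated assumptions, not the training code: it assumes linear warmup and cosine decay to zero, and `total_steps` is a hypothetical value, since the number of optimizer steps is not given here.

```python
import math


def lr_at_step(step, total_steps, max_lr=3e-4, warmup_steps=10):
    """Learning rate at a given optimizer step (illustrative sketch).

    Assumes linear warmup to max_lr, then cosine decay to zero.
    """
    if step < warmup_steps:
        # Linear warmup: reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))


# total_steps=1000 is a placeholder, not the real step count.
for step in (0, 9, 10, 500, 1000):
    print(step, lr_at_step(step, total_steps=1000))
```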

We can see some tell-tale signs of preference modeling at play, particularly longer replies, which the base instruction-tuned model does not produce. Overall, we consider the quality mixed; we welcome experimentation but do not recommend production use.

Please see this notebook for an inference example.

Model size: 7.37B params (Safetensors). Tensor type: BF16.
