Text2Text Generation
Transformers
Safetensors
English
gazelle
Inference Endpoints
hingeloss committed
Commit 9a71b08
1 Parent(s): 63c6be8

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
```diff
@@ -2,13 +2,16 @@
 license: apache-2.0
 language:
 - en
+datasets:
+- jondurbin/truthy-dpo-v0.1
+- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
 ---
 Gazelle v0.2 is the mid-March release from [Tincans](https://tincans.ai) of a joint speech-language model.
 
 This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model - audio in, text out.
 
-The datasets used were [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset) and [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1?row=0). We trained for 2 epochs with lr=3e-4, batch size 32, 10 warmup steps, cosine decay.
+The datasets used were [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset) (first iteration) and [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1?row=0). We trained for 2 epochs with max_lr=3e-4, batch size 32, 10 warmup steps, cosine decay.
 
 We can see some tell-tale signs of preference modeling at play, particularly longer replies, which don't exist in the base instruction-tuned model. Overall, we view the quality as being mixed and welcome experimentation but do not suggest production use.
 
-Please see [this notebook](https://github.com/tincans-ai/gazelle/blob/2939d7034277506171d61a7a1001f535426faa71/examples/infer.ipynb) for an inference example.
+Please see [this notebook](https://github.com/tincans-ai/gazelle/blob/2939d7034277506171d61a7a1001f535426faa71/examples/infer.ipynb) for an inference example.
```
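For context, the hyperparameters named in the updated README (2 epochs, max_lr=3e-4, batch size 32, 10 warmup steps, cosine decay) map onto a standard DPO setup roughly as sketched below. This is a hedged, text-only illustration using TRL's `DPOTrainer`, not the actual Gazelle training code, which is multi-modal (audio in, text out) and is not included in this commit. The backbone checkpoint, dataset split names, and `beta` value are assumptions, and the API shown is the trl<=0.11-style constructor.

```python
# Hypothetical sketch only: text-only DPO with the hyperparameters stated in the
# model card. The real Gazelle DPO finetune operates on audio inputs and is not
# reproduced by this snippet.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Placeholder backbone, not the Gazelle checkpoint.
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Split names are assumptions; check the dataset cards. Only the DPO columns
# (prompt / chosen / rejected) are kept so the two datasets can be concatenated.
snorkel = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset",
                       split="train_iteration_1")
truthy = load_dataset("jondurbin/truthy-dpo-v0.1", split="train")
cols = ["prompt", "chosen", "rejected"]
train_dataset = concatenate_datasets(
    [snorkel.select_columns(cols), truthy.select_columns(cols)]
)

# Hyperparameters from the model card: 2 epochs, max_lr=3e-4, batch size 32,
# 10 warmup steps, cosine decay.
args = TrainingArguments(
    output_dir="gazelle-dpo",
    num_train_epochs=2,
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    warmup_steps=10,
    lr_scheduler_type="cosine",
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # TRL builds a frozen reference copy when None
    args=args,
    beta=0.1,              # beta is not stated in the card; 0.1 is the common default
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Per-device batch size 32 is listed as stated; in practice the effective batch size also depends on the number of devices and any gradient accumulation, which the card does not specify.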