Update README.md
README.md
CHANGED
@@ -1,20 +1,24 @@
 # Model Card for Diva Llama 3
 
 <!-- Provide a quick summary of what the model is/does. [Optional] -->
-This is an
+This is an ablation of our Distilled Voice Assistant (DiVA) model which can handle speech and text as inputs. This ablation is trained using only token-alignment loss as described in the ablations here: https://huggingface.co/papers/2410.02678
+
+Weights and Biases Run: https://wandb.ai/i18nlp/DiVA%20Training%20Runs/runs/4t0mvbcd?nw=nwuserheld
 
-See the model in action compared to SALMONN and Qwen-Audio at [diva-audio.github.io](https://diva-audio.github.io).
 ## Citation
-
+This is the token-alignment only model from https://huggingface.co/papers/2410.02678
 **BibTeX:**
 
 ```
-
-
-
-
-
-
+@misc{DiVA,
+title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
+author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
+year={2024},
+eprint={2410.02678},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2410.02678},
+}
 
 ```
 