WillHeld/DiVA-llama-3-token-align-8b


WillHeld committed on Oct 8

Commit 861f9b2 • 1 Parent(s): 7a69005

Update README.md

Files changed (1)

README.md +13 -9


@@ -1,20 +1,24 @@
 # Model Card for Diva Llama 3
 <!-- Provide a quick summary of what the model is/does. [Optional] -->
-This is an end-to-end Voice Assistant Model which can handle speech and text as inputs. It is trained using distillation loss. More details will be in a paper [COMING SOON]!
-See the model in action compared to SALMONN and Qwen-Audio at [diva-audio.github.io](https://diva-audio.github.io).
+This is an ablation of our Distilled Voice Assistant (DiVA) model which can handle speech and text as inputs. This ablation is trained using only token-alignment loss as described in the ablations here: https://huggingface.co/papers/2410.02678
+Weights and Biases Run: https://wandb.ai/i18nlp/DiVA%20Training%20Runs/runs/4t0mvbcd?nw=nwuserheld
 ## Citation
-No Publication As of Yet, But If You Use Please Cite the Below
+This is the token-alignment only model from https://huggingface.co/papers/2410.02678
 **BibTeX:**
 ```
-	@misc{held2024diva,
-	  author="Held, Will and Zhang, Yanzhe and Ryan, Michael and Shi, Weiyan and Li, Ella and Yang, Diyi",
-	  title="Distilling an End-to-End Voice Assistant from Speech Recognition Data",
-	  year="2024",
-	  publisher="HuggingFace",
-	}
+@misc{DiVA,
+      title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
+      author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
+      year={2024},
+      eprint={2410.02678},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2410.02678},
+}
 ```