WillHeld/DiVA-llama-3-v0-8b


Helw150 committed on Oct 5, 2024

Commit bd8b270 · 1 Parent(s): 3999209

Tweaks

Files changed (2)

README.md +17 -17
modeling_diva.py +2 -1

README.md CHANGED

@@ -12,6 +12,22 @@ This is an end-to-end Voice Assistant Model which can handle speech and text as
 See the model in action at [diva-audio.github.io](https://diva-audio.github.io).
 ### Inference Example
 ```python
 from transformers import AutoModel
@@ -44,22 +60,6 @@ print(
 )
 ```
-## Citation
-**BibTeX:**
-```
-@misc{DiVA,
-      title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
-      author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
-      year={2024},
-      eprint={2410.02678},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2410.02678},
-}
-```
 ##  Table of Contents
 - [Model Card for DiVA Llama 3](#model-card-for-DiVA-Llama-3)
@@ -114,4 +114,4 @@ Will Held
 ## Model Card Contact
-held@stanford.edu

 See the model in action at [diva-audio.github.io](https://diva-audio.github.io).
+## Citation
+**BibTeX:**
+```
+@misc{DiVA,
+      title={{D}istilling an {E}nd-to-{E}nd {V}oice {A}ssistant {W}ithout {I}nstruction {T}raining {D}ata},
+      author={William Held and Ella Li and Michael Ryan and Weiyan Shi and Yanzhe Zhang and Diyi Yang},
+      year={2024},
+      eprint={2410.02678},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2410.02678},
+}
+```
 ### Inference Example
 ```python
 from transformers import AutoModel
 )
 ```
 ##  Table of Contents
 - [Model Card for DiVA Llama 3](#model-card-for-DiVA-Llama-3)
 ## Model Card Contact
+held@stanford.edu

modeling_diva.py CHANGED

@@ -263,7 +263,8 @@ class DiVAModel(PreTrainedModel):
             else:
                 greedy = next_token_logits.argmax(dim=-1)
             for token_index, out in enumerate(greedy.flatten().tolist()):
-                outs[token_index].append(out)
                 if out == 128009:
                     complete[token_index] = True

             else:
                 greedy = next_token_logits.argmax(dim=-1)
             for token_index, out in enumerate(greedy.flatten().tolist()):
+                if not complete[token_index]:
+                    outs[token_index].append(out)
                 if out == 128009:
                     complete[token_index] = True
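
The effect of the `modeling_diva.py` change can be illustrated with a small standalone sketch. This is not the model's actual generation loop: `collect_outputs`, the `steps` input, and the early `break` are hypothetical stand-ins that use plain Python lists in place of the `greedy` tensor. Only the `complete` guard and the stop token id `128009` are taken from the diff. The sketch shows that once a sequence in the batch has emitted the stop token, later tokens for that sequence are no longer appended, while unfinished sequences keep accumulating output.

```python
EOS_ID = 128009  # stop token id hardcoded in the diff


def collect_outputs(steps, eos_id=EOS_ID):
    """Accumulate per-sequence tokens, ignoring anything after the stop token.

    `steps` is a hypothetical stand-in for successive `greedy` tensors:
    a list of per-step lists holding one token id per batch element.
    """
    batch_size = len(steps[0])
    outs = [[] for _ in range(batch_size)]
    complete = [False] * batch_size
    for greedy in steps:
        for token_index, out in enumerate(greedy):
            # Guard added by the commit: skip sequences that already finished.
            if not complete[token_index]:
                outs[token_index].append(out)
            if out == eos_id:
                complete[token_index] = True
        # Assumption for this sketch: stop once every sequence has finished.
        if all(complete):
            break
    return outs


# Sequence 0 finishes at step 2; the token 99 produced for it at step 3 is dropped.
steps = [
    [11, 21],
    [EOS_ID, 22],
    [99, EOS_ID],
]
result = collect_outputs(steps)
print(result)
```

Note that the stop token itself is still appended once per sequence, because the guard is checked before `complete[token_index]` is set. Without the guard, sequence 0 in this example would also have collected the trailing `99`.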