fxtentacle committed on
Commit 40ff292 (1 parent: dbeb0d0)

add citation

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -1,5 +1,21 @@
 This repo contains the fully trained ByT5 that was used to estimate per-character entropies. Using it, you can also recreate the illustration in the paper.
 
+## Citation
+
+If you use this for research, please cite:
+```bibtex
+@misc{https://doi.org/10.48550/arxiv.2206.12693,
+doi = {10.48550/ARXIV.2206.12693},
+url = {https://arxiv.org/abs/2206.12693},
+author = {Krabbenhöft, Hajo Nils and Barth, Erhardt},
+keywords = {Computation and Language (cs.CL), Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, F.2.1; I.2.6; I.2.7},
+title = {TEVR: Improving Speech Recognition by Token Entropy Variance Reduction},
+publisher = {arXiv},
+year = {2022},
+copyright = {Creative Commons Attribution 4.0 International}
+}
+```
+
 ## Generate TEVR Tokenizer from Text corpus
 (copy of `Generate TEVR Tokenizer.ipynb`)
 