Amirbek committed on
Commit
9215f51
1 Parent(s): c531393

add task-specific tokenizers; add fairseq dictionary for text modality

Files changed (3)
  1. asr_spm.model +3 -0
  2. dict.txt +92 -0
  3. tts_spm.model +3 -0
asr_spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f4b56b4bbf0637c59a5c9c8531362bf8ce2bcde4fcd28ce0a0ec26cdf70a905
+ size 403344
dict.txt ADDED
@@ -0,0 +1,92 @@
+ ▁ 7888394
+ ا 5240837
+ ل 4086510
+ ي 2961895
+ م 2193275
+ ن 2167596
+ و 1852812
+ ت 1552697
+ ر 1543221
+ ع 1224967
+ ب 1116473
+ ه 1088784
+ د 1060960
+ ة 1037790
+ أ 940870
+ س 935180
+ ف 834969
+ ك 831374
+ ق 807989
+ ح 704969
+ ج 436806
+ ذ 368254
+ ط 344915
+ إ 327964
+ ش 314812
+ ى 307987
+ ص 290514
+ خ 277022
+ ض 252120
+ ث 197381
+ ز 166103
+ ئ 137402
+ ً 122431
+ غ 113416
+ ء 111245
+ ظ 98346
+ ُ 75189
+ آ 58959
+ ؤ 57561
+ ّ 39925
+ 0 27260
+ ٍ 24728
+ َ 22922
+ ِ 21587
+ 1 18933
+ 2 15936
+ ٌ 10327
+ 5 6938
+ 9 6502
+ 6 4829
+ 7 4380
+ 8 4041
+ e 3578
+ a 3330
+ % 2892
+ ـ 2768
+ t 2746
+ i 2683
+ n 2462
+ o 2390
+ r 2341
+ s 2073
+ c 1561
+ l 1504
+ m 1141
+ d 1018
+ p 988
+ u 920
+ h 888
+ g 777
+ b 762
+ f 603
+ y 562
+ k 515
+ w 470
+ v 293
+ j 254
+ z 193
+ x 105
+ @ 97
+ 3 52
+ 4 48
+ q 30
+ ̇ 6
+ ٱ 2
+ ⁄ 1
+ madeupword0000 0
+ madeupword0001 0
+ madeupword0002 0
+ madeupword0003 0
+ madeupword0004 0
+ madeupword0005 0
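Each line of dict.txt is a symbol followed by its count, separated by a single space. The sketch below shows one way such a file could be read; it is not fairseq's own loader. The function names `parse_dict_lines` and `build_index` are hypothetical, and the four special symbols placed ahead of the file entries are an assumption based on fairseq's default `Dictionary` behaviour, so the actual indices depend on how the model was trained.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: fairseq's default Dictionary reserves these four symbols
# at indices 0-3 before any entry read from dict.txt.
ASSUMED_SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]


def parse_dict_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'symbol count' lines into (symbol, count) pairs."""
    entries = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        # Split on the last space so the symbol itself is kept intact.
        symbol, count = line.rsplit(" ", 1)
        entries.append((symbol, int(count)))
    return entries


def build_index(entries: List[Tuple[str, int]],
                specials: List[str] = ASSUMED_SPECIALS) -> Dict[str, int]:
    """Map each symbol to an index, with the special symbols first."""
    symbols = list(specials) + [symbol for symbol, _ in entries]
    return {symbol: i for i, symbol in enumerate(symbols)}


# A few lines copied from the diff above.
sample = ["▁ 7888394", "ا 5240837", "ل 4086510", "madeupword0000 0"]
entries = parse_dict_lines(sample)
index = build_index(entries)
```

Under that assumption, the 86 real symbols plus 4 special symbols give 90 entries, and the 6 `madeupword` lines bring the total to 96. This is consistent with padding the vocabulary to a multiple of 8, though the commit itself does not state why the placeholders are there.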
tts_spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17dd0feab129e71329f98fe9efd6143be4f6e1aede2408fe6c586f803f7d6cc0
+ size 539226
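Both .model files are committed as Git LFS pointer files, so the diff shows only three key/value lines, not the binary SentencePiece model. As a minimal sketch, a pointer can be read as below; the helper name `parse_lfs_pointer` is hypothetical, and the code only parses the pointer text without fetching or verifying the actual object.

```python
from typing import Dict, Union

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:17dd0feab129e71329f98fe9efd6143be4f6e1aede2408fe6c586f803f7d6cc0
size 539226
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split a Git LFS pointer into its version, hash, and size fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is 'key value', separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
```

The `size` field is what indicates that tts_spm.model is about 539 KB once pulled from LFS, even though the pointer in the repository is only three lines long.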