Aditya02 committed on
Commit
3609a3c
1 Parent(s): 47066ef

Update README.md

Files changed (1)
  1. README.md +165 -549
README.md CHANGED
@@ -1,550 +1,166 @@
1
- <!DOCTYPE html>
2
- <html class="">
3
- <head>
4
- <meta charset="utf-8" />
5
- <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no" />
6
- <meta name="description" content="We’re on a journey to advance and democratize artificial intelligence through open source and open science." />
7
- <meta property="fb:app_id" content="1321688464574422" />
8
- <meta name="twitter:card" content="summary_large_image" />
9
- <meta name="twitter:site" content="@huggingface" />
10
- <meta property="og:title" content="README.md · nvidia/stt_en_citrinet_1024_ls at main" />
11
- <meta property="og:type" content="website" />
12
- <meta property="og:url" content="https://huggingface.co/nvidia/stt_en_citrinet_1024_ls/blob/main/README.md" />
13
- <meta property="og:image" content="https://thumbnails.huggingface.co/social-thumbnails/models/nvidia/stt_en_citrinet_1024_ls.png" />
14
-
15
- <link rel="stylesheet" href="/front/build/style.c5ff23a02.css" />
16
-
17
- <link rel="preconnect" href="https://fonts.gstatic.com" />
18
- <link
19
- href="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&display=swap"
20
- rel="stylesheet"
21
- />
22
- <link
23
- href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;600;700&display=swap"
24
- rel="stylesheet"
25
- />
26
- <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.css" />
27
-
28
-
29
-
30
- <title>README.md · nvidia/stt_en_citrinet_1024_ls at main</title>
31
- </head>
32
- <body class="flex flex-col min-h-screen bg-white dark:bg-gray-950 text-black ViewerBlobPage">
33
- <div class="flex flex-col min-h-screen "><div class="SVELTE_HYDRATER contents" data-props="{&quot;avatarUrl&quot;:&quot;/avatars/08cce7993292a724b1441d524fdc3767.svg&quot;,&quot;hfCloudName&quot;:&quot;private&quot;,&quot;isAuth&quot;:true,&quot;isHfCloud&quot;:false,&quot;isWide&quot;:false,&quot;user&quot;:&quot;Aditya02&quot;,&quot;unreadNotifications&quot;:0,&quot;csrf&quot;:&quot;eyJkYXRhIjp7ImV4cGlyYXRpb24iOjE2NzUxODA2MzQ5ODIsInVzZXJJZCI6IjYzOGY1MGU3ZjZkZTRiOWU3ZTE1ZTI4NSJ9LCJzaWduYXR1cmUiOiIxZTQxYTdiMDA3Yzg1NDgyZmEzMzY2NGM4ZjYyNDcxNDg0MGViZGM4NDVjNDUzZTRiNGJjZTc5MTBjYmYyYjU5In0=&quot;}" data-target="MainHeader"><header class="border-b border-gray-100"><div class="w-full px-4 lg:px-6 xl:container flex items-center h-16"><div class="flex flex-1 items-center"><a class="flex flex-none items-center mr-5 lg:mr-6" href="/"><img alt="Hugging Face's logo" class="md:mr-2 w-7" src="/front/assets/huggingface_logo-noborder.svg">
34
- <span class="hidden text-lg font-bold whitespace-nowrap md:block">Hugging Face</span></a>
35
-
36
- <div class="relative flex-1 lg:max-w-sm mr-2 sm:mr-4 lg:mr-6"><input autocomplete="off" class="w-full dark:bg-gray-950 pl-8 form-input-alt h-9 pr-3 focus:shadow-xl" name="" placeholder="Search models, datasets, users..." spellcheck="false" type="text" value="">
37
- <svg class="absolute left-2.5 text-gray-400 top-1/2 transform -translate-y-1/2" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M30 28.59L22.45 21A11 11 0 1 0 21 22.45L28.59 30zM5 14a9 9 0 1 1 9 9a9 9 0 0 1-9-9z" fill="currentColor"></path></svg>
38
- </div>
39
- <button class="lg:hidden relative flex-none place-self-stretch flex items-center justify-center w-8" type="button"><svg width="1em" height="1em" viewBox="0 0 10 10" class="text-xl" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" preserveAspectRatio="xMidYMid meet" fill="currentColor"><path fill-rule="evenodd" clip-rule="evenodd" d="M1.65039 2.9999C1.65039 2.8066 1.80709 2.6499 2.00039 2.6499H8.00039C8.19369 2.6499 8.35039 2.8066 8.35039 2.9999C8.35039 3.1932 8.19369 3.3499 8.00039 3.3499H2.00039C1.80709 3.3499 1.65039 3.1932 1.65039 2.9999ZM1.65039 4.9999C1.65039 4.8066 1.80709 4.6499 2.00039 4.6499H8.00039C8.19369 4.6499 8.35039 4.8066 8.35039 4.9999C8.35039 5.1932 8.19369 5.3499 8.00039 5.3499H2.00039C1.80709 5.3499 1.65039 5.1932 1.65039 4.9999ZM2.00039 6.6499C1.80709 6.6499 1.65039 6.8066 1.65039 6.9999C1.65039 7.1932 1.80709 7.3499 2.00039 7.3499H8.00039C8.19369 7.3499 8.35039 7.1932 8.35039 6.9999C8.35039 6.8066 8.19369 6.6499 8.00039 6.6499H2.00039Z"></path></svg>
40
- </button>
41
-
42
- </div>
43
- <nav aria-label="Main" class="ml-auto hidden lg:block"><ul class="flex items-center space-x-2"><li><a class="flex items-center group px-2 py-0.5 dark:hover:text-gray-400 hover:text-indigo-700" href="/models"><svg class="mr-1.5 text-gray-400 group-hover:text-indigo-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-quaternary" d="M20.23 7.24L12 12L3.77 7.24a1.98 1.98 0 0 1 .7-.71L11 2.76c.62-.35 1.38-.35 2 0l6.53 3.77c.29.173.531.418.7.71z" opacity=".25" fill="currentColor"></path><path class="uim-tertiary" d="M12 12v9.5a2.09 2.09 0 0 1-.91-.21L4.5 17.48a2.003 2.003 0 0 1-1-1.73v-7.5a2.06 2.06 0 0 1 .27-1.01L12 12z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M20.5 8.25v7.5a2.003 2.003 0 0 1-1 1.73l-6.62 3.82c-.275.13-.576.198-.88.2V12l8.23-4.76c.175.308.268.656.27 1.01z" fill="currentColor"></path></svg>
44
- Models</a>
45
- </li><li><a class="flex items-center group px-2 py-0.5 dark:hover:text-gray-400 hover:text-red-700" href="/datasets"><svg class="mr-1.5 text-gray-400 group-hover:text-red-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 25 25"><ellipse cx="12.5" cy="5" fill="currentColor" fill-opacity="0.25" rx="7.5" ry="2"></ellipse><path d="M12.5 15C16.6421 15 20 14.1046 20 13V20C20 21.1046 16.6421 22 12.5 22C8.35786 22 5 21.1046 5 20V13C5 14.1046 8.35786 15 12.5 15Z" fill="currentColor" opacity="0.5"></path><path d="M12.5 7C16.6421 7 20 6.10457 20 5V11.5C20 12.6046 16.6421 13.5 12.5 13.5C8.35786 13.5 5 12.6046 5 11.5V5C5 6.10457 8.35786 7 12.5 7Z" fill="currentColor" opacity="0.5"></path><path d="M5.23628 12C5.08204 12.1598 5 12.8273 5 13C5 14.1046 8.35786 15 12.5 15C16.6421 15 20 14.1046 20 13C20 12.8273 19.918 12.1598 19.7637 12C18.9311 12.8626 15.9947 13.5 12.5 13.5C9.0053 13.5 6.06886 12.8626 5.23628 12Z" fill="currentColor"></path></svg>
46
- Datasets</a>
47
- </li><li><a class="flex items-center group px-2 py-0.5 dark:hover:text-gray-400 hover:text-blue-700" href="/spaces"><svg class="mr-1.5 text-gray-400 group-hover:text-blue-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" viewBox="0 0 25 25"><path opacity=".5" d="M6.016 14.674v4.31h4.31v-4.31h-4.31ZM14.674 14.674v4.31h4.31v-4.31h-4.31ZM6.016 6.016v4.31h4.31v-4.31h-4.31Z" fill="currentColor"></path><path opacity=".75" fill-rule="evenodd" clip-rule="evenodd" d="M3 4.914C3 3.857 3.857 3 4.914 3h6.514c.884 0 1.628.6 1.848 1.414a5.171 5.171 0 0 1 7.31 7.31c.815.22 1.414.964 1.414 1.848v6.514A1.914 1.914 0 0 1 20.086 22H4.914A1.914 1.914 0 0 1 3 20.086V4.914Zm3.016 1.102v4.31h4.31v-4.31h-4.31Zm0 12.968v-4.31h4.31v4.31h-4.31Zm8.658 0v-4.31h4.31v4.31h-4.31Zm0-10.813a2.155 2.155 0 1 1 4.31 0 2.155 2.155 0 0 1-4.31 0Z" fill="currentColor"></path><path opacity=".25" d="M16.829 6.016a2.155 2.155 0 1 0 0 4.31 2.155 2.155 0 0 0 0-4.31Z" fill="currentColor"></path></svg>
48
- Spaces</a>
49
- </li><li><a class="flex items-center group px-2 py-0.5 dark:hover:text-gray-400 hover:text-yellow-700" href="/docs"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="mr-1.5 text-gray-400 group-hover:text-yellow-500" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path opacity="0.5" d="M20.9022 5.10334L10.8012 10.8791L7.76318 9.11193C8.07741 8.56791 8.5256 8.11332 9.06512 7.7914L15.9336 3.73907C17.0868 3.08811 18.5002 3.26422 19.6534 3.91519L19.3859 3.73911C19.9253 4.06087 20.5879 4.56025 20.9022 5.10334Z" fill="currentColor"></path><path d="M10.7999 10.8792V28.5483C10.2136 28.5475 9.63494 28.4139 9.10745 28.1578C8.5429 27.8312 8.074 27.3621 7.74761 26.7975C7.42122 26.2327 7.24878 25.5923 7.24756 24.9402V10.9908C7.25062 10.3319 7.42358 9.68487 7.74973 9.1123L10.7999 10.8792Z" fill="currentColor" fill-opacity="0.75"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M21.3368 10.8499V6.918C21.3331 6.25959 21.16 5.61234 20.8346 5.03949L10.7971 10.8727L10.8046 10.874L21.3368 10.8499Z" fill="currentColor"></path><path opacity="0.5" d="M21.7937 10.8488L10.7825 10.8741V28.5486L21.7937 28.5234C23.3344 28.5234 24.5835 27.2743 24.5835 25.7335V13.6387C24.5835 12.0979 23.4365 11.1233 21.7937 10.8488Z" fill="currentColor"></path></svg>
50
- Docs</a>
51
- </li>
52
- <li><div class="relative ">
53
- <button class="px-2 py-0.5 group hover:text-green-700 dark:hover:text-gray-400 flex items-center " type="button">
54
- <svg class="mr-1.5 text-gray-400 group-hover:text-green-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-tertiary" d="M19 6H5a3 3 0 0 0-3 3v2.72L8.837 14h6.326L22 11.72V9a3 3 0 0 0-3-3z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M10 6V5h4v1h2V5a2.002 2.002 0 0 0-2-2h-4a2.002 2.002 0 0 0-2 2v1h2zm-1.163 8L2 11.72V18a3.003 3.003 0 0 0 3 3h14a3.003 3.003 0 0 0 3-3v-6.28L15.163 14H8.837z" fill="currentColor"></path></svg>
55
- Solutions
56
- </button>
57
-
58
-
59
-
60
- </div></li>
61
-
62
- <li><a class="flex items-center group px-2 py-0.5 hover:text-gray-500 dark:hover:text-gray-400" href="/pricing" data-ga-category="header-menu" data-ga-action="clicked pricing" data-ga-label="pricing">Pricing
63
- </a></li>
64
-
65
- <li><div class="relative group">
66
- <button class="px-2 py-0.5 hover:text-gray-500 dark:hover:text-gray-600 flex items-center " type="button">
67
- <svg class="mr-1.5 text-gray-500 w-5 group-hover:text-gray-400 dark:text-gray-300 dark:group-hover:text-gray-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" viewBox="0 0 32 18" preserveAspectRatio="xMidYMid meet"><path fill-rule="evenodd" clip-rule="evenodd" d="M14.4504 3.30221C14.4504 2.836 14.8284 2.45807 15.2946 2.45807H28.4933C28.9595 2.45807 29.3374 2.836 29.3374 3.30221C29.3374 3.76842 28.9595 4.14635 28.4933 4.14635H15.2946C14.8284 4.14635 14.4504 3.76842 14.4504 3.30221Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M14.4504 9.00002C14.4504 8.53382 14.8284 8.15588 15.2946 8.15588H28.4933C28.9595 8.15588 29.3374 8.53382 29.3374 9.00002C29.3374 9.46623 28.9595 9.84417 28.4933 9.84417H15.2946C14.8284 9.84417 14.4504 9.46623 14.4504 9.00002Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M14.4504 14.6978C14.4504 14.2316 14.8284 13.8537 15.2946 13.8537H28.4933C28.9595 13.8537 29.3374 14.2316 29.3374 14.6978C29.3374 15.164 28.9595 15.542 28.4933 15.542H15.2946C14.8284 15.542 14.4504 15.164 14.4504 14.6978Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M1.94549 6.87377C2.27514 6.54411 2.80962 6.54411 3.13928 6.87377L6.23458 9.96907L9.32988 6.87377C9.65954 6.54411 10.194 6.54411 10.5237 6.87377C10.8533 7.20343 10.8533 7.73791 10.5237 8.06756L6.23458 12.3567L1.94549 8.06756C1.61583 7.73791 1.61583 7.20343 1.94549 6.87377Z" fill="currentColor"></path></svg>
68
-
69
- </button>
70
-
71
-
72
-
73
- </div></li>
74
- <li><hr class="w-0.5 h-5 border-none bg-gray-100 dark:bg-gray-800"></li>
75
- <form action="/logout" method="POST"><input type="hidden" name="csrf" value="eyJkYXRhIjp7ImV4cGlyYXRpb24iOjE2NzUxODA2MzQ5ODIsInVzZXJJZCI6IjYzOGY1MGU3ZjZkZTRiOWU3ZTE1ZTI4NSJ9LCJzaWduYXR1cmUiOiIxZTQxYTdiMDA3Yzg1NDgyZmEzMzY2NGM4ZjYyNDcxNDg0MGViZGM4NDVjNDUzZTRiNGJjZTc5MTBjYmYyYjU5In0="></form>
76
- <li><div class="relative ml-2 w-[1.38rem] h-[1.38rem]">
77
- <button class="ml-auto rounded-full ring-2 group ring-indigo-400 focus:ring-blue-500 hover:ring-offset-1 focus:ring-offset-1 focus:outline-none outline-none dark:ring-offset-gray-950 " type="button">
78
-
79
- <div class="relative"><img alt="" class="w-[1.38rem] h-[1.38rem] rounded-full overflow-hidden" src="/avatars/08cce7993292a724b1441d524fdc3767.svg">
80
- </div>
81
-
82
- </button>
83
-
84
-
85
-
86
- </div></li></ul></nav></div></header></div>
87
-
88
-
89
- <main class="flex flex-col flex-1 "><header class="bg-gradient-to-t from-gray-50-to-white via-white dark:via-gray-950 pt-10"><div class="container relative"><h1 class="flex items-center flex-wrap text-lg leading-tight mb-2 md:text-xl ">
90
- <div class="flex items-center mb-1 group"><div class="flex items-center mr-1.5 relative">
91
-
92
- <img alt="" class="w-3.5 h-3.5 rounded " src="https://aeiljuispo.cloudimg.io/v7/https://s3.amazonaws.com/moonup/production/uploads/1613114437487-60262a8e0703121c822a80b6.png?w=200&amp;h=200&amp;f=face"></div>
93
- <a href="/nvidia" class="font-sans text-gray-400 hover:text-blue-600">nvidia</a>
94
- <div class="text-gray-300 mx-0.5">/</div></div>
95
-
96
- <div class="max-w-full mb-1"><a class="font-mono font-semibold break-words" href="/nvidia/stt_en_citrinet_1024_ls">stt_en_citrinet_1024_ls</a>
97
- <div class="SVELTE_HYDRATER contents" data-props="{&quot;classNames&quot;:&quot;mr-4&quot;,&quot;title&quot;:&quot;Copy model name to clipboard&quot;,&quot;value&quot;:&quot;nvidia/stt_en_citrinet_1024_ls&quot;}" data-target="CopyButton"><button class="inline-flex items-center relative bg-white text-sm focus:text-green-500 cursor-pointer focus:outline-none mr-4 mx-0.5 text-gray-600 " title="Copy model name to clipboard" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
98
-
99
- <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
100
- Copied</div></button></div></div>
101
- <div class="SVELTE_HYDRATER contents" data-props="{&quot;isLoggedIn&quot;:true,&quot;classNames&quot;:&quot;mr-2 xl:mr-3 mb-1&quot;,&quot;isLikedByUser&quot;:false,&quot;likes&quot;:0,&quot;repoId&quot;:&quot;nvidia/stt_en_citrinet_1024_ls&quot;,&quot;repoType&quot;:&quot;model&quot;}" data-target="LikeButton"><div class="inline-flex items-center border leading-none whitespace-nowrap text-sm rounded-md text-gray-500 overflow-hidden bg-white mr-2 xl:mr-3 mb-1"><button class="relative flex items-center px-1.5 py-1 hover:bg-gradient-to-t focus:outline-none from-red-50 to-transparent dark:from-red-900 dark:to-red-800 overflow-hidden" title="Like"><svg class="mr-1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32" fill="currentColor"><path d="M22.45,6a5.47,5.47,0,0,1,3.91,1.64,5.7,5.7,0,0,1,0,8L16,26.13,5.64,15.64a5.7,5.7,0,0,1,0-8,5.48,5.48,0,0,1,7.82,0L16,10.24l2.53-2.58A5.44,5.44,0,0,1,22.45,6m0-2a7.47,7.47,0,0,0-5.34,2.24L16,7.36,14.89,6.24a7.49,7.49,0,0,0-10.68,0,7.72,7.72,0,0,0,0,10.82L16,29,27.79,17.06a7.72,7.72,0,0,0,0-10.82A7.49,7.49,0,0,0,22.45,4Z"></path></svg>
102
-
103
- <svg class="mr-1 absolute text-red-500 origin-center transform transition ease-in
104
- translate-y-10 scale-0" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32" fill="currentColor"><path d="M22.5,4c-2,0-3.9,0.8-5.3,2.2L16,7.4l-1.1-1.1C12,3.3,7.2,3.3,4.3,6.2c0,0-0.1,0.1-0.1,0.1c-3,3-3,7.8,0,10.8L16,29l11.8-11.9c3-3,3-7.8,0-10.8C26.4,4.8,24.5,4,22.5,4z"></path></svg>
105
- like
106
- </button>
107
- <button class="flex items-center px-1.5 py-1 border-l text-gray-400 focus:outline-none hover:bg-gray-50 dark:hover:bg-gray-900 dark:focus:bg-gray-800 focus:bg-gray-100 " title="See users who liked this repository">0</button></div>
108
- </div>
109
- </h1>
110
- <div class="SVELTE_HYDRATER contents" data-props="{&quot;tagObjs&quot;:[{&quot;id&quot;:&quot;automatic-speech-recognition&quot;,&quot;label&quot;:&quot;Automatic Speech Recognition&quot;,&quot;subType&quot;:&quot;audio&quot;,&quot;type&quot;:&quot;pipeline_tag&quot;},{&quot;id&quot;:&quot;nemo&quot;,&quot;label&quot;:&quot;NeMo&quot;,&quot;type&quot;:&quot;library&quot;},{&quot;id&quot;:&quot;pytorch&quot;,&quot;label&quot;:&quot;PyTorch&quot;,&quot;type&quot;:&quot;library&quot;},{&quot;id&quot;:&quot;dataset:librispeech_asr&quot;,&quot;label&quot;:&quot;librispeech_asr&quot;,&quot;type&quot;:&quot;dataset&quot;,&quot;disabled&quot;:false},{&quot;id&quot;:&quot;en&quot;,&quot;label&quot;:&quot;en&quot;,&quot;type&quot;:&quot;language&quot;},{&quot;id&quot;:&quot;arxiv:2104.01721&quot;,&quot;label&quot;:&quot;arxiv:2104.01721&quot;,&quot;type&quot;:&quot;arxiv&quot;},{&quot;id&quot;:&quot;speech&quot;,&quot;label&quot;:&quot;speech&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;audio&quot;,&quot;label&quot;:&quot;audio&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;CTC&quot;,&quot;label&quot;:&quot;CTC&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;Citrinet&quot;,&quot;label&quot;:&quot;Citrinet&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;Transformer&quot;,&quot;label&quot;:&quot;Transformer&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;NeMo&quot;,&quot;label&quot;:&quot;NeMo&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;hf-asr-leaderboard&quot;,&quot;label&quot;:&quot;hf-asr-leaderboard&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;model-index&quot;,&quot;label&quot;:&quot;Eval Results&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;license:cc-by-4.0&quot;,&quot;label&quot;:&quot;cc-by-4.0&quot;,&quot;type&quot;:&quot;license&quot;}]}" data-target="ModelHeaderTags"><div class="flex flex-wrap mb-3 md:mb-4"><a class="tag 
tag-white" href="/models?pipeline_tag=automatic-speech-recognition"><div class="tag-ico tag-ico-yellow"><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 18 18"><path fill-rule="evenodd" clip-rule="evenodd" d="M8.38893 3.42133C7.9778 3.14662 7.49446 3 7 3C6.33696 3 5.70108 3.26339 5.23223 3.73223C4.76339 4.20107 4.5 4.83696 4.5 5.5C4.5 5.99445 4.64662 6.4778 4.92133 6.88893C5.19603 7.30005 5.58648 7.62048 6.04329 7.8097C6.50011 7.99892 7.00278 8.04843 7.48773 7.95196C7.97268 7.8555 8.41814 7.6174 8.76777 7.26777C9.1174 6.91814 9.3555 6.47268 9.45197 5.98773C9.54843 5.50277 9.49892 5.00011 9.3097 4.54329C9.12048 4.08648 8.80005 3.69603 8.38893 3.42133ZM5.05551 2.58986C5.63108 2.20527 6.30777 2 7 2C7.92826 2 8.8185 2.36875 9.47488 3.02513C10.1313 3.6815 10.5 4.57174 10.5 5.5C10.5 6.19223 10.2947 6.86892 9.91015 7.4445C9.52556 8.02007 8.97894 8.46867 8.33939 8.73358C7.69985 8.99849 6.99612 9.0678 6.31719 8.93275C5.63825 8.7977 5.01461 8.46436 4.52513 7.97487C4.03564 7.48539 3.7023 6.86175 3.56725 6.18282C3.4322 5.50388 3.50152 4.80015 3.76642 4.16061C4.03133 3.52107 4.47993 2.97444 5.05551 2.58986ZM14.85 9.6425L15.7075 10.5C15.8005 10.5927 15.8743 10.7029 15.9245 10.8242C15.9747 10.9456 16.0004 11.0757 16 11.207V16H2V13.5C2.00106 12.5721 2.37015 11.6824 3.0263 11.0263C3.68244 10.3701 4.57207 10.0011 5.5 10H8.5C9.42793 10.0011 10.3176 10.3701 10.9737 11.0263C11.6299 11.6824 11.9989 12.5721 12 13.5V15H15V11.207L14.143 10.35C13.9426 10.4476 13.7229 10.4989 13.5 10.5C13.2033 10.5 12.9133 10.412 12.6666 10.2472C12.42 10.0824 12.2277 9.84811 12.1142 9.57403C12.0006 9.29994 11.9709 8.99834 12.0288 8.70737C12.0867 8.41639 12.2296 8.14912 12.4393 7.93934C12.6491 7.72956 12.9164 7.5867 13.2074 7.52882C13.4983 7.47094 13.7999 7.50065 14.074 7.61418C14.3481 7.72771 14.5824 7.91997 14.7472 8.16665C14.912 8.41332 15 
8.70333 15 9C14.9988 9.22271 14.9475 9.44229 14.85 9.6425ZM3.73311 11.7331C3.26444 12.2018 3.00079 12.8372 3 13.5V15H11V13.5C10.9992 12.8372 10.7356 12.2018 10.2669 11.7331C9.79822 11.2644 9.1628 11.0008 8.5 11H5.5C4.8372 11.0008 4.20178 11.2644 3.73311 11.7331Z" fill="currentColor"></path></svg></div>
111
- <span>Automatic Speech Recognition</span>
112
- </a><a class="tag tag-white" href="/models?library=nemo"><svg class="text-black inline-block ml-2 text-sm" viewBox="0 0 16 16" xmlns="http://www.w3.org/2000/svg" width="1em" height="1em"><path d="m5.9698 5.86543v-.95636c.09285-.00661.18663-.01156.28219-.01456a5.75152 5.75152 0 0 1 4.33173 2.24749s-1.8534 2.57431-3.84063 2.57431a2.40975 2.40975 0 0 1 -.77329-.12364v-2.9c1.01832.123 1.223.57279 1.83539 1.59325l1.36157-1.148a3.60517 3.60517 0 0 0 -2.66934-1.30361 4.93745 4.93745 0 0 0 -.52762.03112m0-3.15922v1.42853c.09389-.00742.1879-.0134.28219-.0168 3.63754-.12254 6.0073 2.98317 6.0073 2.98317s-2.722 3.31-5.55774 3.31a4.18488 4.18488 0 0 1 -.73175-.06444v.883a4.81728 4.81728 0 0 0 .60938.03947c2.639 0 4.54736-1.34759 6.39542-2.94267.30618.24532 1.56062.8421 1.8186 1.1037-1.75722 1.47088-5.852 2.65644-8.17346 2.65644-.22369 0-.43886-.01352-.64994-.03376v1.241h10.0302v-10.58764zm0 6.88646v.754a4.26109 4.26109 0 0 1 -3.11821-2.97239 5.27645 5.27645 0 0 1 3.11821-1.50885v.8272l-.0038-.0004a2.34214 2.34214 0 0 0 -1.81935.83163 3.25091 3.25091 0 0 0 1.82315 2.06881m-4.33507-2.32834a6.045 6.045 0 0 1 4.33507-2.35526v-.77433c-3.19927.25677-5.9698 2.96637-5.9698 2.96637s1.56908 4.53638 5.9698 4.95171v-.82318c-3.22936-.4063-4.33507-3.96531-4.33507-3.96531z" fill="#76b900"></path></svg>
113
- <span>NeMo</span>
114
- </a><a class="tag tag-white" href="/models?library=pytorch"><svg class="text-black inline-block ml-2 text-sm" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><defs><clipPath id="a"><rect x="3.05" y="0.5" width="25.73" height="31" fill="none"></rect></clipPath></defs><g clip-path="url(#a)"><path d="M24.94,9.51a12.81,12.81,0,0,1,0,18.16,12.68,12.68,0,0,1-18,0,12.81,12.81,0,0,1,0-18.16l9-9V5l-.84.83-6,6a9.58,9.58,0,1,0,13.55,0ZM20.44,9a1.68,1.68,0,1,1,1.67-1.67A1.68,1.68,0,0,1,20.44,9Z" fill="#ee4c2c"></path></g></svg>
115
- <span>PyTorch</span>
116
- </a><div class="relative inline-block mr-1 mb-1 md:mr-1.5 md:mb-1.5">
117
- <button class=" " type="button">
118
-
119
- <a class="tag mr-0 mb-0 md:mr-0 md:mb-0 tag-indigo" href="/models?dataset=dataset:librispeech_asr"><svg class="flex-none ml-2 -mr-1 opacity-40" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 25 25"><ellipse cx="12.5" cy="5" fill="currentColor" fill-opacity="0.25" rx="7.5" ry="2"></ellipse><path d="M12.5 15C16.6421 15 20 14.1046 20 13V20C20 21.1046 16.6421 22 12.5 22C8.35786 22 5 21.1046 5 20V13C5 14.1046 8.35786 15 12.5 15Z" fill="currentColor" opacity="0.5"></path><path d="M12.5 7C16.6421 7 20 6.10457 20 5V11.5C20 12.6046 16.6421 13.5 12.5 13.5C8.35786 13.5 5 12.6046 5 11.5V5C5 6.10457 8.35786 7 12.5 7Z" fill="currentColor" opacity="0.5"></path><path d="M5.23628 12C5.08204 12.1598 5 12.8273 5 13C5 14.1046 8.35786 15 12.5 15C16.6421 15 20 14.1046 20 13C20 12.8273 19.918 12.1598 19.7637 12C18.9311 12.8626 15.9947 13.5 12.5 13.5C9.0053 13.5 6.06886 12.8626 5.23628 12Z" fill="currentColor"></path></svg>
120
- <span>librispeech_asr</span>
121
- </a>
122
-
123
-
124
- </button>
125
-
126
-
127
-
128
- </div><a class="tag tag-green" href="/models?language=en"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="ml-2 text-green-600/80" preserveAspectRatio="xMidYMid meet" width="1em" height="1em" viewBox="0 0 10 10"><path fill-rule="evenodd" clip-rule="evenodd" d="M0.625 5C0.625 6.16032 1.08594 7.27312 1.90641 8.09359C2.72688 8.91406 3.83968 9.375 5 9.375C6.16032 9.375 7.27312 8.91406 8.09359 8.09359C8.91406 7.27312 9.375 6.16032 9.375 5C9.375 3.83968 8.91406 2.72688 8.09359 1.90641C7.27312 1.08594 6.16032 0.625 5 0.625C3.83968 0.625 2.72688 1.08594 1.90641 1.90641C1.08594 2.72688 0.625 3.83968 0.625 5ZM7.64365 7.48027C7.61734 7.50832 7.59054 7.53598 7.56326 7.56326C7.13828 7.98824 6.61864 8.2968 6.0539 8.46842C6.29802 8.11949 6.49498 7.64804 6.63475 7.09483C7.00845 7.18834 7.35014 7.3187 7.64365 7.48027ZM8.10076 6.87776C8.37677 6.42196 8.55005 5.90894 8.60556 5.37499H6.86808C6.85542 5.71597 6.82551 6.04557 6.77971 6.35841C7.25309 6.47355 7.68808 6.6414 8.062 6.85549C8.07497 6.86283 8.08789 6.87025 8.10076 6.87776ZM6.03795 6.22536C6.07708 5.95737 6.1044 5.67232 6.11705 5.37499H3.88295C3.89666 5.69742 3.92764 6.00542 3.9722 6.29287C4.37075 6.21726 4.79213 6.17749 5.224 6.17749C5.50054 6.17749 5.77294 6.19376 6.03795 6.22536ZM4.1261 7.02673C4.34894 7.84835 4.68681 8.375 5 8.375C5.32122 8.375 5.66839 7.82101 5.8908 6.963C5.67389 6.93928 5.45082 6.92699 5.224 6.92699C4.84316 6.92699 4.47332 6.96176 4.1261 7.02673ZM3.39783 7.21853C3.53498 7.71842 3.72038 8.14579 3.9461 8.46842C3.42141 8.30898 2.93566 8.03132 2.52857 7.65192C2.77253 7.48017 3.06711 7.33382 3.39783 7.21853ZM3.23916 6.48077C3.18263 6.13193 3.14625 5.76074 3.13192 5.37499H1.39444C1.4585 5.99112 1.67936 6.57938 2.03393 7.08403C2.3706 6.83531 2.78055 6.63162 3.23916 6.48077ZM1.39444 4.62499H3.13192C3.14615 4.24204 3.18211 3.87344 3.23794 3.52681C2.77814 3.37545 2.36731 3.17096 2.03024 2.92123C1.67783 3.42469 1.45828 4.011 1.39444 
4.62499ZM2.5237 2.35262C2.76812 2.52552 3.06373 2.67281 3.39584 2.78875C3.53318 2.28573 3.71928 1.85578 3.9461 1.53158C3.41932 1.69166 2.93178 1.97089 2.5237 2.35262ZM3.97101 3.71489C3.92709 4.00012 3.89654 4.30547 3.88295 4.62499H6.11705C6.10453 4.33057 6.07761 4.04818 6.03909 3.78248C5.77372 3.81417 5.50093 3.83049 5.224 3.83049C4.79169 3.83049 4.3699 3.79065 3.97101 3.71489ZM5.8928 3.04476C5.67527 3.06863 5.45151 3.08099 5.224 3.08099C4.84241 3.08099 4.47186 3.04609 4.12405 2.98086C4.34686 2.1549 4.68584 1.625 5 1.625C5.32218 1.625 5.67048 2.18233 5.8928 3.04476ZM6.78083 3.6493C6.826 3.95984 6.85552 4.28682 6.86808 4.62499H8.60556C8.55029 4.09337 8.37827 3.58251 8.10436 3.1282C8.0903 3.1364 8.07618 3.14449 8.062 3.15249C7.68838 3.36641 7.25378 3.53417 6.78083 3.6493ZM7.64858 2.52499C7.35446 2.68754 7.0117 2.81868 6.63664 2.91268C6.49676 2.35623 6.29913 1.88209 6.0539 1.53158C6.61864 1.7032 7.13828 2.01176 7.56326 2.43674C7.59224 2.46572 7.62068 2.49514 7.64858 2.52499Z" fill="currentColor"></path></svg>
129
- <span>English</span>
130
- </a><div class="relative inline-block mr-1 mb-1 md:mr-1.5 md:mb-1.5">
131
- <button class=" " type="button">
132
-
133
- <a class="tag mr-0 mb-0 md:mr-0 md:mb-0 tag-purple" href="/models?other=arxiv:2104.01721">
134
- <span>arxiv:2104.01721</span>
135
- </a>
136
-
137
-
138
- </button>
139
-
140
-
141
-
142
- </div><a class="tag tag-purple" href="/models?other=speech">
143
- <span>speech</span>
144
- </a><a class="tag tag-purple" href="/models?other=audio">
145
- <span>audio</span>
146
- </a><a class="tag tag-purple" href="/models?other=CTC">
147
- <span>CTC</span>
148
- </a><a class="tag tag-purple" href="/models?other=Citrinet">
149
- <span>Citrinet</span>
150
- </a><a class="tag tag-purple" href="/models?other=Transformer">
151
- <span>Transformer</span>
152
- </a><a class="tag tag-purple" href="/models?other=NeMo">
153
- <span>NeMo</span>
154
- </a><a class="tag tag-purple" href="/models?other=hf-asr-leaderboard">
155
- <span>hf-asr-leaderboard</span>
156
- </a><a class="tag tag-purple" href="/models?other=model-index"><svg class="ml-2 text-orange-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M30 30h-8V4h8z" fill="currentColor"></path><path d="M20 30h-8V12h8z" fill="currentColor"></path><path d="M10 30H2V18h8z" fill="currentColor"></path></svg>
157
- <span>Eval Results</span>
158
- </a><a class="tag tag-white rounded-full" href="/models?license=license:cc-by-4.0"><svg class="ml-2 text-xs text-gray-900" width="1em" height="1em" viewBox="0 0 10 10" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M1.46009 5.0945V6.88125C1.46009 7.25201 1.75937 7.55129 2.13012 7.55129C2.50087 7.55129 2.80016 7.25201 2.80016 6.88125V5.0945C2.80016 4.72375 2.50087 4.42446 2.13012 4.42446C1.75937 4.42446 1.46009 4.72375 1.46009 5.0945ZM4.14022 5.0945V6.88125C4.14022 7.25201 4.4395 7.55129 4.81026 7.55129C5.18101 7.55129 5.48029 7.25201 5.48029 6.88125V5.0945C5.48029 4.72375 5.18101 4.42446 4.81026 4.42446C4.4395 4.42446 4.14022 4.72375 4.14022 5.0945ZM1.23674 9.78473H8.38377C8.75452 9.78473 9.0538 9.48545 9.0538 9.1147C9.0538 8.74395 8.75452 8.44466 8.38377 8.44466H1.23674C0.865993 8.44466 0.566711 8.74395 0.566711 9.1147C0.566711 9.48545 0.865993 9.78473 1.23674 9.78473ZM6.82036 5.0945V6.88125C6.82036 7.25201 7.11964 7.55129 7.49039 7.55129C7.86114 7.55129 8.16042 7.25201 8.16042 6.88125V5.0945C8.16042 4.72375 7.86114 4.42446 7.49039 4.42446C7.11964 4.42446 6.82036 4.72375 6.82036 5.0945ZM4.39484 0.623142L0.865993 2.48137C0.682851 2.57517 0.566711 2.76725 0.566711 2.97273C0.566711 3.28094 0.816857 3.53109 1.12507 3.53109H8.49991C8.80365 3.53109 9.0538 3.28094 9.0538 2.97273C9.0538 2.76725 8.93766 2.57517 8.75452 2.48137L5.22568 0.623142C4.9666 0.484669 4.65391 0.484669 4.39484 0.623142V0.623142Z" fill="currentColor"></path></svg>
159
- <span class="text-gray-400 !pr-0 -mr-1">License: </span>
160
- <span>cc-by-4.0</span>
161
- </a></div></div>
162
- <div class="border-b border-gray-100"><div class="flex flex-col-reverse lg:flex-row lg:items-center lg:justify-between"><div class="flex items-center h-12 -mb-px overflow-x-auto overflow-y-hidden"><a class="tab-alternate " href="/nvidia/stt_en_citrinet_1024_ls"><svg class="mr-1.5 text-gray-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-quaternary" d="M20.23 7.24L12 12L3.77 7.24a1.98 1.98 0 0 1 .7-.71L11 2.76c.62-.35 1.38-.35 2 0l6.53 3.77c.29.173.531.418.7.71z" opacity=".25" fill="currentColor"></path><path class="uim-tertiary" d="M12 12v9.5a2.09 2.09 0 0 1-.91-.21L4.5 17.48a2.003 2.003 0 0 1-1-1.73v-7.5a2.06 2.06 0 0 1 .27-1.01L12 12z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M20.5 8.25v7.5a2.003 2.003 0 0 1-1 1.73l-6.62 3.82c-.275.13-.576.198-.88.2V12l8.23-4.76c.175.308.268.656.27 1.01z" fill="currentColor"></path></svg>
- Model card
-
-
- </a><a class="tab-alternate active" href="/nvidia/stt_en_citrinet_1024_ls/tree/main"><svg class="mr-1.5 text-gray-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24"><path class="uim-tertiary" d="M21 19h-8a1 1 0 0 1 0-2h8a1 1 0 0 1 0 2zm0-4h-8a1 1 0 0 1 0-2h8a1 1 0 0 1 0 2zm0-8h-8a1 1 0 0 1 0-2h8a1 1 0 0 1 0 2zm0 4h-8a1 1 0 0 1 0-2h8a1 1 0 0 1 0 2z" opacity=".5" fill="currentColor"></path><path class="uim-primary" d="M9 19a1 1 0 0 1-1-1V6a1 1 0 0 1 2 0v12a1 1 0 0 1-1 1zm-6-4.333a1 1 0 0 1-.64-1.769L3.438 12l-1.078-.898a1 1 0 0 1 1.28-1.538l2 1.667a1 1 0 0 1 0 1.538l-2 1.667a.999.999 0 0 1-.64.231z" fill="currentColor"></path></svg>
- <span class="xl:hidden">Files</span>
- <span class="hidden xl:inline">Files and versions</span>
-
-
- </a><a class="tab-alternate " href="/nvidia/stt_en_citrinet_1024_ls/discussions"><svg class="mr-1.5 text-gray-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M20.6081 3C21.7684 3 22.8053 3.49196 23.5284 4.38415C23.9756 4.93678 24.4428 5.82749 24.4808 7.16133C24.9674 7.01707 25.4353 6.93643 25.8725 6.93643C26.9833 6.93643 27.9865 7.37587 28.696 8.17411C29.6075 9.19872 30.0124 10.4579 29.8361 11.7177C29.7523 12.3177 29.5581 12.8555 29.2678 13.3534C29.8798 13.8646 30.3306 14.5763 30.5485 15.4322C30.719 16.1032 30.8939 17.5006 29.9808 18.9403C30.0389 19.0342 30.0934 19.1319 30.1442 19.2318C30.6932 20.3074 30.7283 21.5229 30.2439 22.6548C29.5093 24.3704 27.6841 25.7219 24.1397 27.1727C21.9347 28.0753 19.9174 28.6523 19.8994 28.6575C16.9842 29.4379 14.3477 29.8345 12.0653 29.8345C7.87017 29.8345 4.8668 28.508 3.13831 25.8921C0.356375 21.6797 0.754104 17.8269 4.35369 14.1131C6.34591 12.058 7.67023 9.02782 7.94613 8.36275C8.50224 6.39343 9.97271 4.20438 12.4172 4.20438H12.4179C12.6236 4.20438 12.8314 4.2214 13.0364 4.25468C14.107 4.42854 15.0428 5.06476 15.7115 6.02205C16.4331 5.09583 17.134 4.359 17.7682 3.94323C18.7242 3.31737 19.6794 3 20.6081 3ZM20.6081 5.95917C20.2427 5.95917 19.7963 6.1197 19.3039 6.44225C17.7754 7.44319 14.8258 12.6772 13.7458 14.7131C13.3839 15.3952 12.7655 15.6837 12.2086 15.6837C11.1036 15.6837 10.2408 14.5497 12.1076 13.1085C14.9146 10.9402 13.9299 7.39584 12.5898 7.1776C12.5311 7.16799 12.4731 7.16355 12.4172 7.16355C11.1989 7.16355 10.6615 9.33114 10.6615 9.33114C10.6615 9.33114 9.0863 13.4148 6.38031 16.206C3.67434 18.998 3.5346 21.2388 5.50675 24.2246C6.85185 26.2606 9.42666 26.8753 12.0653 26.8753C14.8021 26.8753 17.6077 26.2139 19.1799 25.793C19.2574 25.7723 28.8193 22.984 27.6081 20.6107C27.4046 20.212 27.0693 20.0522 26.6471 20.0522C24.9416 20.0522 21.8393 22.6726 
20.5057 22.6726C20.2076 22.6726 19.9976 22.5416 19.9116 22.222C19.3433 20.1173 28.552 19.2325 27.7758 16.1839C27.639 15.6445 27.2677 15.4256 26.746 15.4263C24.4923 15.4263 19.4358 19.5181 18.3759 19.5181C18.2949 19.5181 18.2368 19.4937 18.2053 19.4419C17.6743 18.557 17.9653 17.9394 21.7082 15.6009C25.4511 13.2617 28.0783 11.8545 26.5841 10.1752C26.4121 9.98141 26.1684 9.8956 25.8725 9.8956C23.6001 9.89634 18.2311 14.9403 18.2311 14.9403C18.2311 14.9403 16.7821 16.496 15.9057 16.496C15.7043 16.496 15.533 16.4139 15.4169 16.2112C14.7956 15.1296 21.1879 10.1286 21.5484 8.06535C21.7928 6.66715 21.3771 5.95917 20.6081 5.95917Z" fill="#FF9D00"></path><path d="M5.50686 24.2246C3.53472 21.2387 3.67446 18.9979 6.38043 16.206C9.08641 13.4147 10.6615 9.33111 10.6615 9.33111C10.6615 9.33111 11.2499 6.95933 12.59 7.17757C13.93 7.39581 14.9139 10.9401 12.1069 13.1084C9.29997 15.276 12.6659 16.7489 13.7459 14.713C14.8258 12.6772 17.7747 7.44316 19.304 6.44221C20.8326 5.44128 21.9089 6.00204 21.5484 8.06532C21.188 10.1286 14.795 15.1295 15.4171 16.2118C16.0391 17.2934 18.2312 14.9402 18.2312 14.9402C18.2312 14.9402 25.0907 8.49588 26.5842 10.1752C28.0776 11.8545 25.4512 13.2616 21.7082 15.6008C17.9646 17.9393 17.6744 18.557 18.2054 19.4418C18.7372 20.3266 26.9998 13.1351 27.7759 16.1838C28.5513 19.2324 19.3434 20.1173 19.9117 22.2219C20.48 24.3274 26.3979 18.2382 27.6082 20.6107C28.8193 22.9839 19.2574 25.7722 19.18 25.7929C16.0914 26.62 8.24723 28.3726 5.50686 24.2246Z" fill="#FFD21E"></path></svg>
- Community
- <div class="h-4 min-w-[1rem] px-1 rounded bg-black text-xs text-white shadow-sm items-center justify-center leading-none flex ml-1.5">2
- </div>
-
- </a>
- </div>
-
- <div class="SVELTE_HYDRATER contents" data-props="{&quot;authLight&quot;:{&quot;isHf&quot;:false,&quot;u&quot;:{&quot;accessTokens&quot;:[{&quot;_id&quot;:&quot;63b85fcfaaa8cf17f095c774&quot;,&quot;token&quot;:&quot;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&quot;,&quot;displayName&quot;:&quot;Token_Classification_Disfluency&quot;,&quot;role&quot;:&quot;write&quot;,&quot;createdAt&quot;:&quot;2023-01-06T17:52:15.584Z&quot;},{&quot;_id&quot;:&quot;63c7a538bce33b25442a230f&quot;,&quot;token&quot;:&quot;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&quot;,&quot;displayName&quot;:&quot;Inferencing Model&quot;,&quot;role&quot;:&quot;read&quot;,&quot;createdAt&quot;:&quot;2023-01-18T07:52:24.848Z&quot;},{&quot;_id&quot;:&quot;63d3f2bfdf01ef426a05a1d3&quot;,&quot;token&quot;:&quot;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&quot;,&quot;displayName&quot;:&quot;HackFest&quot;,&quot;role&quot;:&quot;read&quot;,&quot;createdAt&quot;:&quot;2023-01-27T15:50:23.057Z&quot;}],&quot;isPro&quot;:false,&quot;orgs&quot;:[],&quot;user&quot;:&quot;Aditya02&quot;}},&quot;model&quot;:{&quot;author&quot;:&quot;nvidia&quot;,&quot;cardData&quot;:{&quot;language&quot;:[&quot;en&quot;],&quot;library_name&quot;:&quot;nemo&quot;,&quot;datasets&quot;:[&quot;librispeech_asr&quot;],&quot;thumbnail&quot;:null,&quot;tags&quot;:[&quot;automatic-speech-recognition&quot;,&quot;speech&quot;,&quot;audio&quot;,&quot;CTC&quot;,&quot;Citrinet&quot;,&quot;Transformer&quot;,&quot;pytorch&quot;,&quot;NeMo&quot;,&quot;hf-asr-leaderboard&quot;],&quot;license&quot;:&quot;cc-by-4.0&quot;,&quot;widget&quot;:[{&quot;example_title&quot;:&quot;Librispeech sample 1&quot;,&quot;src&quot;:&quot;https://cdn-media.huggingface.co/speech_samples/sample1.flac&quot;},{&quot;example_title&quot;:&quot;Librispeech sample 
2&quot;,&quot;src&quot;:&quot;https://cdn-media.huggingface.co/speech_samples/sample2.flac&quot;}],&quot;model-index&quot;:[{&quot;name&quot;:&quot;stt_en_citrinet_1024_ls&quot;,&quot;results&quot;:[{&quot;task&quot;:{&quot;name&quot;:&quot;Automatic Speech Recognition&quot;,&quot;type&quot;:&quot;automatic-speech-recognition&quot;},&quot;dataset&quot;:{&quot;name&quot;:&quot;LibriSpeech (clean)&quot;,&quot;type&quot;:&quot;librispeech_asr&quot;,&quot;config&quot;:&quot;clean&quot;,&quot;split&quot;:&quot;test&quot;,&quot;args&quot;:{&quot;language&quot;:&quot;en&quot;}},&quot;metrics&quot;:[{&quot;name&quot;:&quot;Test WER&quot;,&quot;type&quot;:&quot;wer&quot;,&quot;value&quot;:2.5,&quot;verified&quot;:false}]},{&quot;task&quot;:{&quot;type&quot;:&quot;Automatic Speech Recognition&quot;,&quot;name&quot;:&quot;automatic-speech-recognition&quot;},&quot;dataset&quot;:{&quot;name&quot;:&quot;LibriSpeech (other)&quot;,&quot;type&quot;:&quot;librispeech_asr&quot;,&quot;config&quot;:&quot;other&quot;,&quot;split&quot;:&quot;test&quot;,&quot;args&quot;:{&quot;language&quot;:&quot;en&quot;}},&quot;metrics&quot;:[{&quot;name&quot;:&quot;Test WER&quot;,&quot;type&quot;:&quot;wer&quot;,&quot;value&quot;:6.3,&quot;verified&quot;:false}]}]}]},&quot;cardExists&quot;:true,&quot;discussionsDisabled&quot;:false,&quot;id&quot;:&quot;nvidia/stt_en_citrinet_1024_ls&quot;,&quot;isLikedByUser&quot;:false,&quot;inference&quot;:true,&quot;lastModified&quot;:&quot;2022-07-15T21:33:44.000Z&quot;,&quot;likes&quot;:0,&quot;pipeline_tag&quot;:&quot;automatic-speech-recognition&quot;,&quot;library_name&quot;:&quot;nemo&quot;,&quot;model-index&quot;:[{&quot;name&quot;:&quot;stt_en_citrinet_1024_ls&quot;,&quot;results&quot;:[{&quot;task&quot;:{&quot;name&quot;:&quot;Automatic Speech Recognition&quot;,&quot;type&quot;:&quot;automatic-speech-recognition&quot;},&quot;dataset&quot;:{&quot;name&quot;:&quot;LibriSpeech 
(clean)&quot;,&quot;type&quot;:&quot;librispeech_asr&quot;,&quot;config&quot;:&quot;clean&quot;,&quot;split&quot;:&quot;test&quot;,&quot;args&quot;:{&quot;language&quot;:&quot;en&quot;}},&quot;metrics&quot;:[{&quot;name&quot;:&quot;Test WER&quot;,&quot;type&quot;:&quot;wer&quot;,&quot;value&quot;:2.5,&quot;verified&quot;:false}]},{&quot;task&quot;:{&quot;type&quot;:&quot;Automatic Speech Recognition&quot;,&quot;name&quot;:&quot;automatic-speech-recognition&quot;},&quot;dataset&quot;:{&quot;name&quot;:&quot;LibriSpeech (other)&quot;,&quot;type&quot;:&quot;librispeech_asr&quot;,&quot;config&quot;:&quot;other&quot;,&quot;split&quot;:&quot;test&quot;,&quot;args&quot;:{&quot;language&quot;:&quot;en&quot;}},&quot;metrics&quot;:[{&quot;name&quot;:&quot;Test WER&quot;,&quot;type&quot;:&quot;wer&quot;,&quot;value&quot;:6.3,&quot;verified&quot;:false}]}]}],&quot;private&quot;:false,&quot;gated&quot;:false,&quot;pwcLink&quot;:{&quot;url&quot;:&quot;https://paperswithcode.com/sota?task=Automatic+Speech+Recognition&amp;dataset=LibriSpeech+%28clean%29&quot;},&quot;tags&quot;:[&quot;en&quot;,&quot;dataset:librispeech_asr&quot;,&quot;arxiv:2104.01721&quot;,&quot;nemo&quot;,&quot;automatic-speech-recognition&quot;,&quot;speech&quot;,&quot;audio&quot;,&quot;CTC&quot;,&quot;Citrinet&quot;,&quot;Transformer&quot;,&quot;pytorch&quot;,&quot;NeMo&quot;,&quot;hf-asr-leaderboard&quot;,&quot;license:cc-by-4.0&quot;,&quot;model-index&quot;],&quot;tag_objs&quot;:[{&quot;id&quot;:&quot;automatic-speech-recognition&quot;,&quot;label&quot;:&quot;Automatic Speech 
Recognition&quot;,&quot;subType&quot;:&quot;audio&quot;,&quot;type&quot;:&quot;pipeline_tag&quot;},{&quot;id&quot;:&quot;nemo&quot;,&quot;label&quot;:&quot;NeMo&quot;,&quot;type&quot;:&quot;library&quot;},{&quot;id&quot;:&quot;pytorch&quot;,&quot;label&quot;:&quot;PyTorch&quot;,&quot;type&quot;:&quot;library&quot;},{&quot;id&quot;:&quot;dataset:librispeech_asr&quot;,&quot;label&quot;:&quot;librispeech_asr&quot;,&quot;type&quot;:&quot;dataset&quot;,&quot;disabled&quot;:false},{&quot;id&quot;:&quot;en&quot;,&quot;label&quot;:&quot;en&quot;,&quot;type&quot;:&quot;language&quot;},{&quot;id&quot;:&quot;arxiv:2104.01721&quot;,&quot;label&quot;:&quot;arxiv:2104.01721&quot;,&quot;type&quot;:&quot;arxiv&quot;},{&quot;id&quot;:&quot;speech&quot;,&quot;label&quot;:&quot;speech&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;audio&quot;,&quot;label&quot;:&quot;audio&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;CTC&quot;,&quot;label&quot;:&quot;CTC&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;Citrinet&quot;,&quot;label&quot;:&quot;Citrinet&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;Transformer&quot;,&quot;label&quot;:&quot;Transformer&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;NeMo&quot;,&quot;label&quot;:&quot;NeMo&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;hf-asr-leaderboard&quot;,&quot;label&quot;:&quot;hf-asr-leaderboard&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;model-index&quot;,&quot;label&quot;:&quot;Eval Results&quot;,&quot;type&quot;:&quot;other&quot;},{&quot;id&quot;:&quot;license:cc-by-4.0&quot;,&quot;label&quot;:&quot;cc-by-4.0&quot;,&quot;type&quot;:&quot;license&quot;}],&quot;hasHandlerPy&quot;:false,&quot;widgetData&quot;:[{&quot;example_title&quot;:&quot;Librispeech sample 1&quot;,&quot;src&quot;:&quot;https://cdn-media.huggingface.co/speech_samples/sample1.flac&quot;},{&quot;example_title&quot;:&quot;Librispeech sample 
2&quot;,&quot;src&quot;:&quot;https://cdn-media.huggingface.co/speech_samples/sample2.flac&quot;}]},&quot;canWrite&quot;:false,&quot;csrf&quot;:&quot;eyJkYXRhIjp7ImV4cGlyYXRpb24iOjE2NzUxODA2MzQ5ODIsInVzZXJJZCI6IjYzOGY1MGU3ZjZkZTRiOWU3ZTE1ZTI4NSJ9LCJzaWduYXR1cmUiOiIxZTQxYTdiMDA3Yzg1NDgyZmEzMzY2NGM4ZjYyNDcxNDg0MGViZGM4NDVjNDUzZTRiNGJjZTc5MTBjYmYyYjU5In0=&quot;}" data-target="ModelHeaderActions">
-
-
- <div class="relative mb-1.5 flex flex-wrap sm:flex-nowrap lg:mb-0 gap-1.5"><div class="order-last sm:order-first"><div class="relative ">
- <button class="btn px-1.5 py-1.5 " type="button">
-
- <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" class="p-0.5" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><circle cx="16" cy="7" r="3" fill="currentColor"></circle><circle cx="16" cy="16" r="3" fill="currentColor"></circle><circle cx="16" cy="25" r="3" fill="currentColor"></circle></svg>
-
- </button>
-
-
-
- </div>
-
-
-
- </div>
-
-
- <div class="flex-none w-full sm:w-auto"><div class="relative ">
- <button class="text-sm btn cursor-pointer w-full btn text-sm" type="button">
- <svg class="mr-1.5 " xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><rect x="6.34" y="19" width="11.31" height="2" transform="translate(-10.63 14.34) rotate(-45)"></rect><path d="M17,30a1,1,0,0,1-.37-.07,1,1,0,0,1-.62-.79l-1-7,2-.28.75,5.27L21,24.52V17a1,1,0,0,1,.29-.71l4.07-4.07A8.94,8.94,0,0,0,28,5.86V4H26.14a8.94,8.94,0,0,0-6.36,2.64l-4.07,4.07A1,1,0,0,1,15,11H7.48L4.87,14.26l5.27.75-.28,2-7-1a1,1,0,0,1-.79-.62,1,1,0,0,1,.15-1l4-5A1,1,0,0,1,7,9h7.59l3.77-3.78A10.92,10.92,0,0,1,26.14,2H28a2,2,0,0,1,2,2V5.86a10.92,10.92,0,0,1-3.22,7.78L23,17.41V25a1,1,0,0,1-.38.78l-5,4A1,1,0,0,1,17,30Z"></path></svg>
- Deploy
- <svg class="-mr-1 text-gray-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24" style="transform: rotate(360deg);"><path d="M7 10l5 5l5-5z" fill="currentColor"></path></svg></button>
-
-
-
- </div>
- </div>
- <div class="flex-auto sm:flex-none"><button class="cursor-pointer w-full btn text-sm" type="button" ><svg class="mr-1.5 " xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32" style="transform: rotate(360deg);"><path d="M31 16l-7 7l-1.41-1.41L28.17 16l-5.58-5.59L24 9l7 7z" fill="currentColor"></path><path d="M1 16l7-7l1.41 1.41L3.83 16l5.58 5.59L8 23l-7-7z" fill="currentColor"></path><path d="M12.419 25.484L17.639 6l1.932.518L14.35 26z" fill="currentColor"></path></svg>
- Use in NeMo</button>
- </div></div></div>
- </div></div></div></header>
-
- <div class="container relative flex flex-col md:grid md:space-y-0 w-full md:grid-cols-12 space-y-4 md:gap-6 mb-16"><section class="pt-8 border-gray-100 col-span-full"><header class="pb-2 flex items-center flex-wrap lg:flex-nowrap justify-start md:justify-end"><div class="flex flex-wrap items-center md:flex-grow mr-4 lg:flex-nowrap min-w-0 basis-auto md:basis-full lg:basis-auto"><div class="SVELTE_HYDRATER contents" data-props="{&quot;path&quot;:&quot;README.md&quot;,&quot;repoName&quot;:&quot;nvidia/stt_en_citrinet_1024_ls&quot;,&quot;repoType&quot;:&quot;model&quot;,&quot;rev&quot;:&quot;main&quot;,&quot;refs&quot;:{&quot;branches&quot;:[{&quot;name&quot;:&quot;main&quot;,&quot;ref&quot;:&quot;refs/heads/main&quot;,&quot;targetCommit&quot;:&quot;2ed90e3b83bd85d3a7adebb1d91816926fe33fc9&quot;}],&quot;tags&quot;:[],&quot;converts&quot;:[]},&quot;view&quot;:&quot;blob&quot;}" data-target="BranchSelector"><div class="relative mr-4 mb-2">
- <button class="text-sm md:text-base cursor-pointer w-full btn text-sm" type="button">
- <svg class="mr-1.5 text-gray-700 dark:text-gray-400" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24" style="transform: rotate(360deg);"><path d="M13 14c-3.36 0-4.46 1.35-4.82 2.24C9.25 16.7 10 17.76 10 19a3 3 0 0 1-3 3a3 3 0 0 1-3-3c0-1.31.83-2.42 2-2.83V7.83A2.99 2.99 0 0 1 4 5a3 3 0 0 1 3-3a3 3 0 0 1 3 3c0 1.31-.83 2.42-2 2.83v5.29c.88-.65 2.16-1.12 4-1.12c2.67 0 3.56-1.34 3.85-2.23A3.006 3.006 0 0 1 14 7a3 3 0 0 1 3-3a3 3 0 0 1 3 3c0 1.34-.88 2.5-2.09 2.86C17.65 11.29 16.68 14 13 14m-6 4a1 1 0 0 0-1 1a1 1 0 0 0 1 1a1 1 0 0 0 1-1a1 1 0 0 0-1-1M7 4a1 1 0 0 0-1 1a1 1 0 0 0 1 1a1 1 0 0 0 1-1a1 1 0 0 0-1-1m10 2a1 1 0 0 0-1 1a1 1 0 0 0 1 1a1 1 0 0 0 1-1a1 1 0 0 0-1-1z" fill="currentColor"></path></svg>
- main
- <svg class="-mr-1 text-gray-500" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 24 24" style="transform: rotate(360deg);"><path d="M7 10l5 5l5-5z" fill="currentColor"></path></svg></button>
-
-
-
- </div></div>
- <div class="flex items-center overflow-hidden mb-2"><a class="hover:underline text-gray-800 truncate" href="/nvidia/stt_en_citrinet_1024_ls/tree/main">stt_en_citrinet_1024_ls</a>
- <span class="text-gray-300 mx-1">/</span>
- <span class="dark:text-gray-300">README.md</span></div></div>
-
-
- </header>
- <div class="SVELTE_HYDRATER contents" data-props="{&quot;commitLast&quot;:{&quot;date&quot;:&quot;2022-07-15T21:33:44.000Z&quot;,&quot;subject&quot;:&quot;Update README.md&quot;,&quot;authors&quot;:[{&quot;_id&quot;:&quot;6254f8e5d21e4cc386b881ad&quot;,&quot;avatar&quot;:&quot;https://aeiljuispo.cloudimg.io/v7/https://s3.amazonaws.com/moonup/production/uploads/1649899774659-6254f8e5d21e4cc386b881ad.jpeg?w=200&amp;h=200&amp;f=face&quot;,&quot;isHf&quot;:false,&quot;user&quot;:&quot;smajumdar94&quot;}],&quot;commit&quot;:{&quot;id&quot;:&quot;2ed90e3b83bd85d3a7adebb1d91816926fe33fc9&quot;,&quot;parentIds&quot;:[&quot;6c41f3c2f923b340e726abf6ce4223ed98c6257e&quot;]},&quot;title&quot;:&quot;Update README.md&quot;},&quot;repo&quot;:{&quot;name&quot;:&quot;nvidia/stt_en_citrinet_1024_ls&quot;,&quot;type&quot;:&quot;model&quot;}}" data-target="LastCommit"><div class="border border-b-0 dark:border-gray-800 px-3 py-2 flex items-baseline rounded-t-lg bg-gradient-to-t from-gray-100-to-white"><img class="w-4 h-4 rounded-full mt-0.5 mr-2.5 self-center" alt="smajumdar94's picture" src="https://aeiljuispo.cloudimg.io/v7/https://s3.amazonaws.com/moonup/production/uploads/1649899774659-6254f8e5d21e4cc386b881ad.jpeg?w=200&amp;h=200&amp;f=face">
- <div class="mr-5 truncate flex items-center flex-none"><a class="hover:underline" href="/smajumdar94">smajumdar94
- </a>
-
- </div>
- <div class="mr-4 font-mono text-sm text-gray-500 truncate hover:prose-a:underline"><!-- HTML_TAG_START -->Update README.md<!-- HTML_TAG_END --></div>
- <a class="text-sm border dark:border-gray-800 px-1.5 rounded bg-gray-50 dark:bg-gray-900 hover:underline" href="/nvidia/stt_en_citrinet_1024_ls/commit/2ed90e3b83bd85d3a7adebb1d91816926fe33fc9">2ed90e3</a>
-
- <time class="ml-auto hidden lg:block text-gray-500 dark:text-gray-400 truncate flex-none pl-2" datetime="2022-07-15T21:33:44" title="Fri, 15 Jul 2022 21:33:44 GMT">7 months ago</time></div></div>
- <div class="flex flex-wrap items-center px-3 py-1.5 border dark:border-gray-800 text-sm text-gray-800 dark:bg-gray-900"><div class="flex items-center gap-3 text-sm font-medium"><a class="capitalize rounded-md px-1.5 bg-gray-200 dark:bg-gray-800" href="/nvidia/stt_en_citrinet_1024_ls/blob/main/README.md">preview</a>
- <a class="capitalize rounded-md px-1.5 " href="/nvidia/stt_en_citrinet_1024_ls/blob/main/README.md?code=true">code</a></div>
- <div class="mx-4 text-gray-200">|</div>
- <a class="flex items-center hover:underline my-1 mr-4 " href="/nvidia/stt_en_citrinet_1024_ls/raw/main/README.md"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32" style="transform: rotate(360deg);"><path d="M31 16l-7 7l-1.41-1.41L28.17 16l-5.58-5.59L24 9l7 7z" fill="currentColor"></path><path d="M1 16l7-7l1.41 1.41L3.83 16l5.58 5.59L8 23l-7-7z" fill="currentColor"></path><path d="M12.419 25.484L17.639 6l1.932.518L14.35 26z" fill="currentColor"></path></svg>
- raw
- </a><a class="flex items-center hover:underline my-1 mr-4 " href="/nvidia/stt_en_citrinet_1024_ls/commits/main/README.md"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32" style="transform: rotate(360deg);"><path d="M16 4C9.383 4 4 9.383 4 16s5.383 12 12 12s12-5.383 12-12S22.617 4 16 4zm0 2c5.535 0 10 4.465 10 10s-4.465 10-10 10S6 21.535 6 16S10.465 6 16 6zm-1 2v9h7v-2h-5V8z" fill="currentColor"></path></svg>
- history
- </a><a class="flex items-center hover:underline my-1 mr-4 " href="/nvidia/stt_en_citrinet_1024_ls/blame/main/README.md"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32" style="transform: rotate(360deg);"><path d="M16 2a14 14 0 1 0 14 14A14 14 0 0 0 16 2zm0 26a12 12 0 1 1 12-12a12 12 0 0 1-12 12z" fill="currentColor"></path><path d="M11.5 11a2.5 2.5 0 1 0 2.5 2.5a2.48 2.48 0 0 0-2.5-2.5z" fill="currentColor"></path><path d="M20.5 11a2.5 2.5 0 1 0 2.5 2.5a2.48 2.48 0 0 0-2.5-2.5z" fill="currentColor"></path></svg>
- blame
- </a><a class="flex items-center hover:underline my-1 mr-4 text-green-600 dark:text-gray-300" href="/nvidia/stt_en_citrinet_1024_ls/edit/main/README.md"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M2 26h28v2H2z" fill="currentColor"></path><path d="M25.4 9c.8-.8.8-2 0-2.8l-3.6-3.6c-.8-.8-2-.8-2.8 0l-15 15V24h6.4l15-15zm-5-5L24 7.6l-3 3L17.4 7l3-3zM6 22v-3.6l10-10l3.6 3.6l-10 10H6z" fill="currentColor"></path></svg>
- contribute
- </a><a class="flex items-center hover:underline my-1 mr-4 " href="/nvidia/stt_en_citrinet_1024_ls/delete/main/README.md"><svg class="mr-1.5" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M12 12h2v12h-2z" fill="currentColor"></path><path d="M18 12h2v12h-2z" fill="currentColor"></path><path d="M4 6v2h2v20a2 2 0 0 0 2 2h16a2 2 0 0 0 2-2V8h2V6zm4 22V8h16v20z" fill="currentColor"></path><path d="M12 2h8v2h-8z" fill="currentColor"></path></svg>
- delete
- </a>
- <div class="text-gray-400 flex items-center mr-4"><svg class="text-gray-300 text-sm mr-1.5 -translate-y-px" width="1em" height="1em" viewBox="0 0 22 28" fill="none" xmlns="http://www.w3.org/2000/svg"><path fill-rule="evenodd" clip-rule="evenodd" d="M15.3634 10.3639C15.8486 10.8491 15.8486 11.6357 15.3634 12.1209L10.9292 16.5551C10.6058 16.8785 10.0814 16.8785 9.7579 16.5551L7.03051 13.8277C6.54532 13.3425 6.54532 12.5558 7.03051 12.0707C7.51569 11.5855 8.30234 11.5855 8.78752 12.0707L9.7579 13.041C10.0814 13.3645 10.6058 13.3645 10.9292 13.041L13.6064 10.3639C14.0916 9.8787 14.8782 9.8787 15.3634 10.3639Z" fill="currentColor"></path><path fill-rule="evenodd" clip-rule="evenodd" d="M10.6666 27.12C4.93329 25.28 0 19.2267 0 12.7867V6.52001C0 5.40001 0.693334 4.41334 1.73333 4.01334L9.73333 1.01334C10.3333 0.786673 11 0.786673 11.6 1.02667L19.6 4.02667C20.1083 4.21658 20.5465 4.55701 20.8562 5.00252C21.1659 5.44803 21.3324 5.97742 21.3333 6.52001V12.7867C21.3333 19.24 16.4 25.28 10.6666 27.12Z" fill="currentColor" fill-opacity="0.22"></path><path d="M10.0845 1.94967L10.0867 1.94881C10.4587 1.8083 10.8666 1.81036 11.2286 1.95515L11.2387 1.95919L11.2489 1.963L19.2489 4.963L19.25 4.96342C19.5677 5.08211 19.8416 5.29488 20.0351 5.57333C20.2285 5.85151 20.3326 6.18203 20.3333 6.52082C20.3333 6.52113 20.3333 6.52144 20.3333 6.52176L20.3333 12.7867C20.3333 18.6535 15.8922 24.2319 10.6666 26.0652C5.44153 24.2316 1 18.6409 1 12.7867V6.52001C1 5.82357 1.42893 5.20343 2.08883 4.94803L10.0845 1.94967Z" stroke="currentColor" stroke-opacity="0.30" stroke-width="2"></path></svg>
-
- No virus
- </div>
-
- <div class="dark:text-gray-300 sm:ml-auto">6.6 kB</div></div>
-
- <div class="border border-t-0 rounded-b-lg dark:bg-gray-925 dark:border-gray-800 leading-tight"><div class="py-4 px-4 sm:px-6 prose hf-sanitized hf-sanitized-ePMI-0Dy6tnIb6JbaZ2vM"><div class="min-w-full max-h-[300px] transition-all overflow-auto border-b mb-8 -mx-6 -mt-4 px-6 pt-4 pb-5 font-mono text-xs not-prose bg-gradient-to-t from-gray-50 dark:from-gray-900 dark:to-gray-950"><div class="border px-2 py-1 rounded-lg inline-block font-mono text-xs leading-none mb-2">metadata</div>
- <pre><!-- HTML_TAG_START --><span class="hljs-attr">language:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">en</span>
- <span class="hljs-attr">library_name:</span> <span class="hljs-string">nemo</span>
- <span class="hljs-attr">datasets:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">librispeech_asr</span>
- <span class="hljs-attr">thumbnail:</span> <span class="hljs-literal">null</span>
- <span class="hljs-attr">tags:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">automatic-speech-recognition</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">speech</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">audio</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">CTC</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">Citrinet</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">Transformer</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">pytorch</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">NeMo</span>
- <span class="hljs-bullet">-</span> <span class="hljs-string">hf-asr-leaderboard</span>
- <span class="hljs-attr">license:</span> <span class="hljs-string">cc-by-4.0</span>
- <span class="hljs-attr">widget:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">example_title:</span> <span class="hljs-string">Librispeech</span> <span class="hljs-string">sample</span> <span class="hljs-number">1</span>
- <span class="hljs-attr">src:</span> <span class="hljs-string">https://cdn-media.huggingface.co/speech_samples/sample1.flac</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">example_title:</span> <span class="hljs-string">Librispeech</span> <span class="hljs-string">sample</span> <span class="hljs-number">2</span>
- <span class="hljs-attr">src:</span> <span class="hljs-string">https://cdn-media.huggingface.co/speech_samples/sample2.flac</span>
- <span class="hljs-attr">model-index:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">stt_en_citrinet_1024_ls</span>
- <span class="hljs-attr">results:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">task:</span>
- <span class="hljs-attr">name:</span> <span class="hljs-string">Automatic</span> <span class="hljs-string">Speech</span> <span class="hljs-string">Recognition</span>
- <span class="hljs-attr">type:</span> <span class="hljs-string">automatic-speech-recognition</span>
- <span class="hljs-attr">dataset:</span>
- <span class="hljs-attr">name:</span> <span class="hljs-string">LibriSpeech</span> <span class="hljs-string">(clean)</span>
- <span class="hljs-attr">type:</span> <span class="hljs-string">librispeech_asr</span>
- <span class="hljs-attr">config:</span> <span class="hljs-string">clean</span>
- <span class="hljs-attr">split:</span> <span class="hljs-string">test</span>
- <span class="hljs-attr">args:</span>
- <span class="hljs-attr">language:</span> <span class="hljs-string">en</span>
- <span class="hljs-attr">metrics:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Test</span> <span class="hljs-string">WER</span>
- <span class="hljs-attr">type:</span> <span class="hljs-string">wer</span>
- <span class="hljs-attr">value:</span> <span class="hljs-number">2.5</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">task:</span>
- <span class="hljs-attr">type:</span> <span class="hljs-string">Automatic</span> <span class="hljs-string">Speech</span> <span class="hljs-string">Recognition</span>
- <span class="hljs-attr">name:</span> <span class="hljs-string">automatic-speech-recognition</span>
- <span class="hljs-attr">dataset:</span>
- <span class="hljs-attr">name:</span> <span class="hljs-string">LibriSpeech</span> <span class="hljs-string">(other)</span>
- <span class="hljs-attr">type:</span> <span class="hljs-string">librispeech_asr</span>
- <span class="hljs-attr">config:</span> <span class="hljs-string">other</span>
- <span class="hljs-attr">split:</span> <span class="hljs-string">test</span>
- <span class="hljs-attr">args:</span>
- <span class="hljs-attr">language:</span> <span class="hljs-string">en</span>
- <span class="hljs-attr">metrics:</span>
- <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">Test</span> <span class="hljs-string">WER</span>
- <span class="hljs-attr">type:</span> <span class="hljs-string">wer</span>
- <span class="hljs-attr">value:</span> <span class="hljs-number">6.3</span>
- <!-- HTML_TAG_END --></pre></div>
- <!-- HTML_TAG_START --><h1 class="relative group flex items-center">
- <a rel="noopener nofollow" href="#nvidia-citrinet-ctc-1924-librispeech-en-us" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="nvidia-citrinet-ctc-1924-librispeech-en-us">
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
- </a>
- <span>
- NVIDIA Citrinet CTC 1024 Librispeech (en-US)
- </span>
- </h1>
-
-
- <p>| <a rel="noopener nofollow" href="#model-architecture"><img alt="Model architecture" src="https://img.shields.io/badge/Model_Arch-Citrinet--CTC-lightgrey#model-badge"></a>
324
- | <a rel="noopener nofollow" href="#model-architecture"><img alt="Model size" src="https://img.shields.io/badge/Params-140M-lightgrey#model-badge"></a>
325
- | <a rel="noopener nofollow" href="#datasets"><img alt="Language" src="https://img.shields.io/badge/Language-en--US-lightgrey#model-badge"></a>
326
- | <a rel="noopener nofollow" href="#deployment-with-nvidia-riva"><img alt="Riva Compatible" src="https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge"></a> |</p>
327
- <p>This model transcribes speech in lower case English alphabet along with spaces and apostrophes.
328
- It is an "large" versions of Citrinet-CTC (around 140M parameters) model.<br>See the <a rel="noopener nofollow" href="#model-architecture">model architecture</a> section and <a rel="noopener nofollow" href="https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#citrinet">NeMo documentation</a> for complete architecture details.
329
- It is also compatible with NVIDIA Riva for <a rel="noopener nofollow" href="#deployment-with-nvidia-riva">production-grade server deployments</a>. </p>
330
- <h2 class="relative group flex items-center">
331
- <a rel="noopener nofollow" href="#nvidia-nemo-training" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="nvidia-nemo-training">
332
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
333
- </a>
334
- <span>
335
- NVIDIA NeMo: Training
336
- </span>
337
- </h2>
338
- <p>To train, fine-tune or play with the model you will need to install <a rel="noopener nofollow" href="https://github.com/NVIDIA/NeMo">NVIDIA NeMo</a>. We recommend you install it after you've installed latest Pytorch version.</p>
339
- <pre><code>pip install nemo_toolkit['all']
340
- </code></pre>
341
- <h2 class="relative group flex items-center">
342
- <a rel="noopener nofollow" href="#how-to-use-this-model" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="how-to-use-this-model">
343
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
344
- </a>
345
- <span>
346
- How to Use this Model
347
- </span>
348
- </h2>
349
- <p>The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.</p>
350
- <h3 class="relative group flex items-center">
351
- <a rel="noopener nofollow" href="#automatically-instantiate-the-model" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="automatically-instantiate-the-model">
352
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
353
- </a>
354
- <span>
355
- Automatically instantiate the model
356
- </span>
357
- </h3>
358
- <pre><code class="language-python"><span class="hljs-keyword">import</span> nemo.collections.asr <span class="hljs-keyword">as</span> nemo_asr
359
- asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(<span class="hljs-string">"nvidia/stt_en_citrinet_1024_ls"</span>)
360
- </code></pre>
361
- <h3 class="relative group flex items-center">
362
- <a rel="noopener nofollow" href="#transcribing-using-python" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="transcribing-using-python">
363
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
364
- </a>
365
- <span>
366
- Transcribing using Python
367
- </span>
368
- </h3>
369
- <p>First, let's get a sample</p>
370
- <pre><code>wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
371
- </code></pre>
372
- <p>Then simply do:</p>
373
- <pre><code>asr_model.transcribe(['2086-149220-0033.wav'])
374
- </code></pre>
375
- <h3 class="relative group flex items-center">
376
- <a rel="noopener nofollow" href="#transcribing-many-audio-files" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="transcribing-many-audio-files">
377
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
378
- </a>
379
- <span>
380
- Transcribing many audio files
381
- </span>
382
- </h3>
383
- <pre><code class="language-shell">python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py
384
  pretrained_name="nvidia/stt_en_citrinet_1024_ls"
385
- audio_dir="&lt;DIRECTORY CONTAINING AUDIO FILES&gt;"
386
- </code></pre>
387
- <h3 class="relative group flex items-center">
388
- <a rel="noopener nofollow" href="#input" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="input">
389
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
390
- </a>
391
- <span>
392
- Input
393
- </span>
394
- </h3>
395
- <p>This model accepts 16000 KHz Mono-channel Audio (wav files) as input.</p>
396
- <h3 class="relative group flex items-center">
397
- <a rel="noopener nofollow" href="#output" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="output">
398
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
399
- </a>
400
- <span>
401
- Output
402
- </span>
403
- </h3>
404
- <p>This model provides transcribed speech as a string for a given audio sample.</p>
405
- <h2 class="relative group flex items-center">
406
- <a rel="noopener nofollow" href="#model-architecture" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="model-architecture">
407
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
408
- </a>
409
- <span>
410
- Model Architecture
411
- </span>
412
- </h2>
413
- <p>Citrinet-CTC model is an autoregressive variant of Citrinet model [1] for Automatic Speech Recognition which uses CTC loss/decoding instead of Transducer Loss. You may find more info on the detail of this model here: <a rel="noopener nofollow" href="https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html">Citrinet Model</a>. </p>
414
- <h2 class="relative group flex items-center">
415
- <a rel="noopener nofollow" href="#training" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="training">
416
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
417
- </a>
418
- <span>
419
- Training
420
- </span>
421
- </h2>
422
- <p>The NeMo toolkit [3] was used for training the models for over several hundred epochs. These model are trained with this <a rel="noopener nofollow" href="https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py">example script</a> and this <a rel="noopener nofollow" href="https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/citrinet/citrinet_1024.yaml">base config</a> (Note: Change the <code>model.model_defaults.filters</code> to match the model size).</p>
423
- <p>The tokenizers for these models were built using the text transcripts of the train set with this <a rel="noopener nofollow" href="https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py">script</a>.</p>
424
- <h3 class="relative group flex items-center">
425
- <a rel="noopener nofollow" href="#datasets" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="datasets">
426
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
427
- </a>
428
- <span>
429
- Datasets
430
- </span>
431
- </h3>
432
- <p>All the models in this collection are trained on a just the Librispeech Dataset:</p>
433
- <ul>
434
- <li>Librispeech 960 hours of English speech</li>
435
- </ul>
436
- <h2 class="relative group flex items-center">
437
- <a rel="noopener nofollow" href="#performance" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="performance">
438
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
439
- </a>
440
- <span>
441
- Performance
442
- </span>
443
- </h2>
444
- <p>The list of the available models in this collection is shown in the following table. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.</p>
445
- <div class="max-w-full overflow-auto">
446
- <table>
447
- <thead><tr>
448
- <th>Version</th>
449
- <th>Tokenizer</th>
450
- <th>Vocabulary Size</th>
451
- <th>LS test-other</th>
452
- <th>LS test-clean</th>
453
- </tr>
454
-
455
- </thead><tbody><tr>
456
- <td>1.0.0</td>
457
- <td>SentencePiece Unigram [2]</td>
458
- <td>256</td>
459
- <td>6.3</td>
460
- <td>2.5</td>
461
- </tr>
462
- </tbody>
463
- </table>
464
- </div>
465
- <h2 class="relative group flex items-center">
466
- <a rel="noopener nofollow" href="#limitations" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="limitations">
467
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
468
- </a>
469
- <span>
470
- Limitations
471
- </span>
472
- </h2>
473
- <p>Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.</p>
474
- <h2 class="relative group flex items-center">
475
- <a rel="noopener nofollow" href="#deployment-with-nvidia-riva" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="deployment-with-nvidia-riva">
476
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
477
- </a>
478
- <span>
479
- Deployment with NVIDIA Riva
480
- </span>
481
- </h2>
482
- <p>For the best real-time accuracy, latency, and throughput, deploy the model with <a rel="noopener nofollow" href="https://developer.nvidia.com/riva">NVIDIA Riva</a>, an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.
483
- Additionally, Riva provides: </p>
484
- <ul>
485
- <li>World-class out-of-the-box accuracy for the most common languages with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours </li>
486
- <li>Best in class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization </li>
487
- <li>Streaming speech recognition, Kubernetes compatible scaling, and Enterprise-grade support</li>
488
- </ul>
489
- <p>Check out <a rel="noopener nofollow" href="https://developer.nvidia.com/riva#demos">Riva live demo</a>.</p>
490
- <h2 class="relative group flex items-center">
491
- <a rel="noopener nofollow" href="#references" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="references">
492
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
493
- </a>
494
- <span>
495
- References
496
- </span>
497
- </h2>
498
- <p>[1] <a rel="noopener nofollow" href="https://arxiv.org/abs/2104.01721"> Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition</a>
499
- [2] <a rel="noopener nofollow" href="https://github.com/google/sentencepiece">Google Sentencepiece Tokenizer</a>
500
- [3] <a rel="noopener nofollow" href="https://github.com/NVIDIA/NeMo">NVIDIA NeMo Toolkit</a></p>
501
- <h2 class="relative group flex items-center">
502
- <a rel="noopener nofollow" href="#licence" class="block pr-1.5 text-lg with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" id="licence">
503
- <span class="header-link"><svg viewBox="0 0 256 256" preserveAspectRatio="xMidYMid meet" height="1em" width="1em" role="img" aria-hidden="true" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" class="text-gray-500 hover:text-black w-4"><path fill="currentColor" d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z"></path></svg></span>
504
- </a>
505
- <span>
506
- Licence
507
- </span>
508
- </h2>
509
- <p>License to use this model is covered by the <a rel="noopener nofollow" href="https://creativecommons.org/licenses/by/4.0/">CC-BY-4.0</a>. By downloading the public and release version of the model, you accept the terms and conditions of the <a rel="noopener nofollow" href="https://creativecommons.org/licenses/by/4.0/">CC-BY-4.0</a> license.</p>
510
- <style>.hf-sanitized.hf-sanitized-ePMI-0Dy6tnIb6JbaZ2vM img {display: inline;}</style><!-- HTML_TAG_END --></div></div></section></div></main>
511
- </div>
512
-
513
- <script>
514
- import("/front/build/index.c5ff23a02.js");
515
- window.moonSha = ".c5ff23a02";
516
- </script>
517
-
518
- <script>
519
- if (
520
- !(
521
- ["localhost", "huggingface.test"].includes(
522
- window.location.hostname
523
- ) || window.location.hostname.includes("ngrok.io")
524
- )
525
- ) {
526
- (function (i, s, o, g, r, a, m) {
527
- i["GoogleAnalyticsObject"] = r;
528
- (i[r] =
529
- i[r] ||
530
- function () {
531
- (i[r].q = i[r].q || []).push(arguments);
532
- }),
533
- (i[r].l = 1 * new Date());
534
- (a = s.createElement(o)), (m = s.getElementsByTagName(o)[0]);
535
- a.async = 1;
536
- a.src = g;
537
- m.parentNode.insertBefore(a, m);
538
- })(
539
- window,
540
- document,
541
- "script",
542
- "https://www.google-analytics.com/analytics.js",
543
- "ganalytics"
544
- );
545
- ganalytics("create", "UA-83738774-2", "auto");
546
- ganalytics("send", "pageview");
547
- }
548
- </script>
549
- </body>
550
- </html>
 
1
---
language:
- en
library_name: nemo
datasets:
- librispeech_asr
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- CTC
- Citrinet
- Transformer
- pytorch
- NeMo
- hf-asr-leaderboard
license: cc-by-4.0
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
model-index:
- name: stt_en_citrinet_1024_ls
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: clean
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 2.5
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - name: Test WER
      type: wer
      value: 6.3
---

# NVIDIA Citrinet CTC 1024 Librispeech (en-US)

<style>
img {
 display: inline;
}
</style>

| [![Model architecture](https://img.shields.io/badge/Model_Arch-Citrinet--CTC-lightgrey#model-badge)](#model-architecture)
| [![Model size](https://img.shields.io/badge/Params-140M-lightgrey#model-badge)](#model-architecture)
| [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
| [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva) |

This model transcribes speech into the lowercase English alphabet, along with spaces and apostrophes.
It is a "large" version of the Citrinet-CTC model (around 140M parameters).
See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#citrinet) for complete architecture details.
It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).

## NVIDIA NeMo: Training

To train, fine-tune, or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after installing the latest PyTorch version.
```
pip install nemo_toolkit['all']
```

## How to Use this Model

The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

### Automatically instantiate the model

```python
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained("nvidia/stt_en_citrinet_1024_ls")
```

### Transcribing using Python

First, let's get a sample:
```
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
```
Then simply do:
```
asr_model.transcribe(['2086-149220-0033.wav'])
```

### Transcribing many audio files

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
 pretrained_name="nvidia/stt_en_citrinet_1024_ls" \
 audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
```

### Input

This model accepts 16000 Hz mono-channel audio (WAV files) as input.
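
Since the model expects 16 kHz, mono WAV input, it can help to validate audio files before transcription. Below is a minimal sketch using Python's standard `wave` module; the helper name `is_valid_input` is our own illustration, not part of NeMo:

```python
import wave

def is_valid_input(path, expected_rate=16000, expected_channels=1):
    """Check that a WAV file is mono and sampled at the expected rate."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == expected_rate
                and wav.getnchannels() == expected_channels)
```

Files that fail this check can be converted first (for example with `sox` or `ffmpeg`) before being passed to `asr_model.transcribe`.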

### Output

This model provides transcribed speech as a string for a given audio sample.

## Model Architecture

Citrinet-CTC is a non-autoregressive variant of the Citrinet model [1] for Automatic Speech Recognition, which uses CTC loss/decoding instead of Transducer loss. You may find more details on this model here: [Citrinet Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html).
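
To make the CTC decoding step concrete, the sketch below shows greedy CTC decoding on frame-level token ids: take the per-frame argmax, collapse consecutive repeats, then drop the blank token. This is a generic illustration of the technique, not NeMo's internal implementation, and `blank_id=0` is an assumption for the example:

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame-level token ids and remove CTC blanks."""
    decoded = []
    prev = None
    for token_id in frame_ids:
        # keep a token only when it starts a new run and is not the blank
        if token_id != prev and token_id != blank_id:
            decoded.append(token_id)
        prev = token_id
    return decoded

# e.g. frames [0, 7, 7, 0, 7, 3, 3, 0] decode to [7, 7, 3]
```

The decoded token ids are then mapped back to text through the model's tokenizer vocabulary.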

## Training

The NeMo toolkit [3] was used for training the models for several hundred epochs. These models were trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/citrinet/citrinet_1024.yaml) (Note: change `model.model_defaults.filters` to match the model size).

The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
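
As background, a unigram tokenizer segments text into the vocabulary pieces whose log-probabilities sum highest, found with a Viterbi search. The toy sketch below illustrates the idea only; it is not the SentencePiece implementation, and the vocabulary and scores are made up:

```python
import math

def unigram_tokenize(text, vocab):
    """Toy Viterbi segmentation: choose the token sequence with the highest
    total log-probability under a unigram vocabulary."""
    n = len(text)
    best = [(-math.inf, -1)] * (n + 1)  # (best score, split point)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in vocab and best[start][0] + vocab[piece] > best[end][0]:
                best[end] = (best[start][0] + vocab[piece], start)
    if best[n][0] == -math.inf:
        return None  # text cannot be covered by the vocabulary
    tokens = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]

# made-up vocabulary with log-probabilities: the longer piece wins here
vocab = {"s": -4.0, "a": -4.0, "t": -4.0, "sat": -2.0, "sa": -3.0}
```

In the real pipeline, the vocabulary and its scores are learned from the training transcripts by the SentencePiece trainer invoked through the script linked above.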

### Datasets

All the models in this collection are trained on just the LibriSpeech dataset:

- LibriSpeech: 960 hours of English speech

## Performance

The list of the available models in this collection is shown in the following table. Performance of the ASR models is reported in terms of Word Error Rate (WER%) with greedy decoding.

| Version | Tokenizer                 | Vocabulary Size | LS test-other | LS test-clean |
|---------|---------------------------|-----------------|---------------|---------------|
| 1.0.0   | SentencePiece Unigram [2] | 256             | 6.3           | 2.5           |
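
WER is the word-level edit distance (substitutions, insertions, and deletions) between the hypothesis and the reference, divided by the number of reference words. A minimal self-contained implementation for illustration (in practice, libraries such as `jiwer` or NeMo's own metrics are used):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.
    Assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between the current ref prefix and hyp[:j]
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev_diag, dp[j] = dp[j], min(
                dp[j] + 1,              # deletion
                dp[j - 1] + 1,          # insertion
                prev_diag + (r != h),   # substitution (or match)
            )
    return dp[-1] / len(ref)

# word_error_rate("the cat sat", "the cat sat down") is one insertion -> 1/3
```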

## Limitations

Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.

## Deployment with NVIDIA Riva

For the best real-time accuracy, latency, and throughput, deploy the model with [NVIDIA Riva](https://developer.nvidia.com/riva), an accelerated speech AI SDK deployable on-prem, in all clouds, multi-cloud, hybrid, at the edge, and embedded.
Additionally, Riva provides:

* World-class out-of-the-box accuracy for the most common languages, with model checkpoints trained on proprietary data with hundreds of thousands of GPU-compute hours
* Best-in-class accuracy with run-time word boosting (e.g., brand and product names) and customization of acoustic model, language model, and inverse text normalization
* Streaming speech recognition, Kubernetes-compatible scaling, and enterprise-grade support

Check out the [Riva live demo](https://developer.nvidia.com/riva#demos).

## References

[1] [Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition](https://arxiv.org/abs/2104.01721)

[2] [Google SentencePiece Tokenizer](https://github.com/google/sentencepiece)

[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

## License

The license to use this model is covered by [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.