Add docs for beam search decoding

#3
by vineelpratap - opened
Files changed (1)
  1. README.md +66 -0
README.md CHANGED
@@ -260,6 +260,72 @@ In the same way the language can be switched out for all other supported languag
processor.tokenizer.vocab.keys()
```

*Beam Search Decoding with Language Model*

To run decoding with an n-gram language model, first download the decoding config. It contains the paths to the language model, lexicon, and token files, along with the best decoding hyperparameters. Language models are only available for the 102 languages of the FLEURS dataset.

```py
import json

from huggingface_hub import hf_hub_download

# Fetch the decoding config that accompanies the mms-1b-all model
lm_decoding_configfile = hf_hub_download(
    repo_id="facebook/mms-cclms",
    filename="decoding_config.json",
    subfolder="mms-1b-all",
)
# The config maps each language code to its LM/lexicon/token paths and tuned hyperparameters
with open(lm_decoding_configfile) as f:
    lm_decoding_config = json.load(f)
```
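
Because language models only exist for the FLEURS languages, it can help to check that a language is present in the config before downloading anything. The helper below is a hypothetical sketch (`get_language_config` is not part of any library). It assumes only what the surrounding code relies on: the config is a dict keyed by ISO language code, and each entry carries the keys `lmfile`, `tokensfile`, `lexiconfile`, `lmweight`, `wordscore` and `silweight`.

```python
# Hypothetical helper: look up one language in the decoding config loaded from
# decoding_config.json and fail early with a readable error if it is unusable.
REQUIRED_KEYS = ("lmfile", "tokensfile", "lexiconfile", "lmweight", "wordscore", "silweight")


def get_language_config(lm_decoding_config: dict, lang: str) -> dict:
    if lang not in lm_decoding_config:
        available = ", ".join(sorted(lm_decoding_config))
        raise KeyError(f"No language model for {lang!r}; available codes: {available}")
    entry = lm_decoding_config[lang]
    # These are the keys read by the download and decoder steps of this guide
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise KeyError(f"Config entry for {lang!r} is missing keys: {missing}")
    return entry
```

With it, the lookup becomes `decoding_config = get_language_config(lm_decoding_config, "eng")`, and an unsupported code lists the available ones instead of raising a bare `KeyError`.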
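
Next, download the language model, token, and lexicon files that the config lists for your language. Not every language has a lexicon, so that file is optional.
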
```py
# Modify the ISO language code if using a different language.
decoding_config = lm_decoding_config["eng"]

# Each path in the config has the form "<subfolder>/<filename>" inside the facebook/mms-cclms repo
lm_file = hf_hub_download(
    repo_id="facebook/mms-cclms",
    filename=decoding_config["lmfile"].rsplit("/", 1)[1],
    subfolder=decoding_config["lmfile"].rsplit("/", 1)[0],
)
token_file = hf_hub_download(
    repo_id="facebook/mms-cclms",
    filename=decoding_config["tokensfile"].rsplit("/", 1)[1],
    subfolder=decoding_config["tokensfile"].rsplit("/", 1)[0],
)
# Some languages have no lexicon; in that case lexicon_file stays None
lexicon_file = None
if decoding_config["lexiconfile"] is not None:
    lexicon_file = hf_hub_download(
        repo_id="facebook/mms-cclms",
        filename=decoding_config["lexiconfile"].rsplit("/", 1)[1],
        subfolder=decoding_config["lexiconfile"].rsplit("/", 1)[0],
    )
```
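
The three downloads repeat the same `rsplit("/", 1)` logic, which also raises `IndexError` if a path ever lacks a `/`. The sketch below is an illustrative refactoring, not part of the original guide; the name `split_repo_path` is hypothetical, and it assumes, as the code above does, that config paths are relative to the `facebook/mms-cclms` repo.

```python
from typing import Optional, Tuple


def split_repo_path(path: str) -> Tuple[Optional[str], str]:
    """Split a repo-relative path like "a/b/file.bin" into ("a/b", "file.bin").

    Mirrors path.rsplit("/", 1), but returns None as the subfolder when the
    path contains no "/" instead of raising IndexError.
    """
    subfolder, sep, filename = path.rpartition("/")
    # rpartition returns ("", "", path) when no separator is found
    return (subfolder if sep else None), filename
```

Each download can then be written as `subfolder, filename = split_repo_path(decoding_config["lmfile"])` followed by `hf_hub_download(repo_id="facebook/mms-cclms", filename=filename, subfolder=subfolder)`; `subfolder=None` is the default of `hf_hub_download`.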

Create the `torchaudio.models.decoder.CTCDecoder` object with `ctc_decoder`, passing the downloaded files and the decoding hyperparameters from the config:

```py
from torchaudio.models.decoder import ctc_decoder

beam_search_decoder = ctc_decoder(
    lexicon=lexicon_file,  # None selects lexicon-free decoding
    tokens=token_file,
    lm=lm_file,
    nbest=1,  # return only the best hypothesis per utterance
    beam_size=500,  # max number of hypotheses kept after each decoding step
    beam_size_token=50,  # max number of tokens considered at each decoding step
    lm_weight=float(decoding_config["lmweight"]),
    word_score=float(decoding_config["wordscore"]),
    sil_score=float(decoding_config["silweight"]),
    blank_token="<s>",
)
```
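
The config stores the tuned hyperparameters under its own key names, and each is converted with `float(...)` before being passed to `ctc_decoder`. A minimal sketch of keeping that mapping in one place follows; `CONFIG_TO_DECODER_ARG` and `lm_decoder_kwargs` are hypothetical names, and the mapping simply restates the three keyword arguments used above.

```python
# Hypothetical mapping from decoding_config.json keys to ctc_decoder keyword
# arguments, restating the three hyperparameters used in the call above.
CONFIG_TO_DECODER_ARG = {
    "lmweight": "lm_weight",
    "wordscore": "word_score",
    "silweight": "sil_score",
}


def lm_decoder_kwargs(decoding_config: dict) -> dict:
    # float() accepts both numeric values and numeric strings
    return {arg: float(decoding_config[key]) for key, arg in CONFIG_TO_DECODER_ARG.items()}
```

The decoder call could then pass `**lm_decoder_kwargs(decoding_config)` alongside `lexicon`, `tokens`, `lm`, `nbest`, `beam_size`, `beam_size_token` and `blank_token`. Other keys in the entry, such as the file paths, are ignored.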

Pass the model output (the CTC logits in `outputs`) to the decoder to get the transcription. The decoder expects a CPU tensor of shape `(batch, frames, num_tokens)`.

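```py
beam_search_result = beam_search_decoder(outputs.to("cpu"))
# beam_search_result[i][j] is the j-th best hypothesis for the i-th utterance in the batch.
# Note: `words` is only filled in when the decoder was built with a lexicon.
transcription = " ".join(beam_search_result[0][0].words).strip()
```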
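
The torchaudio documentation notes that `CTCHypothesis.words` is only populated when a lexicon is given, so for languages whose config has no lexicon the text has to be rebuilt from the predicted tokens, which `beam_search_decoder.idxs_to_tokens(hypothesis.tokens)` turns into token strings. The sketch below (`tokens_to_text` is a hypothetical helper) assumes that `|` marks word boundaries, which is also `ctc_decoder`'s default `sil_token`, that any `<s>` blank tokens should be dropped, and that repeated tokens have already been collapsed by the decoder. Check these against the downloaded token file.

```python
from typing import List


def tokens_to_text(tokens: List[str], word_delimiter: str = "|", blank: str = "<s>") -> str:
    """Join CTC output tokens into text for lexicon-free decoding.

    Assumes `word_delimiter` separates words and removes any `blank` tokens.
    """
    chars = [tok for tok in tokens if tok != blank]
    pieces = "".join(chars).split(word_delimiter)
    # Leading, trailing or repeated delimiters produce empty strings; skip them
    return " ".join(piece for piece in pieces if piece)
```

For example, when `lexicon_file is None`: `transcription = tokens_to_text(beam_search_decoder.idxs_to_tokens(beam_search_result[0][0].tokens))`.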

For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).

## Supported Languages