sanchit-gandhi HF staff committed on
Commit
476f0f4
1 Parent(s): 415312a

Correct scripts

Files changed (3)
  1. README.md +5 -4
  2. get_ctc_tokenizer.py +1 -1
  3. run_tedlium.sh +2 -2
README.md CHANGED
@@ -2,18 +2,19 @@
  language:
  - en
  tags:
- - esc
+ - esb
  datasets:
- - tedlium
+ - esb/datasets
+ - LIUM/tedlium
  ---
 
  To reproduce this run, first call `get_ctc_tokenizer.py` to train the CTC tokenizer and then execute the following command to train the CTC system:
  ```python
  #!/usr/bin/env bash
  python run_flax_speech_recognition_ctc.py \
- --model_name_or_path="esc-benchmark/wav2vec2-ctc-pretrained" \
+ --model_name_or_path="esb/wav2vec2-ctc-pretrained" \
  --tokenizer_name="wav2vec2-ctc-tedlium-tokenizer" \
- --dataset_name="esc-benchmark/esc-datasets" \
+ --dataset_name="esb/datasets" \
  --dataset_config_name="tedlium" \
  --output_dir="./" \
  --wandb_project="wav2vec2-ctc" \
get_ctc_tokenizer.py CHANGED
@@ -19,7 +19,7 @@ tokenizer_name = f"wav2vec2-ctc-{dataset_name}-tokenizer"
  cutoff_freq = 0.01
 
  dataset = load_dataset(
- "esc-benchmark/esc-datasets",
+ "esb/datasets",
  dataset_name,
  split=split,
  use_auth_token=use_auth_token,
run_tedlium.sh CHANGED
@@ -1,8 +1,8 @@
  #!/usr/bin/env bash
  python run_flax_speech_recognition_ctc.py \
- --model_name_or_path="esc-benchmark/wav2vec2-ctc-pretrained" \
+ --model_name_or_path="esb/wav2vec2-ctc-pretrained" \
  --tokenizer_name="wav2vec2-ctc-tedlium-tokenizer" \
- --dataset_name="esc-benchmark/esc-datasets" \
+ --dataset_name="esb/datasets" \
  --dataset_config_name="tedlium" \
  --output_dir="./" \
  --wandb_project="wav2vec2-ctc" \
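
Note: every script change in this commit is the same two-identifier rename (`esc-benchmark/esc-datasets` to `esb/datasets`, and `esc-benchmark/wav2vec2-ctc-pretrained` to `esb/wav2vec2-ctc-pretrained`). A minimal sketch of applying that rename to other scripts that still use the old names is given below. The helper name `migrate_identifiers` and the `RENAMES` table are hypothetical and not part of the commit. The README metadata changes (the `esc` tag and the `tedlium` dataset entry) are not plain identifier swaps, so the sketch does not cover them.

```python
# Hypothetical helper, not part of this commit: maps the old Hub identifiers
# to the new ones that appear in the diffs above.
RENAMES = {
    "esc-benchmark/esc-datasets": "esb/datasets",
    "esc-benchmark/wav2vec2-ctc-pretrained": "esb/wav2vec2-ctc-pretrained",
}


def migrate_identifiers(text: str) -> str:
    """Return `text` with every old identifier replaced by its new name."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    # One of the lines removed from run_tedlium.sh in this commit.
    old_line = '--dataset_name="esc-benchmark/esc-datasets" \\'
    print(migrate_identifiers(old_line))
```

Text that contains neither old identifier passes through unchanged, so the helper can be run over a whole file without touching unrelated lines.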