# Evaluating SeamlessM4T models

Refer to the [SeamlessM4T README](../../../../../docs/m4t) for an overview of the M4T models.

Refer to the [inference README](../predict/README.md) for how to run inference with SeamlessM4T models.

## Quick start:
We use the SacreBLEU library to compute BLEU scores and the [JiWER library](https://github.com/jitsi/jiwer) to compute CER and WER scores.
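
To sanity-check BLEU numbers outside of this pipeline, the SacreBLEU package also ships a standalone CLI. A minimal sketch, assuming hypothetical files `hyp.txt` (system outputs) and `ref.txt` (references) with one segment per line:

```bash
# Corpus-level BLEU of hyp.txt against ref.txt
sacrebleu ref.txt -i hyp.txt
```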

Evaluation can be run with the CLI, from the root directory of the repository.

The model can be specified with `--model_name`: `seamlessM4T_v2_large`, `seamlessM4T_large`, or `seamlessM4T_medium`:

```bash
m4t_evaluate --data_file <path_to_data_tsv_file> --task <task_name> --tgt_lang <tgt_lang> --output_path <path_to_save_evaluation_output> --ref_field <ref_field_name> --audio_root_dir <path_to_audio_root_directory>
```
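
For example, a hypothetical S2TT run that translates the audio referenced in a FLEURS-style TSV into French (`fra`); every path and field name below is an illustrative placeholder, not a file shipped with the repo:

```bash
m4t_evaluate \
  --data_file data/fleurs_eng_fra.tsv \
  --task S2TT \
  --tgt_lang fra \
  --output_path eval_output/ \
  --ref_field tgt_text \
  --audio_root_dir data/audio \
  --model_name seamlessM4T_v2_large
```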

## Note
1. We use raw (unnormalized) references to compute BLEU scores for the S2TT and T2TT tasks.
2. For the ASR task, the source language of the audio needs to be passed as `<tgt_lang>` (see the examples below).
3. The `--src_lang` arg needs to be specified to run evaluation for the T2TT task.
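
As a sketch of notes 2 and 3, two hypothetical invocations (the data files, paths, and field names are illustrative placeholders):

```bash
# ASR on French audio: the audio's own language code goes in --tgt_lang
m4t_evaluate --data_file data/fleurs_fra.tsv --task ASR --tgt_lang fra \
  --output_path eval_output/ --ref_field tgt_text --audio_root_dir data/audio

# T2TT from English to French: --src_lang must be specified
m4t_evaluate --data_file data/flores_eng_fra.tsv --task T2TT --src_lang eng --tgt_lang fra \
  --output_path eval_output/ --ref_field tgt_text
```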