ThiagoCF05 committed on
Commit
0c22a8e
1 Parent(s): a848990

Create README.md

  1. README.md +41 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ license: mit
+ language:
+ - en
+ - fr
+ - es
+ - pt
+ - de
+ ---
+ # Model Card for NoRef-ER
+
+ Referenceless Error Metric for Automatic Speech Recognition
+ via Contrastive Fine-Tuning of mMiniLMv2 without References
+
+ # Model Details
+
+ ## How to use
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("aixplain/NoRef-ER")
+ model = AutoModel.from_pretrained("aixplain/NoRef-ER")
+
+ tokens = tokenizer([
+     "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced.",  # clean transcript
+     "In Italy, pizzas serves in formal settings, such as at an restaurant, is presented unslicing."  # transcript with deliberate errors
+ ], padding=True, return_tensors="pt")
+ scores = model.infer(**tokens)  # assumes the loaded model exposes a custom infer(); stock transformers models do not define it
+ ```
+
+ ## Model Description
+
+ This work presents a novel multilingual referenceless quality metric for automatic speech recognition (ASR). The metric is based on a language model (LM) trained with contrastive learning, without references indicating quality. Instead, the known quality ordering between increasing compression levels of the same ASR model is used for self-supervision. All unique pair combinations are extracted from the outputs of ASR models at multiple compression levels to compile a dataset for model training and validation. The LM is part of a siamese network architecture (with shared weights) that makes pair-wise ranking decisions on ASR output quality. The referenceless metric achieves 77% validation accuracy on this pair-wise ranking task and generalizes to quality comparisons between different ASR models. On a blind test dataset consisting of outputs of top commercial ASR engines, the referenceless metric has a 36% correlation with the engines' word-error-rate (WER) ranks across samples, and can outperform the best engine's WER by 7-8% by selecting among alternative hypotheses. Compared against the perplexity metric from various state-of-the-art pre-trained LMs, the referenceless metric obtained superior performance in all experiments. The referenceless metric allows comparing the performance of different ASR models on a speech dataset that lacks ground-truth references. It also makes it possible to build an ensemble of ASR models that can outperform any individual model in the ensemble. Finally, it can be used to prioritize hypotheses for referencing (via post-editing) or human evaluation within the ASR model improvement lifecycle in production, and for A/B testing different versions of an ASR model (such as previous and current) on a referenceless production data stream.
+
+
+
+ - **Developed by:** aiXplain AI Lab
+ - **Language(s) (NLP):** English, French, Spanish, Portuguese, German
+ - **License:** MIT License
+ - **Finetuned from model:** nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large
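
The Model Description in the README above says that all unique pair combinations are extracted from ASR outputs at multiple compression levels, with the known quality ordering serving as the self-supervision signal. The following is a minimal sketch of that pairing step and of the pair-wise ranking accuracy it is evaluated with, not the authors' implementation. The function names `build_ranking_pairs` and `pairwise_accuracy` are hypothetical, and three things are assumptions made for illustration: that a lower list index means less compression and therefore better quality, that identical transcripts are skipped, and that a higher score means better quality.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def build_ranking_pairs(outputs_by_level: List[str]) -> List[Tuple[str, str]]:
    """Return (better, worse) transcript pairs for a single utterance.

    outputs_by_level[i] is the transcript produced at compression level i.
    ASSUMPTION: a lower index means less compression and higher quality.
    """
    pairs = []
    # combinations() yields index pairs (i, j) with i < j, so the first
    # transcript of each pair always comes from the less-compressed model.
    for (_, better), (_, worse) in combinations(enumerate(outputs_by_level), 2):
        # ASSUMPTION: identical transcripts carry no ranking signal, so skip them.
        if better != worse:
            pairs.append((better, worse))
    return pairs


def pairwise_accuracy(
    pairs: List[Tuple[str, str]], score: Callable[[str], float]
) -> float:
    """Fraction of pairs where the scorer ranks the better transcript higher.

    ASSUMPTION: a higher score means better quality; flip the comparison
    if the scorer returns an error estimate instead.
    """
    if not pairs:
        return 0.0
    correct = sum(1 for better, worse in pairs if score(better) > score(worse))
    return correct / len(pairs)


# Toy example: three compression levels of one hypothetical ASR model.
levels = [
    "the cat sat on the mat",  # level 0: least compressed
    "the cat sat on a mat",    # level 1
    "the cat sad on a mat",    # level 2: most compressed
]
pairs = build_ranking_pairs(levels)
print(len(pairs))  # 3 unique pairs from 3 distinct outputs
```

Any callable that maps a transcript to a number can be passed as `score`, so the same helper can be used to compare a trained ranking model against a baseline such as LM perplexity on held-out pairs.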