SpeechTek
/

EE-Wav2Vec2

Automatic Speech Recognition

Transformers

English

Inference Endpoints


mnabihali committed on Dec 13, 2024

Commit

a251204


1 Parent(s): b4cd64d

Update README.md


README.md +44 -85


@@ -14,120 +14,79 @@ library_name: transformers
 <img src="./EE.gif" align="center" width="70%">
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
 ## Model Details
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
 ### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
 ### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
 #### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
 ### Results
-[More Information Needed]
-#### Summary
-## Citation [optional]
 ## Citation

 <img src="./EE.gif" align="center" width="70%">
 ## Model Details
 ### Model Description
+A Wav2Vec 2.0 model trained with an early-exit pipeline.
+- **Developed by:** SpeechTek unit, Fondazione Bruno Kessler
+- **Model type:** Wav2Vec 2.0
+- **Language(s) (NLP):** English
+- **Finetuned from model:** facebook/wav2vec2-base-960h
+- **Repository:** https://github.com/augustgw/wav2vec2-ee
+- **Paper:** Training early-exit architectures for automatic speech recognition: Fine-tuning pre-trained models or training from scratch
 ### Downstream Use [optional]
+The model is intended for computationally efficient ASR: decoding can stop at an intermediate exit instead of running the full encoder.
 ## Training Details
 ### Training Data
+The model is trained on the 960-hour LibriSpeech dataset.
 ### Training Procedure
+#### Basic training
+- Fine-tuning with only EE loss: `finetune_ee.py`
+- Fine-tuning a model without early exits: `finetune_non-ee.py`
+- To set the number of encoder layers, change `X` in `model_config = Wav2Vec2Config(num_hidden_layers=X)`. For example, for a 4-layer encoder: `model_config = Wav2Vec2Config(num_hidden_layers=4)`
 #### Training Hyperparameters
+```python
+from transformers import TrainingArguments
+
+training_args = TrainingArguments(
+    output_dir="./wav2vec2-ee/checkpoints/",
+    evaluation_strategy="no",       # no evaluation during training
+    # eval_steps=1000,
+    save_strategy="epoch",          # save a checkpoint after every epoch
+    # eval_accumulation_steps=10,
+    learning_rate=1e-4,
+    per_device_train_batch_size=16,
+    per_device_eval_batch_size=1,
+    num_train_epochs=100,
+    weight_decay=0.01,
+    push_to_hub=False,
+    report_to="wandb",              # log metrics to Weights & Biases
+    logging_strategy="steps",
+    logging_steps=1000,
+    dataloader_num_workers=1,
+    ignore_data_skip=True,
+)
+```
 ## Evaluation
+The evaluation scripts create files in the indicated output directory. `wer_results.txt` contains the layerwise WERs on the test sets indicated in the evaluation script. The remaining files contain the layerwise transcriptions of each item in each test set.
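As a reminder of what a layerwise WER measures, here is a minimal sketch of a word error rate computation applied to the transcriptions from different exits. It is not the implementation used in `eval.py`; the function name `word_error_rate`, the exit labels, and the example sentences are hypothetical and only serve as illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical transcriptions of one utterance from two exits.
reference = "the cat sat on the mat"
layer_outputs = {
    "exit_1": "the cat sad on mat",
    "exit_6": "the cat sat on the mat",
}
layerwise_wer = {
    name: word_error_rate(reference, text)
    for name, text in layer_outputs.items()
}
```

This sketch scores a single utterance. A test-set WER is usually computed by summing the edit counts and reference lengths over all utterances before dividing.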
+### Basic evaluation
+- Normal evaluation: `eval.py path/to/model/checkpoint path/to/output/directory`
+  - For safetensors checkpoints saved by newer versions of Hugging Face Transformers, see the note in `eval.py`
+- Evaluation for models without early exits (evaluates only output of final layer): `eval_non-ee.py path/to/model/checkpoint path/to/output/directory`
 ### Results
+| Exit    | Test-Clean WER (%) | Dev-Clean WER (%) |
+|---------|--------------------|-------------------|
+| Exit(1) | 19.14              | 19.06             |
+| Exit(2) | 8.26               | 8.01              |
+| Exit(3) | 5.93               | 5.57              |
+| Exit(4) | 4.74               | 4.48              |
+| Exit(5) | 3.98               | 3.79              |
+| Exit(6) | 3.95               | 3.69              |
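One way to read the table is as an accuracy/compute trade-off. The sketch below uses the Test-Clean values from the table to pick the shallowest exit that stays within a WER budget, and to express how much worse an exit is than the final one. The helpers `shallowest_exit` and `relative_wer_increase` are hypothetical and are not part of the repository. This is an offline illustration, not a description of how an exit is selected at inference time.

```python
# Test-Clean WER per exit, copied from the results table above.
TEST_CLEAN_WER = {1: 19.14, 2: 8.26, 3: 5.93, 4: 4.74, 5: 3.98, 6: 3.95}


def shallowest_exit(wer_by_exit, max_wer):
    """Return the lowest-numbered exit whose WER is within max_wer, or None."""
    for exit_index in sorted(wer_by_exit):
        if wer_by_exit[exit_index] <= max_wer:
            return exit_index
    return None


def relative_wer_increase(wer_by_exit, exit_index):
    """Relative WER increase (in percent) of an exit over the final exit."""
    final_wer = wer_by_exit[max(wer_by_exit)]
    return 100.0 * (wer_by_exit[exit_index] - final_wer) / final_wer


# With a budget of 5.0 WER, exit 4 is the shallowest exit that qualifies.
chosen = shallowest_exit(TEST_CLEAN_WER, max_wer=5.0)
```

On these numbers, exit 5 is within about 1% (relative) of the final exit's Test-Clean WER, which suggests the last layer adds little accuracy on this test set.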
 ## Citation