baileyarzate committed
Commit
bc68b74
1 Parent(s): 3a84968

Update README.md

Files changed (1)
  1. README.md +93 -45
README.md CHANGED
@@ -17,21 +17,17 @@ tags: []
17
 
18
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
  - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
  ### Model Sources [optional]
29
 
30
  <!-- Provide the basic links for the model. -->
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
 
36
  ## Uses
37
 
@@ -69,46 +65,110 @@ Users (both direct and downstream) should be made aware of the risks, biases and
69
 
70
  ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
 
76
  ## Training Details
77
 
78
  ### Training Data
79
 
80
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
 
 
81
 
82
  [More Information Needed]
83
 
84
  ### Training Procedure
85
 
86
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
87
 
88
  #### Preprocessing [optional]
89
 
90
- [More Information Needed]
91
 
92
 
93
  #### Training Hyperparameters
94
 
95
  - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
  #### Speeds, Sizes, Times [optional]
98
 
99
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 
100
 
101
- [More Information Needed]
102
 
103
  ## Evaluation
104
 
105
  <!-- This section describes the evaluation protocols and provides the results. -->
 
106
 
107
  ### Testing Data, Factors & Metrics
108
 
109
  #### Testing Data
110
 
111
  <!-- This should link to a Dataset Card if possible. -->
 
 
 
112
 
113
  [More Information Needed]
114
 
@@ -122,33 +182,30 @@ Use the code below to get started with the model.
122
 
123
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
 
125
- [More Information Needed]
 
126
 
127
  ### Results
128
 
129
- [More Information Needed]
 
130
 
131
  #### Summary
132
 
 
133
 
134
 
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
  ## Environmental Impact
142
 
143
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
 
145
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
  ## Technical Specifications [optional]
154
 
@@ -162,15 +219,20 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
162
 
163
  #### Hardware
164
 
165
- [More Information Needed]
 
 
 
166
 
167
  #### Software
168
 
169
- [More Information Needed]
 
170
 
171
  ## Citation [optional]
172
 
173
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
174
 
175
  **BibTeX:**
176
 
@@ -180,20 +242,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
180
 
181
  [More Information Needed]
182
 
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
  ## Model Card Contact
198
 
199
- [More Information Needed]
 
17
 
18
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
 
20
+ - **Developed by:** Jesse Arzate
21
+ - **Model type:** Sequence-to-Sequence (Seq2Seq) Transformer-based model
22
+ - **Language(s) (NLP):** English
 
 
23
  - **License:** [More Information Needed]
24
+ - **Finetuned from model [optional]:** Whisper ASR: distil-large-v3
25
 
26
  ### Model Sources [optional]
27
 
28
  <!-- Provide the basic links for the model. -->
29
 
30
+ - **Repository:** https://github.com/Vaibhavs10/fast-whisper-finetuning
 
 
31
 
32
  ## Uses
33
 
 
65
 
66
  ## How to Get Started with the Model
67
 
68
+ Use the code below to get started with the model.
+ ```python
+ import pandas as pd
+ import torch
+ from tqdm import tqdm
+ from transformers import (
+     AutomaticSpeechRecognitionPipeline,
+     WhisperForConditionalGeneration,
+     WhisperTokenizer,
+     WhisperProcessor,
+ )
+ from peft import PeftModel, PeftConfig
+
+ peft_model_id = "baileyarzate/whisper-distil-large-v3-atc-english"  # Hugging Face model path
+ language = "en"
+ task = "transcribe"
+ device = "cuda"
+
+ # Load the base Whisper checkpoint, then apply the LoRA adapter weights on top of it.
+ peft_config = PeftConfig.from_pretrained(peft_model_id)
+ model = WhisperForConditionalGeneration.from_pretrained(
+     peft_config.base_model_name_or_path, device_map=device
+ )
+ model = PeftModel.from_pretrained(model, peft_model_id).to(device)
+
+ tokenizer = WhisperTokenizer.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
+ processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
+ feature_extractor = processor.feature_extractor
+ forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
+ pipe = AutomaticSpeechRecognitionPipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor)
+ model.config.use_cache = True
+
+ def transcribe(audio):
+     # Transcribe a 16 kHz mono audio array (or file path) with mixed-precision inference.
+     with torch.cuda.amp.autocast():
+         text = pipe(audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
+     return text
+
+ # df_subset is a pandas DataFrame of test clips with an 'array' column of 16 kHz audio;
+ # df holds the clip annotations ('path', 'start', 'stop') for loading segments from file.
+ transcriptions_finetuned = []
+ for i in tqdm(range(len(df_subset))):
+     # When you only have the audio file path:
+     # transcriptions_finetuned.append(transcribe(librosa.load(df["path"][i], sr=16000, offset=df["start"][i], duration=df["stop"][i] - df["start"][i])[0]))
+     # When you already have the audio array (saves time):
+     transcriptions_finetuned.append(transcribe(df_subset["array"].iloc[i]))
+ transcriptions_finetuned = pd.DataFrame(transcriptions_finetuned, columns=["transcription_finetuned"])
+ df_subset = df_subset.reset_index(drop=True)
+ df_subset = pd.concat([df_subset, transcriptions_finetuned], axis=1)
+ ```
111
 
112
  ## Training Details
113
 
114
  ### Training Data
115
 
116
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
117
+ - **Dataset:** ATC (air traffic control) audio recordings from actual flight operations.
+ - **Size:** ~250 hours of annotated data.
+
120
 
121
  [More Information Needed]
122
 
123
  ### Training Procedure
124
 
125
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
126
+ The training procedure was modeled after https://github.com/Vaibhavs10/fast-whisper-finetuning, which fine-tunes Whisper with LoRA adapters via 🤗 PEFT.
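+
+ A minimal sketch of that setup is shown below, under stated assumptions: the base checkpoint is taken to be `distil-whisper/distil-large-v3`, and the LoRA rank, alpha, dropout, and target modules are illustrative values commonly used in that guide, not necessarily the exact values used for this adapter.
+
+ ```python
+ from transformers import WhisperForConditionalGeneration
+ from peft import LoraConfig, get_peft_model
+
+ # Base checkpoint the adapter is assumed to be trained on.
+ base_model = WhisperForConditionalGeneration.from_pretrained("distil-whisper/distil-large-v3")
+
+ # Illustrative LoRA configuration (values are assumptions, mirroring the referenced guide).
+ lora_config = LoraConfig(
+     r=32,
+     lora_alpha=64,
+     target_modules=["q_proj", "v_proj"],  # attention query/value projections
+     lora_dropout=0.05,
+     bias="none",
+ )
+
+ model = get_peft_model(base_model, lora_config)
+ model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
+ ```
+
+ The wrapped model would then typically be trained with `Seq2SeqTrainer` using the hyperparameters listed below.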
127
 
128
  #### Preprocessing [optional]
129
 
130
+ Stripped leading and trailing whitespace from transcript sentences, removed any sentences containing the phrase "UNINTELLIGIBLE" to filter out unclear or garbled speech, and removed filler words such as "ah" or "uh". A minimal sketch of this cleanup follows.
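+
+ The sketch assumes the transcripts are available as a list of strings; the filler-word list is illustrative.
+
+ ```python
+ FILLER_WORDS = {"ah", "uh", "um"}  # assumed filler-word list; extend as needed
+
+ def clean_transcript(sentence: str) -> str | None:
+     """Strip whitespace, drop unintelligible sentences, and remove filler words."""
+     sentence = sentence.strip()
+     if "UNINTELLIGIBLE" in sentence:
+         return None  # drop unclear or garbled speech entirely
+     words = [w for w in sentence.split() if w.lower().strip(".,") not in FILLER_WORDS]
+     return " ".join(words)
+
+ # raw_sentences: list of raw transcript strings
+ cleaned = [clean_transcript(s) for s in raw_sentences]
+ cleaned = [s for s in cleaned if s]  # discard dropped or empty sentences
+ ```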
131
 
132
 
133
  #### Training Hyperparameters
134
 
135
  - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
136
+ ```python
+ from transformers import Seq2SeqTrainingArguments
+
+ # output_dir and other project-specific arguments are omitted here.
+ training_args = Seq2SeqTrainingArguments(
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=2,
+     learning_rate=5e-4,
+     warmup_steps=100,
+     num_train_epochs=3,
+     fp16=True,
+     per_device_eval_batch_size=4,
+     generation_max_length=128,
+     logging_steps=100,
+     save_steps=500,
+     save_total_limit=3,
+     remove_unused_columns=False,  # required as the PeftModel forward doesn't have the signature of the wrapped model's forward
+     label_names=["labels"],  # same reason as above
+ )
+ ```
153
  #### Speeds, Sizes, Times [optional]
154
 
155
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
156
+ Inference throughput is about 2 samples per second on an RTX A2000.
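+
+ That figure can be checked with a simple timing loop like the sketch below, reusing `transcribe` and `df_subset` from the getting-started example:
+
+ ```python
+ import time
+
+ n = 100  # number of test clips to time
+ start = time.perf_counter()
+ for i in range(n):
+     transcribe(df_subset["array"].iloc[i])
+ elapsed = time.perf_counter() - start
+ print(f"{n / elapsed:.2f} samples per second")
+ ```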
157
 
 
158
 
159
  ## Evaluation
160
 
161
  <!-- This section describes the evaluation protocols and provides the results. -->
162
+ Final training loss: 0.103
163
 
164
  ### Testing Data, Factors & Metrics
165
 
166
  #### Testing Data
167
 
168
  <!-- This should link to a Dataset Card if possible. -->
169
+ - **Dataset:** ATC audio recordings from actual flight operations (same corpus as the training data).
+ - **Size:** ~250 hours of annotated data in total; 20% was randomly sampled as the held-out test set with seed = 42 (see the split sketch below).
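+
+ A sketch of that split, assuming the annotated corpus is loaded as a 🤗 `datasets` dataset (the local path and audiofolder layout are placeholders):
+
+ ```python
+ from datasets import load_dataset
+
+ # Placeholder path to the annotated ATC audio dataset.
+ atc_dataset = load_dataset("audiofolder", data_dir="path/to/atc_audio", split="train")
+
+ # Hold out 20% of the clips for evaluation, with the seed reported above.
+ splits = atc_dataset.train_test_split(test_size=0.2, seed=42)
+ train_ds, test_ds = splits["train"], splits["test"]
+ ```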
172
 
173
  [More Information Needed]
174
 
 
182
 
183
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
184
 
185
+ - Word Error Rate (WER)
+ - Normalized Word Error Rate (see the computation sketch below)
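+
+ Both metrics can be computed with the 🤗 `evaluate` library. The sketch below assumes `references` and `predictions` are lists of ground-truth and model transcripts for the test split, and uses the Whisper English text normalizer for the normalized variant; the exact normalization used for the reported numbers is not specified.
+
+ ```python
+ import evaluate
+ from transformers.models.whisper.english_normalizer import EnglishTextNormalizer
+
+ wer_metric = evaluate.load("wer")
+ normalizer = EnglishTextNormalizer({})  # empty spelling-normalization mapping
+
+ wer = wer_metric.compute(references=references, predictions=predictions)
+ normalized_wer = wer_metric.compute(
+     references=[normalizer(r) for r in references],
+     predictions=[normalizer(p) for p in predictions],
+ )
+ print(f"WER: {wer:.3f}  normalized WER: {normalized_wer:.3f}")
+ ```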
187
 
188
  ### Results
189
 
190
+ Mean WER over 500 test samples: 0.145, with a 95% confidence interval of (0.123, 0.167).
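+
+ One way to obtain such an interval is a normal approximation over per-sample WERs, sketched below with the same assumed `references` and `predictions` lists; the method actually used for the reported interval is not stated.
+
+ ```python
+ import numpy as np
+ import evaluate
+
+ wer_metric = evaluate.load("wer")
+
+ # Per-sample WER for each test clip.
+ sample_wers = np.array([
+     wer_metric.compute(references=[ref], predictions=[pred])
+     for ref, pred in zip(references, predictions)
+ ])
+
+ mean_wer = sample_wers.mean()
+ half_width = 1.96 * sample_wers.std(ddof=1) / np.sqrt(len(sample_wers))  # 95% normal-approximation CI
+ print(f"mean WER {mean_wer:.3f}, 95% CI ({mean_wer - half_width:.3f}, {mean_wer + half_width:.3f})")
+ ```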
192
 
193
  #### Summary
194
 
195
+ [IN PROGRESS]
196
 
197
 
198
  ## Environmental Impact
199
 
200
  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
201
 
202
  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
203
 
204
+ - **Hardware Type:** RTX A2000
205
+ - **Hours used:** 24
206
+ - **Cloud Provider:** Private infrastructure
207
+ - **Compute Region:** Southern California
208
+ - **Carbon Emitted:** 1.57 kg CO2eq
209
 
210
  ## Technical Specifications [optional]
211
 
 
219
 
220
  #### Hardware
221
 
222
+ - **CPU:** AMD EPYC 7313P 16-core processor, 3.00 GHz
+ - **GPU:** NVIDIA RTX A2000 (6 GB VRAM)
+ - **RAM:** 128 GB
226
 
227
  #### Software
228
 
229
+ - **OS:** Windows 11 Enterprise, version 21H2
+ - **Python:** 3.10.14
231
 
232
  ## Citation [optional]
233
 
234
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
235
+ [IN PROGRESS]
236
 
237
  **BibTeX:**
238
 
 
242
 
243
  [More Information Needed]
244
 
245
  ## Model Card Contact
246
 
247
+ Jesse Arzate: baileyarzate@gmail.com