---
license: apache-2.0
language: en
datasets:
- Jzuluaga/uwb_atcc
tags:
- text
- sequence-classification
- en-atc
- en
- generated_from_trainer
- bert
- bertraffic
metrics:
- Precision
- Recall
- Accuracy
- F1
widget:
- text: "csa two nine six startup approved mike current qnh one zero one eight time check one seven"
- text: "swiss four eight seven november runway three one cleared for takeoff wind one three zero degrees seven knots"
- text: "lufthansa five yankee victor runway one three clear to land wind zero seven zero degrees"
- text: "austrian seven one zulu hello to you reduce one six zero knots"
- text: "sky travel one nine two approaching holding point three one ready for departure"
model-index:
- name: bert-base-speaker-role-atc-en-uwb-atcc
  results:
  - task:
      type: token-classification
      name: chunking
    dataset:
      type: Jzuluaga/uwb_atcc
      name: UWB-ATCC corpus (Air Traffic Control Communications)
      config: test
      split: test
    metrics:
    - type: F1
      value: 0.87
      name: TEST F1 (macro)
      verified: False
    - type: Accuracy
      value: 0.91
      name: TEST Accuracy
      verified: False
    - type: Precision
      value: 0.86
      name: TEST Precision (macro)
      verified: False
    - type: Recall
      value: 0.88
      name: TEST Recall (macro)
      verified: False
    - type: Jaccard Error Rate
      value: 0.169
      name: TEST Jaccard Error Rate
      verified: False
---

# bert-base-speaker-role-atc-en-uwb-atcc

This model detects speaker roles from text. Normally, this task is performed at the acoustic level; here, we propose to perform it at the text level instead. We address this challenge with a BERT model fine-tuned on the sequence classification task.

For instance:

- Utterance 1: **lufthansa six two nine charlie tango report when established**
- Utterance 2: **report when established lufthansa six two nine charlie tango**

Based on these, can you tell the speaker roles? Is Utterance 1 from the air traffic controller or the pilot?

Check the inference API (there are 5 examples)!

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [UWB-ATCC corpus](https://huggingface.co/datasets/Jzuluaga/uwb_atcc).

<a href="https://github.com/idiap/atco2-corpus">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-Open%20source-green">
</a>

It achieves the following results on the evaluation set:
- Loss: 0.6191
- Accuracy: 0.9103
- Precision: 0.9239
- Recall: 0.9161
- F1: 0.9200
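
Since this card was generated from the Hugging Face `Trainer`, figures like these typically come from a `compute_metrics` callback. The exact function used for this model is not shown in the card, so the following is only a minimal sketch; in particular, `average="binary"` is an assumption based on the two-class (pilot vs. controller) setup:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Hypothetical Trainer callback; the original is not shown in this card."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # average="binary" scores the positive class, consistent with a
    # two-class speaker-role setup.
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```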

**Paper**: [ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications](https://arxiv.org/abs/2211.04054)

Authors: Juan Zuluaga-Gomez, Karel Veselý, Igor Szöke, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad and others

Abstract: Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, a signal-to-noise ratio estimate and an English language detection score per sample. Both are available for purchase through ELDA. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we offer for free at https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU, not only in the field of ATC communications but also in the general research community.

Code (GitHub): https://github.com/idiap/atco2-corpus

## Intended uses & limitations

This model was fine-tuned on air traffic control data. We do not expect it to maintain the same performance on other datasets or domains on which BERT is typically pre-trained or fine-tuned.

## Training and evaluation data

See Table 7 (page 19) in our paper, [ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications](https://arxiv.org/abs/2211.04054), where we describe the data used to fine-tune this sequence classification model.

- We use the UWB-ATCC corpus to fine-tune this model. You can download the raw data here: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0001-CCA1-0
- You do not need to prepare the data yourself, however: our repository provides scripts for preparing this database (a loading sketch follows this list):
    - Dataset preparation folder: https://github.com/idiap/atco2-corpus/tree/main/data/databases/uwb_atcc/
    - Prepare the data: https://github.com/idiap/atco2-corpus/blob/main/data/databases/uwb_atcc/data_prepare_uwb_atcc_corpus_other.sh
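
The corpus is also hosted on the Hugging Face Hub (see the `datasets:` tag in the front matter), so a minimal sketch for loading it with the `datasets` library could look like this; the available splits and columns are not spelled out here, so inspect the dataset card before relying on them:

```python
from datasets import load_dataset

# Load the UWB-ATCC corpus from the Hugging Face Hub.
# NOTE: check https://huggingface.co/datasets/Jzuluaga/uwb_atcc for the
# actual configurations, splits, and column names.
uwb_atcc = load_dataset("Jzuluaga/uwb_atcc")
print(uwb_atcc)
```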

## Writing your own inference script

A minimal snippet:

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Jzuluaga/bert-base-speaker-role-atc-en-uwb-atcc")
model = AutoModelForSequenceClassification.from_pretrained("Jzuluaga/bert-base-speaker-role-atc-en-uwb-atcc")

# Process a text sample (from the UWB-ATCC corpus)
nlp = pipeline('text-classification', model=model, tokenizer=tokenizer)
nlp("lining up runway three one csa five bravo")

# Output:
# [{'label': 'pilot',
#   'score': 0.9998971223831177}]
```
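
If you prefer to skip `pipeline`, the same prediction can be sketched with a plain forward pass, reusing the `tokenizer` and `model` loaded above. The label strings come from the model's `id2label` config; only `pilot` is confirmed by the example above, so the other label name is an assumption to verify against the config:

```python
import torch

# Tokenize, run a forward pass, and turn logits into probabilities.
inputs = tokenizer("lining up runway three one csa five bravo", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
# id2label maps class indices to role names (e.g., pilot vs. controller).
print(model.config.id2label[pred_id], float(probs[pred_id]))
# Should print something like: pilot 0.9998971223831177
```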

# Cite us

If you use this code for your research, please cite our papers:

```
@article{zuluaga2022bertraffic,
  title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications},
  author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and others},
  journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
  year={2022}
}
```

and,

```
@article{zuluaga2022how,
  title={How Does Pre-trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications},
  author={Zuluaga-Gomez, Juan and Prasad, Amrutha and Nigmatulina, Iuliia and Sarfjoo, Saeed and others},
  journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
  year={2022}
}
```

and,

```
@article{zuluaga2022atco2,
  title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications},
  author={Zuluaga-Gomez, Juan and Vesel{\`y}, Karel and Sz{\"o}ke, Igor and Motlicek, Petr and others},
  journal={arXiv preprint arXiv:2211.04054},
  year={2022}
}
```

## Training procedure

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| No log        | 3.36  | 500  | 0.2346          | 0.9207   | 0.9197    | 0.9413 | 0.9303 |
| 0.2212        | 6.71  | 1000 | 0.3161          | 0.9046   | 0.9260    | 0.9027 | 0.9142 |
| 0.2212        | 10.07 | 1500 | 0.4337          | 0.9065   | 0.9191    | 0.9144 | 0.9167 |
| 0.0651        | 13.42 | 2000 | 0.4743          | 0.9178   | 0.9249    | 0.9295 | 0.9272 |
| 0.0651        | 16.78 | 2500 | 0.5538          | 0.9103   | 0.9196    | 0.9211 | 0.9204 |
| 0.0296        | 20.13 | 3000 | 0.6191          | 0.9103   | 0.9239    | 0.9161 | 0.9200 |
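
For context, the cadence of this table (one evaluation row every 500 steps, ending around epoch 20) would follow from a `TrainingArguments` setup along these lines. Every value below is a placeholder chosen only to be consistent with the table, not the model's actual configuration, which is documented in the training-procedure section of the full card:

```python
from transformers import TrainingArguments

# Placeholder sketch only: evaluate/log every 500 steps over ~20 epochs,
# matching the cadence of the results table above.
training_args = TrainingArguments(
    output_dir="bert-base-speaker-role-atc-en-uwb-atcc",
    evaluation_strategy="steps",
    eval_steps=500,
    logging_steps=500,
    save_steps=500,
    num_train_epochs=21,  # assumption: the table ends at epoch 20.13
)
```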

### Framework versions