Jzuluaga committed on
Commit
ffcf02b
1 Parent(s): a9b258a

Update README.md

  1. README.md +141 -79
README.md CHANGED
@@ -1,50 +1,161 @@
1
  ---
2
  license: apache-2.0
3
  tags:
4
  - generated_from_trainer
5
  metrics:
6
- - accuracy
7
- - precision
8
- - recall
9
- - f1
10
- model-index:
11
- - name: uwb_atcc
12
- results: []
13
  ---
14
 
15
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
16
- should probably proofread and complete it, then remove this comment. -->
17
 
18
- # uwb_atcc
19
 
20
- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
21
  It achieves the following results on the evaluation set:
22
  - Loss: 0.6191
23
  - Accuracy: 0.9103
24
  - Precision: 0.9239
25
  - Recall: 0.9161
26
  - F1: 0.9200
27
- - Report: precision recall f1-score support
28
-
29
- 0 0.89 0.90 0.90 463
30
- 1 0.92 0.92 0.92 596
31
 
32
- accuracy 0.91 1059
33
- macro avg 0.91 0.91 0.91 1059
34
- weighted avg 0.91 0.91 0.91 1059
35
 
 
36
 
37
- ## Model description
38
 
39
- More information needed
40
 
41
  ## Intended uses & limitations
42
 
43
- More information needed
44
 
45
  ## Training and evaluation data
46
 
47
- More information needed
48
 
49
  ## Training procedure
50
 
@@ -64,63 +175,14 @@ The following hyperparameters were used during training:
64
 
65
  ### Training results
66
 
67
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 | Report |
68
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
69
- | No log | 3.36 | 500 | 0.2346 | 0.9207 | 0.9197 | 0.9413 | 0.9303 | precision recall f1-score support
70
-
71
- 0 0.92 0.89 0.91 463
72
- 1 0.92 0.94 0.93 596
73
-
74
- accuracy 0.92 1059
75
- macro avg 0.92 0.92 0.92 1059
76
- weighted avg 0.92 0.92 0.92 1059
77
- |
78
- | 0.2212 | 6.71 | 1000 | 0.3161 | 0.9046 | 0.9260 | 0.9027 | 0.9142 | precision recall f1-score support
79
-
80
- 0 0.88 0.91 0.89 463
81
- 1 0.93 0.90 0.91 596
82
-
83
- accuracy 0.90 1059
84
- macro avg 0.90 0.90 0.90 1059
85
- weighted avg 0.91 0.90 0.90 1059
86
- |
87
- | 0.2212 | 10.07 | 1500 | 0.4337 | 0.9065 | 0.9191 | 0.9144 | 0.9167 | precision recall f1-score support
88
-
89
- 0 0.89 0.90 0.89 463
90
- 1 0.92 0.91 0.92 596
91
-
92
- accuracy 0.91 1059
93
- macro avg 0.90 0.91 0.91 1059
94
- weighted avg 0.91 0.91 0.91 1059
95
- |
96
- | 0.0651 | 13.42 | 2000 | 0.4743 | 0.9178 | 0.9249 | 0.9295 | 0.9272 | precision recall f1-score support
97
-
98
- 0 0.91 0.90 0.91 463
99
- 1 0.92 0.93 0.93 596
100
-
101
- accuracy 0.92 1059
102
- macro avg 0.92 0.92 0.92 1059
103
- weighted avg 0.92 0.92 0.92 1059
104
- |
105
- | 0.0651 | 16.78 | 2500 | 0.5538 | 0.9103 | 0.9196 | 0.9211 | 0.9204 | precision recall f1-score support
106
-
107
- 0 0.90 0.90 0.90 463
108
- 1 0.92 0.92 0.92 596
109
-
110
- accuracy 0.91 1059
111
- macro avg 0.91 0.91 0.91 1059
112
- weighted avg 0.91 0.91 0.91 1059
113
- |
114
- | 0.0296 | 20.13 | 3000 | 0.6191 | 0.9103 | 0.9239 | 0.9161 | 0.9200 | precision recall f1-score support
115
-
116
- 0 0.89 0.90 0.90 463
117
- 1 0.92 0.92 0.92 596
118
-
119
- accuracy 0.91 1059
120
- macro avg 0.91 0.91 0.91 1059
121
- weighted avg 0.91 0.91 0.91 1059
122
- |
123
-
124
 
125
  ### Framework versions
126
 
 
1
  ---
2
  license: apache-2.0
3
+ language: en
4
+ datasets:
5
+ - Jzuluaga/uwb_atcc
6
  tags:
7
+ - text
8
+ - sequence-classification
9
+ - en-atc
10
+ - en
11
  - generated_from_trainer
12
+ - bert
13
+ - bertraffic
14
  metrics:
15
+ - Precision
16
+ - Recall
17
+ - Accuracy
18
+ - F1
19
+ widget:
20
+ - text: "csa two nine six startup approved mike current qnh one zero one eight time check one seven"
21
+ - text: "swiss four eight seven november runway three one cleared for takeoff wind one three zero degrees seven knots"
22
+ - text: "lufthansa five yankee victor runway one three clear to land wind zero seven zero degrees"
23
+ - text: "austrian seven one zulu hello to you reduce one six zero knots"
24
+ - text: "sky travel one nine two approaching holding point three one ready for departure"
25
+ - name: bert-base-speaker-role-atc-en-uwb-atcc
26
+ results:
27
+ - task:
28
+ type: text-classification
29
+ name: chunking
30
+ dataset:
31
+ type: Jzuluaga/uwb_atcc
32
+ name: UWB-ATCC corpus (Air Traffic Control Communications)
33
+ config: test
34
+ split: test
35
+ metrics:
36
+ - type: F1
37
+ value: 0.87
38
+ name: TEST F1 (macro)
39
+ verified: False
40
+ - type: Accuracy
41
+ value: 0.91
42
+ name: TEST Accuracy
43
+ verified: False
44
+ - type: Precision
45
+ value: 0.86
46
+ name: TEST Precision (macro)
47
+ verified: False
48
+ - type: Recall
49
+ value: 0.88
50
+ name: TEST Recall (macro)
51
+ verified: False
52
+ - type: Jaccard Error Rate
53
+ value: 0.169
54
+ name: TEST Jaccard Error Rate
55
+ verified: False
56
  ---
57
 
58
+ # bert-base-speaker-role-atc-en-uwb-atcc
 
59
 
60
+ This model detects speaker roles from text. This task is normally performed at the acoustic level; we propose performing it at the text level instead.
61
+ We address this challenge by performing speaker role detection with a BERT model, which we fine-tune on the sequence classification task.
62
+
63
+ For instance:
64
+
65
+ - Utterance 1: **lufthansa six two nine charlie tango report when established**
66
+ - Utterance 2: **report when established lufthansa six two nine charlie tango**
67
+
68
+ Given only the text, can you tell the speaker role? Was Utterance 1 spoken by an air traffic controller or a pilot?
69
+
70
+ Check the inference API (there are 5 examples)!
71
+
72
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [UWB-ATCC corpus](https://huggingface.co/datasets/Jzuluaga/uwb_atcc).
73
+
74
+ <a href="https://github.com/idiap/atco2-corpus">
75
+ <img alt="GitHub" src="https://img.shields.io/badge/GitHub-Open%20source-green">
76
+ </a>
77
 
 
78
  It achieves the following results on the evaluation set:
79
  - Loss: 0.6191
80
  - Accuracy: 0.9103
81
  - Precision: 0.9239
82
  - Recall: 0.9161
83
  - F1: 0.9200
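As a quick sanity check, the reported F1 is consistent with the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch; the function name `f1_from_precision_recall` is ours and is not part of the training code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported above for the evaluation set
f1 = f1_from_precision_recall(0.9239, 0.9161)
print(f"{f1:.4f}")  # prints 0.9200
```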
84
 
85
+ **Paper**: [ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications](https://arxiv.org/abs/2211.04054)
86
 
87
+ Authors: Juan Zuluaga-Gomez, Karel Veselý, Igor Szöke, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad and others
88
 
89
+ Abstract: Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (low-resource domain), large-scale annotated datasets are required to develop the data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, signal-to-noise ratio estimate and English language detection score per sample. Both available for purchase through ELDA at this http URL. 3) The ATCO2-test-set-1h corpus is a one-hour subset from the original test set corpus, that we are offering for free at this url: https://www.atco2.org/data. We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.
90
 
91
+ Code GitHub repository: https://github.com/idiap/atco2-corpus
92
 
93
  ## Intended uses & limitations
94
 
95
+ This model was fine-tuned on air traffic control data. We do not expect it to maintain the same performance on other datasets, such as those on which BERT was pre-trained or fine-tuned.
96
 
97
  ## Training and evaluation data
98
 
99
+ See Table 7 (page 19) in our paper: [ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications](https://arxiv.org/abs/2211.04054). There we describe the data used to fine-tune our sequence classification model.
100
+
101
+ - We use the UWB-ATCC corpus to fine-tune this model. You can download the raw data here: https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0001-CCA1-0
102
+ - We have prepared a script in our repository for preparing this database:
103
+ - Dataset preparation folder: https://github.com/idiap/atco2-corpus/tree/main/data/databases/uwb_atcc/
104
+ - Prepare the data: https://github.com/idiap/atco2-corpus/blob/main/data/databases/uwb_atcc/data_prepare_uwb_atcc_corpus_other.sh
105
+
106
+ ## Writing your own inference script
107
+
108
+
109
+ The following snippet runs inference with the `transformers` pipeline:
110
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+ # Load the fine-tuned tokenizer and model from the Hugging Face Hub
+ model_id = "Jzuluaga/bert-base-speaker-role-atc-en-uwb-atcc"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+ # Process a text sample (from UWB-ATCC)
+ nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
+ print(nlp("lining up runway three one csa five bravo"))
+
+ # Output:
+ # [{'label': 'pilot', 'score': 0.9998971223831177}]
+ ```
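To label several utterances at once, the pipeline can be wrapped in a small helper. The sketch below is illustrative only: `load_classifier`, `label_utterances`, and `demo` are hypothetical names that do not appear in the repository, and the code assumes that the `text-classification` pipeline, when given a list of strings, returns one dictionary with `label` and `score` keys per input.

```python
from typing import Callable, Dict, List, Tuple

MODEL_ID = "Jzuluaga/bert-base-speaker-role-atc-en-uwb-atcc"


def load_classifier():
    """Build the text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so label_utterances() can be used with any compatible callable.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def label_utterances(
    classifier: Callable[[List[str]], List[Dict]], utterances: List[str]
) -> List[Tuple[str, str, float]]:
    """Return (utterance, predicted label, score) for each input utterance."""
    predictions = classifier(utterances)
    return [
        (text, pred["label"], float(pred["score"]))
        for text, pred in zip(utterances, predictions)
    ]


def demo() -> None:
    """Classify the two example utterances from this model card."""
    examples = [
        "lufthansa six two nine charlie tango report when established",
        "report when established lufthansa six two nine charlie tango",
    ]
    for text, label, score in label_utterances(load_classifier(), examples):
        print(f"{label}\t{score:.3f}\t{text}")
```

Calling `demo()` requires network access and the `transformers` package, since it downloads the model. Keeping the classifier as a parameter makes the helper easy to test with a stand-in callable.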
128
+
129
+ # Cite us
130
+
131
+ If you use this code for your research, please cite our paper with:
132
+
133
+ ```
134
+ @article{zuluaga2022bertraffic,
135
+ title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications},
136
+ author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and others},
137
+ journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
138
+ year={2022}
139
+ }
140
+ ```
141
+ and,
142
+ ```
143
+ @article{zuluaga2022how,
144
+ title={How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications},
145
+ author={Zuluaga-Gomez, Juan and Prasad, Amrutha and Nigmatulina, Iuliia and Sarfjoo, Saeed and others},
146
+ journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
147
+ year={2022}
148
+ }
149
+ ```
150
+ and,
151
+ ```
152
+ @article{zuluaga2022atco2,
153
+ title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications},
154
+ author={Zuluaga-Gomez, Juan and Vesel{\`y}, Karel and Sz{\"o}ke, Igor and Motlicek, Petr and others},
155
+ journal={arXiv preprint arXiv:2211.04054},
156
+ year={2022}
157
+ }
158
+ ```
159
 
160
  ## Training procedure
161
 
 
175
 
176
  ### Training results
177
 
178
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
179
+ |:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
180
+ | No log | 3.36 | 500 | 0.2346 | 0.9207 | 0.9197 | 0.9413 | 0.9303 |
181
+ | 0.2212 | 6.71 | 1000 | 0.3161 | 0.9046 | 0.9260 | 0.9027 | 0.9142 |
182
+ | 0.2212 | 10.07 | 1500 | 0.4337 | 0.9065 | 0.9191 | 0.9144 | 0.9167 |
183
+ | 0.0651 | 13.42 | 2000 | 0.4743 | 0.9178 | 0.9249 | 0.9295 | 0.9272 |
184
+ | 0.0651 | 16.78 | 2500 | 0.5538 | 0.9103 | 0.9196 | 0.9211 | 0.9204 |
185
+ | 0.0296 | 20.13 | 3000 | 0.6191 | 0.9103 | 0.9239 | 0.9161 | 0.9200 |
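The table can also be inspected programmatically, for example to see which logged checkpoint scored best on the evaluation set. The sketch below copies the rows of the table into a list and selects the step with the lowest validation loss and the step with the highest F1; the variable names are ours and purely illustrative.

```python
# (step, validation_loss, accuracy, precision, recall, f1), copied from the table
rows = [
    (500, 0.2346, 0.9207, 0.9197, 0.9413, 0.9303),
    (1000, 0.3161, 0.9046, 0.9260, 0.9027, 0.9142),
    (1500, 0.4337, 0.9065, 0.9191, 0.9144, 0.9167),
    (2000, 0.4743, 0.9178, 0.9249, 0.9295, 0.9272),
    (2500, 0.5538, 0.9103, 0.9196, 0.9211, 0.9204),
    (3000, 0.6191, 0.9103, 0.9239, 0.9161, 0.9200),
]

# Pick the checkpoint that minimizes validation loss and the one that maximizes F1
best_by_loss = min(rows, key=lambda r: r[1])
best_by_f1 = max(rows, key=lambda r: r[5])

print("lowest validation loss:", best_by_loss[0], best_by_loss[1])
print("highest F1:", best_by_f1[0], best_by_f1[5])
```

In this run both criteria select step 500, and the validation loss increases at every later logged step. The metrics quoted at the top of the card correspond to the final step (3000).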
186
 
187
  ### Framework versions
188