Update README.md
README.md
CHANGED
@@ -1,3 +1,30 @@
 ---
+language: fr
 license: mit
+tags:
+- bert
+- language-model
+- flaubert
+- french
+- flaubert-base
+- uncased
+- asr
+- speech
+- oral
+- natural language understanding
+- NLU
+- spoken language understanding
+- SLU
+- understanding
 ---
+
+# FlauBERT-Oral models: Using ASR-Generated Text for Spoken Language Modeling
+
+**FlauBERT-Oral** are French BERT models trained on a very large amount of automatically transcribed speech from 350,000 hours of diverse French TV shows. They were trained with the [**FlauBERT software**](https://github.com/getalp/Flaubert) using the same parameters as the [flaubert-base-uncased](https://huggingface.co/flaubert/flaubert_base_uncased) model (12 layers, 12 attention heads, 768 dims, 137M parameters, uncased).
+
+## Available FlauBERT-Oral models
+
+- `flaubert-oral-asr` : trained from scratch on ASR data, keeping the BPE tokenizer and vocabulary of flaubert-base-uncased
+- `flaubert-oral-asr_nb` : trained from scratch on ASR data, BPE tokenizer is also trained on the same corpus
+- `flaubert-oral-mixed` : trained from scratch on a mixed corpus of ASR and text data, BPE tokenizer is also trained on the same corpus
+
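
The three model names added by this change could be loaded with the Hugging Face `transformers` library roughly as sketched below. This is a minimal illustration, not something the README itself provides: the helper names `hub_id` and `load_flaubert_oral` are hypothetical, and the Hub namespace hosting the checkpoints is not stated in the README, so it is left as a required argument. `FlaubertModel` and `FlaubertTokenizer` are used on the assumption that the checkpoints follow the FlauBERT architecture, as the README's description suggests.

```python
# Model names as listed in the README; the descriptions paraphrase that list.
FLAUBERT_ORAL_MODELS = {
    "flaubert-oral-asr": "ASR data, tokenizer and vocabulary of flaubert-base-uncased",
    "flaubert-oral-asr_nb": "ASR data, BPE tokenizer trained on the same corpus",
    "flaubert-oral-mixed": "mixed ASR and text data, BPE tokenizer trained on the same corpus",
}


def hub_id(model_name: str, namespace: str) -> str:
    """Build a Hub repository id of the form '<namespace>/<model_name>'.

    `namespace` is an assumption supplied by the caller: the README does not
    say which account hosts the models.
    """
    if model_name not in FLAUBERT_ORAL_MODELS:
        raise ValueError(f"unknown FlauBERT-Oral model: {model_name!r}")
    return f"{namespace}/{model_name}"


def load_flaubert_oral(model_name: str, namespace: str):
    """Download and return (tokenizer, model) for one FlauBERT-Oral checkpoint.

    Hypothetical helper: requires `transformers` and network access.
    """
    # Imported lazily so that hub_id() stays usable without transformers installed.
    from transformers import FlaubertModel, FlaubertTokenizer

    repo = hub_id(model_name, namespace)
    tokenizer = FlaubertTokenizer.from_pretrained(repo)
    model = FlaubertModel.from_pretrained(repo)
    return tokenizer, model
```

Calling `load_flaubert_oral("flaubert-oral-asr", namespace)` with the actual hosting account would return a tokenizer and model pair. Since all three models are uncased and trained on transcribed speech, lowercased input without punctuation is probably the closest match to the training data, though the README does not state this.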