---
language: fr
license: mit
tags:
  - bert
  - language-model
  - flaubert
  - french
  - flaubert-base
  - uncased
  - asr
  - speech
  - oral
  - natural language understanding
  - NLU
  - spoken language understanding
  - SLU
  - understanding
---

# FlauBERT-Oral models: Using ASR-Generated Text for Spoken Language Modeling

FlauBERT-Oral is a family of French BERT models trained on a very large amount of automatically transcribed speech, drawn from 350,000 hours of diverse French TV shows. They were trained with the [FlauBERT software](https://github.com/getalp/Flaubert) using the same parameters as the flaubert-base-uncased model (12 layers, 12 attention heads, 768 dimensions, 137M parameters, uncased).

## Available FlauBERT-Oral models

- `flaubert-oral-asr`: trained from scratch on ASR data, keeping the BPE tokenizer and vocabulary of flaubert-base-uncased
- `flaubert-oral-asr_nb`: trained from scratch on ASR data, with the BPE tokenizer also trained on the same corpus
- `flaubert-oral-mixed`: trained from scratch on a mixed corpus of ASR and text data, with the BPE tokenizer also trained on the same corpus
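
## Usage

A minimal usage sketch with the Hugging Face `transformers` library, assuming the models are published on the Hub under the `nherve` namespace (e.g. `nherve/flaubert-oral-asr`); adjust the model id to the variant you want.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical model id; swap in flaubert-oral-asr_nb or flaubert-oral-mixed as needed.
model_id = "nherve/flaubert-oral-asr"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode an uncased, ASR-like French sentence and extract contextual embeddings.
inputs = tokenizer("bonjour et bienvenue dans ce journal télévisé", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token.
print(outputs.last_hidden_state.shape)
```

Because the models are uncased and trained on ASR transcripts, lowercased input without punctuation is closest to the training distribution.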