Commit 9d7d11c by olivierb (parent: 1afe09d): add model card
README.md ADDED (@@ -0,0 +1,75 @@)
---
language:
- "fr"
tags:
- "audio"
- "speech"
- "automatic-speech-recognition"
- "medkit"
- "speechbrain"
datasets:
- "common_voice"
- "pxcorpus"
- "simsamu"
metrics:
- "wer"
---

# Simsamu transcription model

This repository contains a pretrained
[speechbrain](https://github.com/speechbrain/speechbrain) transcription model
for the French language that was fine-tuned on the
[Simsamu](https://huggingface.co/datasets/medkit/simsamu) dataset.

The model is a CTC-based model on top of
[wav2vec2](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) embeddings,
trained on data from the [CommonVoice](https://commonvoice.mozilla.org),
[PxCorpus](https://zenodo.org/records/6482587) and Simsamu datasets. The CTC
layers were trained from scratch and the wav2vec2 layers were fine-tuned.

The model can be used in [medkit](https://github.com/medkit-lib/medkit/) as
follows:

```python
from medkit.core.audio import AudioDocument
from medkit.audio.segmentation.pa_speaker_detector import PASpeakerDetector
from medkit.audio.transcription.sb_transcriber import SBTranscriber

# init speaker detector operation
speaker_detector = PASpeakerDetector(
    model="medkit/simsamu-diarization",
    device=0,
    segmentation_batch_size=10,
    embedding_batch_size=10,
)

# init transcriber operation
transcriber = SBTranscriber(
    model="medkit/simsamu-transcription",
    needs_decoder=False,
    output_label="transcription",
    device=0,
    batch_size=10,
)

# create audio document
audio_doc = AudioDocument.from_file("path/to/audio.wav")

# apply speaker detector operation on audio document
# to get speech segments
speech_segments = speaker_detector.run([audio_doc.raw_segment])

# apply transcriber operation on speech segments
transcriber.run(speech_segments)

# display transcription for each speech turn
for speech_seg in speech_segments:
    transcription_attr = speech_seg.attrs.get(label="transcription")[0]
    print(speech_seg.span.start, speech_seg.span.end, transcription_attr.value)
```
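
The WER metric listed in the metadata is a word-level edit distance divided by
the number of reference words. To sanity-check transcriptions against a
reference locally, a small self-contained sketch (not the evaluation code used
for this model) is:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference
    # words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer("bonjour je vous écoute", "bonjour je vous écoute"))  # 0.0
print(wer("bonjour je vous écoute", "bonjour il vous écoute"))  # 0.25
```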

More info at https://medkit.readthedocs.io/

See also: [Simsamu diarization pipeline](https://huggingface.co/medkit/simsamu-diarization)