olivierb committed
Commit 9d7d11c
1 Parent(s): 1afe09d

add model card

Files changed (1)
  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
---
language:
- "fr"
tags:
- "audio"
- "speech"
- "automatic-speech-recognition"
- "medkit"
- "speechbrain"
datasets:
- "common_voice"
- "pxcorpus"
- "simsamu"
metrics:
- "wer"
---

# Simsamu transcription model

This repository contains a pretrained
[speechbrain](https://github.com/speechbrain/speechbrain) transcription model
for the French language, fine-tuned on the
[Simsamu](https://huggingface.co/datasets/medkit/simsamu) dataset.

The model is a CTC-based model on top of
[wav2vec2](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) embeddings,
trained on data from the [CommonVoice](https://commonvoice.mozilla.org),
[PxCorpus](https://zenodo.org/records/6482587) and Simsamu datasets. The CTC
layers were trained from scratch and the wav2vec2 layers were fine-tuned.
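
For use outside of medkit, the model can likely also be loaded directly through
speechbrain's pretrained-model interface. The following is a minimal sketch, not
part of the documented usage: it assumes the repository ships a standard
speechbrain hyperparams file and that your speechbrain version provides
`speechbrain.inference.ASR.EncoderASR` (older releases expose it as
`speechbrain.pretrained.EncoderASR`).

```python
# hypothetical direct speechbrain usage; the documented path is medkit (see below)
from speechbrain.inference.ASR import EncoderASR

# download the model from the Hub and build the inference pipeline
asr = EncoderASR.from_hparams(
    source="medkit/simsamu-transcription",
    savedir="pretrained_models/simsamu-transcription",
)

# transcribe a single audio file and print the result
print(asr.transcribe_file("path/to/audio.wav"))
```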

The model can be used in [medkit](https://github.com/medkit-lib/medkit/) as
follows:

```python
from medkit.core.audio import AudioDocument
from medkit.audio.segmentation.pa_speaker_detector import PASpeakerDetector
from medkit.audio.transcription.sb_transcriber import SBTranscriber

# init speaker detector operation
speaker_detector = PASpeakerDetector(
    model="medkit/simsamu-diarization",
    device=0,
    segmentation_batch_size=10,
    embedding_batch_size=10,
)

# init transcriber operation
transcriber = SBTranscriber(
    model="medkit/simsamu-transcription",
    needs_decoder=False,
    output_label="transcription",
    device=0,
    batch_size=10,
)

# create audio document
audio_doc = AudioDocument.from_file("path/to/audio.wav")

# apply speaker detector operation on audio document
# to get speech segments
speech_segments = speaker_detector.run([audio_doc.raw_segment])

# apply transcriber operation on speech segments
transcriber.run(speech_segments)

# display transcription for each speech turn
for speech_seg in speech_segments:
    transcription_attr = speech_seg.attrs.get(label="transcription")[0]
    print(speech_seg.span.start, speech_seg.span.end, transcription_attr.value)
```
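
If speaker diarization is not needed, the same transcriber can presumably be
applied directly to the document's raw segment instead of the diarized speech
segments. This is an assumption based on the example above (it relies on
`SBTranscriber` attaching the transcription attribute to whatever segments it
receives), not a documented recipe:

```python
# hypothetical variant: transcribe the whole recording in one pass,
# reusing the transcriber and audio document from the example above
transcriber.run([audio_doc.raw_segment])
transcription_attr = audio_doc.raw_segment.attrs.get(label="transcription")[0]
print(transcription_attr.value)
```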

More info at https://medkit.readthedocs.io/

See also: [Simsamu diarization
pipeline](https://huggingface.co/medkit/simsamu-diarization)