olivierb committed
Commit 9d7d11c
1 Parent(s): 1afe09d

add model card

Files changed (1)
  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
---
language:
- "fr"
tags:
- "audio"
- "speech"
- "automatic-speech-recognition"
- "medkit"
- "speechbrain"
datasets:
- "common_voice"
- "pxcorpus"
- "simsamu"
metrics:
- "wer"
---

# Simsamu transcription model

This repository contains a pretrained
[speechbrain](https://github.com/speechbrain/speechbrain) transcription model
for the French language, fine-tuned on the
[Simsamu](https://huggingface.co/datasets/medkit/simsamu) dataset.

The model is a CTC-based model on top of
[wav2vec2](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) embeddings,
trained on data from the [CommonVoice](https://commonvoice.mozilla.org),
[PxCorpus](https://zenodo.org/records/6482587) and Simsamu datasets. The CTC
layers were trained from scratch and the wav2vec2 layers were fine-tuned.
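
For use outside of medkit, the model can likely also be loaded directly through
speechbrain's pretrained-model interface. The following is a minimal sketch, not
part of the documented usage: it assumes the repository ships a standard
speechbrain hyperparams file and that your speechbrain version provides
`speechbrain.inference.ASR.EncoderASR` (older releases expose it as
`speechbrain.pretrained.EncoderASR`).

```python
# hypothetical direct speechbrain usage; the documented path is medkit (see below)
from speechbrain.inference.ASR import EncoderASR

# download the model from the Hub and build the inference pipeline
asr = EncoderASR.from_hparams(
    source="medkit/simsamu-transcription",
    savedir="pretrained_models/simsamu-transcription",
)

# transcribe a single audio file and print the result
print(asr.transcribe_file("path/to/audio.wav"))
```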

The model can be used in [medkit](https://github.com/medkit-lib/medkit/) as
follows:

```python
from medkit.core.audio import AudioDocument
from medkit.audio.segmentation.pa_speaker_detector import PASpeakerDetector
from medkit.audio.transcription.sb_transcriber import SBTranscriber

# init speaker detector operation
speaker_detector = PASpeakerDetector(
    model="medkit/simsamu-diarization",
    device=0,
    segmentation_batch_size=10,
    embedding_batch_size=10,
)

# init transcriber operation
transcriber = SBTranscriber(
    model="medkit/simsamu-transcription",
    needs_decoder=False,
    output_label="transcription",
    device=0,
    batch_size=10,
)

# create audio document
audio_doc = AudioDocument.from_file("path/to/audio.wav")

# apply speaker detector operation on audio document
# to get speech segments
speech_segments = speaker_detector.run([audio_doc.raw_segment])

# apply transcriber operation on speech segments
transcriber.run(speech_segments)

# display transcription for each speech turn
for speech_seg in speech_segments:
    transcription_attr = speech_seg.attrs.get(label="transcription")[0]
    print(speech_seg.span.start, speech_seg.span.end, transcription_attr.value)
```
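
If speaker diarization is not needed, the same transcriber can presumably be
applied directly to the document's raw segment instead of the diarized speech
segments. This is an assumption based on the example above (it relies on
`SBTranscriber` attaching the transcription attribute to whatever segments it
receives), not a documented recipe:

```python
# hypothetical variant: transcribe the whole recording in one pass,
# reusing the transcriber and audio document from the example above
transcriber.run([audio_doc.raw_segment])
transcription_attr = audio_doc.raw_segment.attrs.get(label="transcription")[0]
print(transcription_attr.value)
```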

More info at https://medkit.readthedocs.io/

See also: [Simsamu diarization
pipeline](https://huggingface.co/medkit/simsamu-diarization)