---
library_name: transformers
license: mit
language: fr
datasets:
- Cnam-LMSSC/vibravox
metrics:
- per
tags:
- audio
- automatic-speech-recognition
- speech
- phonemize
- phoneme
model-index:
- name: Wav2Vec2-base French finetuned for Speech-to-Phoneme by LMSSC
  results:
  - task:
      name: Speech-to-Phoneme
      type: automatic-speech-recognition
    dataset:
      name: Vibravox["airborne.mouth_headworn.reference_microphone"]
      type: Cnam-LMSSC/vibravox
      args: fr
    metrics:
    - name: Test PER on Vibravox["airborne.mouth_headworn.reference_microphone"] | Trained
      type: per
      value: 2.874
---

# Model Card

- **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
- **Model type:** [Wav2Vec2ForCTC](https://huggingface.co/transformers/v4.9.2/model_doc/wav2vec2.html#transformers.Wav2Vec2ForCTC)
- **Language:** French
- **License:** MIT
- **Finetuned from model:** [facebook/wav2vec2-base-fr-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-base-fr-voxpopuli-v2)
- **Finetuned dataset:** `airborne.mouth_headworn.reference_microphone` audio of the `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
- **Sample rate for usage:** 16 kHz

## Output

As this model is specifically trained for a speech-to-phoneme task, the output is a sequence of [IPA-encoded](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) words, without punctuation.
If you don't read the phonetic alphabet fluently, you can use this excellent [IPA reader website](http://ipa-reader.xyz) to convert the transcript back to synthetic speech and check the quality of the phonetic transcription.

## Link to other phonemizer models trained on other body conducted sensors :

An entry point to all **phonemizers** (speech-to-phoneme ASR) models trained on different sensor data from the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) is available at [https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers](https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers).

## Training procedure

The model has been finetuned for 10 epochs with a constant learning rate of *1e-5*. To reproduce the experiment, please visit [jhauret/vibravox](https://github.com/jhauret/vibravox).

## Inference script :

```python
import torch, torchaudio
from transformers import AutoProcessor, AutoModelForCTC
from datasets import load_dataset

# Load the processor (feature extractor + tokenizer) and the finetuned CTC model
processor = AutoProcessor.from_pretrained("Cnam-LMSSC/phonemizer_airborne.mouth_headworn.reference_microphone")
model = AutoModelForCTC.from_pretrained("Cnam-LMSSC/phonemizer_airborne.mouth_headworn.reference_microphone")

# Stream the test split so the whole dataset is not downloaded
test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)

# Take the first test sample and resample it from 48 kHz to the 16 kHz expected by the model
audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.airborne.mouth_headworn.reference_microphone"]["array"])
audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)

inputs = processor(audio_16kHz, sampling_rate=16_000, return_tensors="pt")

# Inference only: no gradients are needed
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: most likely token per frame, then decoding by the processor
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

print("Phonetic transcription : ", transcription)
```

## Test Results:

In the table below, we report the Phoneme Error Rate (PER) of the model on the different microphones and subsets of Vibravox:

| Test Set | PER |
| ------------- | ------------- |
| Vibravox/speech_clean/airborne.mouth_headworn.reference_microphone | **2.874%** |
| Vibravox/speech_clean/body_conducted.forehead.miniature_accelerometer | **??%** |
| Vibravox/speech_clean/body_conducted.in_ear.comply_foam_microphone | **??%** |
| Vibravox/speech_clean/body_conducted.in_ear.rigid_earpiece_microphone | **??%** |
| Vibravox/speech_clean/body_conducted.throat.piezoelectric_sensor | **??%** |
| Vibravox/speech_clean/body_conducted.temple.contact_microphone | **??%** |
| Vibravox/speech_noisy/airborne.mouth_headworn.reference_microphone | **??%** |
| Vibravox/speech_noisy/body_conducted.forehead.miniature_accelerometer | **??%** |
| Vibravox/speech_noisy/body_conducted.in_ear.comply_foam_microphone | **??%** |
| Vibravox/speech_noisy/body_conducted.in_ear.rigid_earpiece_microphone | **??%** |
| Vibravox/speech_noisy/body_conducted.throat.piezoelectric_sensor | **??%** |
| Vibravox/speech_noisy/body_conducted.temple.contact_microphone | **??%** |
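
For readers unfamiliar with the metric, the following is a minimal sketch of how a Phoneme Error Rate can be computed from reference and predicted transcripts. It is not taken from the training repository: the function names `edit_distance` and `phoneme_error_rate` are hypothetical, and it assumes that every non-space character counts as one phoneme symbol and that the score is aggregated over the whole corpus. The scoring actually used to produce the numbers above may differ in these details.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of symbols."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference symbol
                curr[j - 1] + 1,     # insertion of a hypothesis symbol
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER in percent: total edits / total reference phonemes.

    Assumption (not from the model card): spaces are ignored and each
    remaining character is one phoneme, so a combining diacritic such as
    the nasal tilde counts as its own symbol.
    """
    total_edits = 0
    total_phonemes = 0
    for ref, hyp in zip(references, hypotheses):
        ref_symbols = [c for c in ref if not c.isspace()]
        hyp_symbols = [c for c in hyp if not c.isspace()]
        total_edits += edit_distance(ref_symbols, hyp_symbols)
        total_phonemes += len(ref_symbols)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    return 100.0 * total_edits / total_phonemes


if __name__ == "__main__":
    # One substituted symbol out of four reference symbols
    print(phoneme_error_rate(["saly"], ["sali"]))
```

Because insertions are counted as errors, a PER computed this way can exceed 100% when the hypothesis is much longer than the reference.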