---
library_name: transformers
license: mit
language: fr
datasets:
- Cnam-LMSSC/vibravox
metrics:
- per
tags:
- audio
- automatic-speech-recognition
- speech
- phonemize
- phoneme
model-index:
- name: Wav2Vec2-base French finetuned for Speech-to-Phoneme by LMSSC
  results:
  - task:
      name: Speech-to-Phoneme
      type: automatic-speech-recognition
    dataset:
      name: Vibravox["airborne.mouth_headworn.reference_microphone"]
      type: Cnam-LMSSC/vibravox
      args: fr
    metrics:
    - name: Test PER on Vibravox["airborne.mouth_headworn.reference_microphone"] | Trained
      type: per
      value: 2.874
---


# Model Card 

- **Developed by:** [Cnam-LMSSC](https://huggingface.co/Cnam-LMSSC)
- **Model type:** [Wav2Vec2ForCTC](https://huggingface.co/transformers/v4.9.2/model_doc/wav2vec2.html#transformers.Wav2Vec2ForCTC)
- **Language:** French
- **License:** MIT
- **Finetuned from model:** [facebook/wav2vec2-base-fr-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-base-fr-voxpopuli-v2)
- **Finetuned dataset:** `airborne.mouth_headworn.reference_microphone` audio of the `speech_clean` subset of [Cnam-LMSSC/vibravox](https://huggingface.co/datasets/Cnam-LMSSC/vibravox)
- **Sample rate for usage:** 16 kHz

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/65302a613ecbe51d6a6ddcec/zhB1fh-c0pjlj-Tr4Vpmr.png" />
</p>


## Output

Because this model is trained specifically for a speech-to-phoneme task, its output is a sequence of [IPA-encoded](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) words, without punctuation.
If you don't read the phonetic alphabet fluently, you can use this excellent [IPA reader website](http://ipa-reader.xyz) to convert the transcript back to synthetic speech and check the quality of the phonetic transcription.

## Links to other phonemizer models trained on body-conducted sensors

An entry point to all **phonemizer** (speech-to-phoneme ASR) models trained on different sensor data from the [Vibravox dataset](https://huggingface.co/datasets/Cnam-LMSSC/vibravox) is available at [https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers](https://huggingface.co/Cnam-LMSSC/vibravox_phonemizers).

## Training procedure

The model was finetuned for 10 epochs with a constant learning rate of *1e-5*. To reproduce the experiment, please visit [jhauret/vibravox](https://github.com/jhauret/vibravox).

## Inference script

```python
import torch
import torchaudio
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor

model_id = "Cnam-LMSSC/phonemizer_airborne.mouth_headworn.reference_microphone"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Stream the test split so the full dataset is not downloaded
test_dataset = load_dataset("Cnam-LMSSC/vibravox", "speech_clean", split="test", streaming=True)

# Vibravox audio is sampled at 48 kHz; the model expects 16 kHz input
audio_48kHz = torch.Tensor(next(iter(test_dataset))["audio.airborne.mouth_headworn.reference_microphone"]["array"])
audio_16kHz = torchaudio.functional.resample(audio_48kHz, orig_freq=48_000, new_freq=16_000)

inputs = processor(audio_16kHz, sampling_rate=16_000, return_tensors="pt")

# Inference only: no gradients needed
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy decoding: most likely token per frame, then CTC collapse in batch_decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

print("Phonetic transcription : ", transcription)
```
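
The `torch.argmax` call followed by `processor.batch_decode` in the script above amounts to greedy CTC decoding: take the most likely token for each frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that collapse rule in plain Python. The vocabulary, the frame sequence, and the choice of `0` as the blank id are hypothetical values for illustration and are not taken from this model's tokenizer.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse consecutive repeated ids, then drop blanks (greedy CTC rule)."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not the blank
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary and per-frame argmax ids (not the real tokenizer)
vocab = {0: "<pad>", 1: "b", 2: "o", 3: "n"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))  # bon
```

A blank between two identical ids keeps both tokens, which is how CTC represents a genuinely doubled symbol.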

## Test Results

The table below reports the Phoneme Error Rate (PER) of the model on several microphones and subsets of Vibravox:

| Test Set  | PER |
| ------------- | ------------- |
| Vibravox/speech_clean/airborne.mouth_headworn.reference_microphone | **2.874%** |
| Vibravox/speech_clean/body_conducted.forehead.miniature_accelerometer | **??%** |
| Vibravox/speech_clean/body_conducted.in_ear.comply_foam_microphone | **??%** |
| Vibravox/speech_clean/body_conducted.in_ear.rigid_earpiece_microphone | **??%** |
| Vibravox/speech_clean/body_conducted.throat.piezoelectric_sensor | **??%** |
| Vibravox/speech_clean/body_conducted.temple.contact_microphone | **??%** |
| Vibravox/speech_noisy/airborne.mouth_headworn.reference_microphone | **??%** |
| Vibravox/speech_noisy/body_conducted.forehead.miniature_accelerometer | **??%** |
| Vibravox/speech_noisy/body_conducted.in_ear.comply_foam_microphone | **??%** |
| Vibravox/speech_noisy/body_conducted.in_ear.rigid_earpiece_microphone | **??%** |
| Vibravox/speech_noisy/body_conducted.throat.piezoelectric_sensor | **??%** |
| Vibravox/speech_noisy/body_conducted.temple.contact_microphone | **??%** |
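
For readers unfamiliar with the metric, PER is commonly computed as the edit distance between predicted and reference phoneme sequences, divided by the number of reference phonemes. The sketch below is a minimal illustration under two assumptions that may differ from the evaluation code in [jhauret/vibravox](https://github.com/jhauret/vibravox): every non-space character counts as one phoneme, and the rate is aggregated over the whole corpus. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER in percent: total edits / total reference phonemes.

    Assumption: each non-space character is one phoneme. Combining marks
    (for example a nasal tilde) would be counted as separate symbols here.
    """
    total_edits = 0
    total_phonemes = 0
    for ref, hyp in zip(references, hypotheses):
        ref_symbols = [c for c in ref if not c.isspace()]
        hyp_symbols = [c for c in hyp if not c.isspace()]
        total_edits += edit_distance(ref_symbols, hyp_symbols)
        total_phonemes += len(ref_symbols)
    return 100.0 * total_edits / total_phonemes


# One substitution out of five reference phonemes
print(phoneme_error_rate(["bɔʒuʁ"], ["bɔzuʁ"]))  # 20.0
```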