Segmenting output by phone

by kalbin - opened Jun 27, 2022

Discussion

kalbin

Jun 27, 2022

Is it possible to segment the output by phones in order to calculate PER instead of CER?

I tried applying Wav2Vec2PhonemeCTCTokenizer (with the current vocab.json) but the results were horrible.

vitouphy

Owner Jun 28, 2022

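Assuming the label data is already tokenized into phonemes, we can map each phoneme, whether it is one character or two (e.g. tʃ), to a number, and then convert that number into a single character using chr(x).
Then we can reuse the same CER computation. Because each character now stands for one phone, the result is a phone-level error rate (PER).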
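
A minimal sketch of that idea, assuming the references and hypotheses are already lists of phone tokens. The helper names (encode_phones, edit_distance, phone_error_rate) are made up for illustration, and the Unicode private-use offset is just one way to pick distinct characters for chr(x):

```python
from typing import Dict, List, Sequence


def encode_phones(phones: Sequence[str], table: Dict[str, str]) -> str:
    """Map each phone token (e.g. "tʃ") to one unique character."""
    chars = []
    for phone in phones:
        if phone not in table:
            # Assumption: private-use code points, so the stand-in
            # characters cannot collide with real IPA symbols.
            table[phone] = chr(0xE000 + len(table))
        chars.append(table[phone])
    return "".join(chars)


def edit_distance(ref: str, hyp: str) -> int:
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]],
                     hyps: List[Sequence[str]]) -> float:
    """CER computed over the encoded strings, i.e. PER."""
    table: Dict[str, str] = {}
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = encode_phones(ref, table), encode_phones(hyp, table)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total if total else 0.0


# One substitution over three reference phones.
print(phone_error_rate([["tʃ", "iː", "z"]], [["tʃ", "ɪ", "z"]]))
```

The encoded strings could also be passed to whatever CER implementation is already in use, since each character now corresponds to exactly one phone.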

kalbin

Jun 28, 2022

Ohh good idea! Would this mean that the model would need to be retrained?

Currently, I'm just using the pipeline approach and taking the text output.

vitouphy

Owner Jul 1, 2022

@kalbin The model doesn't need to be retrained. It is trained to output a sequence of numbers (token IDs), like 5, 12, 6, 18. We just need to map each number to a different string.
The pipeline approach also does the whole conversion into a string for you, so it's slightly difficult to manipulate the output at a granular level.
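
As a rough illustration of remapping the IDs yourself, here is a sketch of greedy CTC decoding that returns a list of phones instead of one joined string. The ID-to-phone table below is hypothetical (the real one comes from the model's vocab.json), and treating ID 0 as the CTC blank is an assumption:

```python
from itertools import groupby
from typing import Dict, List, Sequence

# Hypothetical vocabulary, for illustration only.
ID_TO_PHONE: Dict[int, str] = {0: "<pad>", 5: "tʃ", 6: "z", 12: "iː", 18: "ə"}
BLANK_ID = 0  # assumption: the pad token doubles as the CTC blank


def ids_to_phones(frame_ids: Sequence[int],
                  id_to_phone: Dict[int, str] = ID_TO_PHONE,
                  blank_id: int = BLANK_ID) -> List[str]:
    """Collapse repeated frame IDs, drop blanks, then look up each phone."""
    collapsed = [key for key, _ in groupby(frame_ids)]
    return [id_to_phone[i] for i in collapsed if i != blank_id]


# Per-frame argmax IDs, as they might come out of the model.
frame_ids = [0, 5, 5, 0, 12, 12, 12, 6, 0, 18]
print(ids_to_phones(frame_ids))  # ['tʃ', 'iː', 'z', 'ə']
```

Keeping the result as a list preserves the phone boundaries that get lost once everything is joined into plain text.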

kalbin

Jul 5, 2022

I was able to get the segmented phones by following Approach 2 and passing output_char_offsets=True to batch_decode.
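
For anyone landing here later, a sketch of the post-processing step. It assumes batch_decode(..., output_char_offsets=True) returns, per utterance, a list of dicts with "char", "start_offset" and "end_offset" keys, which is how I understand the transformers docs. The helper name and the sample offsets are made up, and the 0.02 s per frame figure assumes a 320-sample stride at 16 kHz:

```python
from typing import Dict, List, Union

Offset = Dict[str, Union[str, int]]


def phones_from_offsets(char_offsets: List[Offset],
                        seconds_per_frame: float) -> List[Dict[str, object]]:
    """Turn one utterance's char offsets into phone segments with times."""
    segments = []
    for item in char_offsets:
        token = str(item["char"])
        if not token.strip():
            continue  # skip word-delimiter entries such as " "
        segments.append({
            "phone": token,
            "start": round(int(item["start_offset"]) * seconds_per_frame, 3),
            "end": round(int(item["end_offset"]) * seconds_per_frame, 3),
        })
    return segments


# Made-up offsets in the assumed format, not real model output.
sample = [
    {"char": "tʃ", "start_offset": 3, "end_offset": 5},
    {"char": "iː", "start_offset": 6, "end_offset": 9},
    {"char": " ", "start_offset": 9, "end_offset": 10},
    {"char": "z", "start_offset": 11, "end_offset": 12},
]

for segment in phones_from_offsets(sample, seconds_per_frame=320 / 16000):
    print(segment)
```

The "phone" values alone give the segmented sequence needed for PER. The timestamps are only a bonus.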

vitouphy changed discussion status to closed Nov 20, 2022
