Numerical tokens

#1
by lerela - opened

Thanks @bofenghuang for this great model!

This post is just to let HF users know that "French alphabet characters" also means that digits are not included. Dates are therefore output as words (deux mille vingt et un), which may make parsing harder depending on your application.

Are there any plans to release a version that includes number tokens?

Hi @lerela,

Thanks for pointing it out! This model predicts numbers as words, and all numbers in the evaluation datasets were converted to words when computing the WER.

I agree with you that a model that also decodes numbers would be more practical, but I currently don't have enough GPU resources. I will let you know if it gets done one day. In the meantime, it might be useful to add a word-to-number step as post-processing.
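
For anyone who wants to try that, a word-to-number step could look roughly like the sketch below. It is not part of the model or of any library: the function name `french_words_to_int` and the word tables are made up for illustration, and it assumes the input is a single, well-formed number phrase in standard French (no "septante"/"nonante", no ordinals, no decimals).

```python
import re

# Values of the simple French number words. Regional forms such as
# "septante" or "nonante" are deliberately left out of this sketch.
SMALL = {
    "zéro": 0, "un": 1, "une": 1, "deux": 2, "trois": 3, "quatre": 4,
    "cinq": 5, "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10,
    "onze": 11, "douze": 12, "treize": 13, "quatorze": 14, "quinze": 15,
    "seize": 16, "vingt": 20, "trente": 30, "quarante": 40,
    "cinquante": 50, "soixante": 60,
}
LARGE = {"mille": 1_000, "million": 1_000_000, "milliard": 1_000_000_000}


def french_words_to_int(text: str) -> int:
    """Convert a French number phrase (hypothetical helper) to an int."""
    total = 0    # sum of the groups already closed by mille/million/milliard
    current = 0  # value of the group being read (always below 1000)
    for token in re.split(r"[\s-]+", text.lower().strip()):
        if not token or token == "et":
            continue  # "et" only links words, as in "vingt et un"
        # Drop a plural "s" ("cents", "vingts", "millions") unless the
        # token is itself a number word such as "trois".
        word = token if token in SMALL else token.rstrip("s")
        if word == "vingt" and current % 100 == 4:
            current += 76  # "quatre-vingt" is 4 * 20 = 80, not 4 + 20
        elif word in SMALL:
            current += SMALL[word]
        elif word == "cent":
            current = (current or 1) * 100
        elif word in LARGE:
            total += (current or 1) * LARGE[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {token!r}")
    return total + current


print(french_words_to_int("deux mille vingt et un"))  # 2021
```

To apply this to a full transcript you would still need to locate the number phrases first, which is the harder part: "un" and "une" are also articles, so replacing every number word blindly would turn "un chat" into "1 chat".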

That makes sense, thanks for your reply and for your work!
