Any plans for a quantized or distilled version?

#4
by flutter-painter - opened
cawoylel org

This model is great, but it is 3 GB.
Do you intend to make a lighter version?

cawoylel org

I tried quantizing it to the ggml format using whisper.cpp (https://github.com/ggerganov/whisper.cpp). Indeed, the model was lighter and faster. But we plan to retrain Whisper with more data and for longer, while reducing the size of the vocabulary matrix. That model will be smaller, and combined with quantization we hope it will fit on small devices.
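For a rough sense of why both quantization and a smaller vocabulary matrix help, here is a back-of-the-envelope sketch. All numbers are assumptions for illustration, not details of this particular checkpoint: a Whisper-medium-sized model (~769M parameters, d_model = 1024, the standard 51,865-token multilingual vocabulary), a 5-bit ggml quantization format (~5.5 bits per weight including scales), and a hypothetical 10k-token reduced vocabulary.

```python
# Assumed figures (Whisper-medium-like model; see lead-in above).
N_PARAMS = 769_000_000
D_MODEL = 1024
VOCAB_FULL = 51_865
VOCAB_REDUCED = 10_000  # hypothetical Fula-focused vocabulary

def size_gb(n_params: int, bytes_per_param: float) -> float:
    """Approximate checkpoint size in GB for a given storage width."""
    return n_params * bytes_per_param / 1e9

fp32 = size_gb(N_PARAMS, 4)        # ~3.1 GB: consistent with the 3 GB checkpoint
fp16 = size_gb(N_PARAMS, 2)        # ~1.5 GB
q5   = size_gb(N_PARAMS, 5.5 / 8)  # ~0.5 GB at ~5.5 bits/weight (5-bit blocks + scales)

# Shrinking the vocabulary removes rows from the token-embedding matrix
# (which Whisper ties with the output projection):
saved_params = (VOCAB_FULL - VOCAB_REDUCED) * D_MODEL

print(f"fp32: {fp32:.2f} GB, fp16: {fp16:.2f} GB, 5-bit: {q5:.2f} GB")
print(f"vocab reduction saves ~{saved_params / 1e6:.0f}M parameters")
```

So quantization alone gets well under 1 GB, and trimming the vocabulary matrix saves a further tens of millions of parameters before quantization even applies.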

cawoylel org

OK, good news. Can't wait to see this new model.
Still, I am not sure it will fit on small devices. I was thinking about serving it at a lower cost or running it locally on a desktop.
To fit on mobile devices it would need to be at least below 500 MB; really, below 200 MB should be the target, considering the kind of Android phones used by most Fula speakers, and also to leave room for an offline translation model.
For this use case, and since you have aggregated a vast dataset, icefall and sherpa seem to me a safer choice: https://k2-fsa.github.io/sherpa/onnx/index.html
