Spaces: Xenova/whisper-web

Taking too much time on 4 GB RAM

by rajeev148711 - opened Sep 19, 2023

Discussion

rajeev148711

Sep 19, 2023 • edited Sep 19, 2023

Hi, transcribing audio with this app takes too long on a system with 4 GB of RAM: a conversion takes around 4 to 5 minutes. Could you provide the recommended system configuration?

rajeev148711 changed discussion status to closed Sep 19, 2023

rajeev148711 changed discussion status to open Sep 19, 2023

Xenova

Owner Sep 19, 2023

Hi there. Could you perhaps give more information about your system? Which OS, browser, CPU, etc. are you using? Also, how long is the audio file you are trying to transcribe?

rajeev148711

Sep 20, 2023 • edited Sep 20, 2023

OS: Windows 7 Home Basic
Browser: Google Chrome 109
CPU: Intel Core i3-3110M @ 2.40GHz
System type: 64-bit
RAM: 4 GB
Audio length: 60 seconds

Xenova

Owner Sep 20, 2023

Considering that you’re using a very old computer, I don’t think it’s a problem with the app/library.

You could also try updating the settings to use the quantized version of the model. By default, the tiny unquantized model is used, which is ~160MB.

rajeev148711

Sep 20, 2023

But the accuracy of the conversion is not as good as with the small model.

Xenova

Owner Sep 20, 2023

The small model is ~250 million parameters, which would explain the slow execution time on your system.

Unfortunately, I’d say you need to make some compromises between speed and accuracy on your hardware.

You could try the “base” model, which is ~75 million parameters.

Also, in some cases, the unquantized versions of models are faster, so you can play around with the model settings.
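To put the parameter counts above in context, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need. It assumes the unquantized models store 32-bit floats (4 bytes per parameter) and the quantized ones use 8-bit weights (1 byte per parameter); those byte sizes are assumptions, and real model files and runtime memory use will be somewhat larger.

```python
# Rough weight-size estimate: number of parameters x bytes per parameter.
# Assumption: "unquantized" means fp32 and "quantized" means 8-bit weights.
BYTES_PER_PARAM = {"fp32": 4, "int8": 1}

# Approximate parameter counts quoted in this thread.
PARAM_COUNTS = {"base": 75_000_000, "small": 250_000_000}


def estimate_weight_mb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


if __name__ == "__main__":
    for name, num_params in PARAM_COUNTS.items():
        for dtype in BYTES_PER_PARAM:
            size = estimate_weight_mb(num_params, dtype)
            print(f"{name:>5} {dtype}: ~{size:.0f} MB")
```

On a 4 GB machine that also has to run the OS and the browser, the unquantized small model leaves far less headroom than the base model, which is one reason the trade-off is worth experimenting with.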

rajeev148711

Sep 21, 2023

Could you kindly provide the accuracy of the unquantized base model compared with the quantized small model?

rajeev148711

Sep 25, 2023

Hi, as you suggested, I am now using the unquantized base model. One issue I face with this model: a word or number is repeated many times. Kindly check the text below against the given audio file.

I can see a table in front of me which shows the statistics of victims in Africa before the introduction of their respective anti-roads. So in years I can see 19,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 arthritis, syphilis and the maximum numbers is for diaphragm in 1960 which goes up to 500 and the minimum values are of AIDS in 1900. It is showing the victims in Africa.

Xenova

Owner Sep 25, 2023

> One issue face in this model, the word or number is repeated many times

If you use the unquantized model, you should get exactly the same output as if you ran the Python version (HF transformers). So, you will also see these artifacts there.

Unfortunately, this is a limitation of the model itself. There are some ways to try to fix this (with logits processors), but those are not available in this user interface.
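For readers unfamiliar with the idea, a logits processor adjusts the scores of the next token before one is chosen. The following is a minimal, library-free sketch of one such processor, which bans any token that would complete an n-gram already present in the output. The function names are hypothetical and this is not the code used by the app or by any particular library.

```python
import math


def banned_next_tokens(generated: list[int], ngram_size: int) -> set[int]:
    """Return tokens that would complete an n-gram already seen in `generated`."""
    if ngram_size <= 0 or len(generated) + 1 < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = generated[i:i + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


def apply_no_repeat_ngram(
    logits: list[float], generated: list[int], ngram_size: int
) -> list[float]:
    """Set the score of every banned token to -inf so it cannot be selected."""
    banned = banned_next_tokens(generated, ngram_size)
    return [
        -math.inf if token in banned else score
        for token, score in enumerate(logits)
    ]


if __name__ == "__main__":
    history = [5, 7, 5, 7, 5]  # hypothetical token ids that keep alternating
    scores = apply_no_repeat_ngram([0.1] * 8, history, ngram_size=2)
    print(scores[7])  # token 7 would repeat the bigram (5, 7), so it is banned
```

A processor like this would stop a loop such as the endless ",000" groups, at the cost of also blocking legitimate repetitions, so the n-gram size needs care.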

deleted

Sep 28, 2023 • edited Sep 28, 2023

No, the kind of repetition pattern shown above does not occur in Python Whisper. Please improve this, or include some text-manipulation logic for the base model. It would add more quality to your hard work on this marathon project. Also, audio containing a single word or a single sentence, such as a short command, is not transcribed accurately.
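As an illustration of the kind of text post-processing requested here, the sketch below only detects a short unit of text repeated back to back many times, such as a run of ",000" groups. The function name and thresholds are hypothetical, and deciding what to do with a flagged transcript (retry, truncate, warn the user) is left to the caller, since rewriting the text automatically could change its meaning.

```python
import re
from typing import Optional


def find_repetition(
    text: str, min_repeats: int = 4, max_unit: int = 20
) -> Optional[tuple[str, int]]:
    """Find the first short unit repeated back to back at least `min_repeats` times.

    Returns (unit, repeat_count), or None if no such run exists.
    """
    # (unit)\1{n,} matches a unit followed by at least n more copies of itself.
    pattern = re.compile(r"(.{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    match = pattern.search(text)
    if match is None:
        return None
    unit = match.group(1)
    return unit, len(match.group(0)) // len(unit)


if __name__ == "__main__":
    sample = "So in years I can see 19" + ",000" * 10 + " arthritis, syphilis"
    print(find_repetition(sample))
    print(find_repetition("It is showing the victims in Africa."))
```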

Xenova

Owner Sep 28, 2023 • edited Sep 28, 2023

You can also use some of the generation parameters (like no_repeat_ngram_size or repetition_penalty), but these are not available in the user interface at the moment. See docs for the full list of parameters.

If you can provide an audio clip where this problem happens in our unquantized model but not in the Python version, please file an issue at https://github.com/xenova/transformers.js.
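To give a feel for what a repetition penalty does, here is a minimal, library-free sketch of the common scheme: the score of every token that has already been generated is divided by the penalty if it is positive and multiplied by it if it is negative, so those tokens become less likely either way. The function name is hypothetical and the code is an illustration, not the implementation used by the app.

```python
def apply_repetition_penalty(
    logits: list[float], generated: list[int], penalty: float
) -> list[float]:
    """Lower the scores of tokens that already appear in `generated`.

    A penalty of 1.0 leaves the scores unchanged; values above 1.0
    discourage repetition.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token in set(generated):
        score = adjusted[token]
        # Dividing a positive score or multiplying a negative one both
        # push the score down when penalty > 1.
        adjusted[token] = score * penalty if score < 0 else score / penalty
    return adjusted


if __name__ == "__main__":
    # Hypothetical scores for a 3-token vocabulary; tokens 0 and 1 were already used.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
```

Unlike a hard n-gram ban, this only makes repeats less attractive, so legitimate repetitions can still appear when the model is confident about them.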
