Update README.md
README.md
CHANGED
|
@@ -35,9 +35,10 @@ For a quick demo, simply run `make base.en`:
|
|
| 35 |
|
| 36 |
```java
|
| 37 |
$ make base.en
|
| 38 |
-
|
|
|
|
| 39 |
c++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c whisper.cpp
|
| 40 |
-
c++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread main.cpp whisper.o ggml.o -o main
|
| 41 |
./main -h
|
| 42 |
|
| 43 |
usage: ./main [options] file0.wav file1.wav ...
|
|
@@ -60,7 +61,7 @@ options:
|
|
| 60 |
|
| 61 |
bash ./download-ggml-model.sh base.en
|
| 62 |
Downloading ggml model base.en ...
|
| 63 |
-
models/ggml-base.en.bin 100%[===================================>] 141.11M
|
| 64 |
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
|
| 65 |
You can now use it like this:
|
| 66 |
|
|
@@ -88,7 +89,7 @@ whisper_model_load: n_text_layer = 6
|
|
| 88 |
whisper_model_load: n_mels = 80
|
| 89 |
whisper_model_load: f16 = 1
|
| 90 |
whisper_model_load: type = 2
|
| 91 |
-
whisper_model_load: mem_required =
|
| 92 |
whisper_model_load: adding 1607 extra tokens
|
| 93 |
whisper_model_load: ggml ctx size = 163.43 MB
|
| 94 |
whisper_model_load: memory size = 22.83 MB
|
|
@@ -99,12 +100,12 @@ main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, lang =
|
|
| 99 |
[00:00.000 --> 00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
|
| 100 |
|
| 101 |
|
| 102 |
-
whisper_print_timings: load time =
|
| 103 |
-
whisper_print_timings: mel time =
|
| 104 |
-
whisper_print_timings: sample time =
|
| 105 |
-
whisper_print_timings: encode time =
|
| 106 |
-
whisper_print_timings: decode time =
|
| 107 |
-
whisper_print_timings: total time =
|
| 108 |
```
|
| 109 |
|
| 110 |
The command downloads the `base.en` model converted to custom `ggml` format and runs the inference on all `.wav` samples in the folder `samples`.
|
|
@@ -145,7 +146,7 @@ make large
|
|
| 145 |
## Another example
|
| 146 |
|
| 147 |
Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
|
| 148 |
-
in
|
| 149 |
|
| 150 |
```java
|
| 151 |
$ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
|
|
@@ -163,51 +164,55 @@ whisper_model_load: n_text_layer = 24
|
|
| 163 |
whisper_model_load: n_mels = 80
|
| 164 |
whisper_model_load: f16 = 1
|
| 165 |
whisper_model_load: type = 4
|
| 166 |
-
whisper_model_load: mem_required =
|
| 167 |
whisper_model_load: adding 1607 extra tokens
|
| 168 |
whisper_model_load: ggml ctx size = 1644.97 MB
|
| 169 |
whisper_model_load: memory size = 182.62 MB
|
| 170 |
whisper_model_load: model size = 1462.12 MB
|
| 171 |
-
log_mel_spectrogram: n_sample = 3179750, n_len = 19873
|
| 172 |
-
log_mel_spectrogram: recording length: 198.734375 s
|
| 173 |
|
| 174 |
-
main: processing 3179750 samples
|
| 175 |
|
| 176 |
[00:00.000 --> 00:08.000] My fellow Americans, this day has brought terrible news and great sadness to our country.
|
| 177 |
-
[00:08.000 --> 00:17.000] At
|
| 178 |
-
[00:17.000 --> 00:
|
| 179 |
-
[00:
|
| 180 |
[00:29.000 --> 00:32.000] On board was a crew of seven.
|
| 181 |
-
[00:32.000 --> 00:
|
| 182 |
-
[00:
|
|
|
|
| 183 |
[00:52.000 --> 00:58.000] These men and women assumed great risk in the service to all humanity.
|
| 184 |
-
[00:58.000 --> 01:
|
| 185 |
-
[01:
|
| 186 |
-
[01:
|
| 187 |
-
[01:
|
| 188 |
-
[01:
|
| 189 |
-
[01:
|
| 190 |
-
[01:
|
|
|
|
|
|
|
|
|
|
| 191 |
[01:52.000 --> 01:56.000] The cause in which they died will continue.
|
| 192 |
-
[01:56.000 --> 02:
|
| 193 |
-
[02:
|
| 194 |
[02:11.000 --> 02:16.000] In the skies today, we saw destruction and tragedy.
|
| 195 |
[02:16.000 --> 02:22.000] Yet farther than we can see, there is comfort and hope.
|
| 196 |
-
[02:22.000 --> 02:
|
| 197 |
-
[02:
|
| 198 |
-
[02:
|
| 199 |
-
[02:
|
| 200 |
-
[02:
|
| 201 |
-
[
|
| 202 |
-
[03:
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
|
|
|
|
|
|
| 211 |
```
|
| 212 |
|
| 213 |
## Real-time audio input example
|
|
|
|
| 35 |
|
| 36 |
```java
|
| 37 |
$ make base.en
|
| 38 |
+
|
| 39 |
+
cc -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -DGGML_USE_ACCELERATE -c ggml.c
|
| 40 |
c++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c whisper.cpp
|
| 41 |
+
c++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread main.cpp whisper.o ggml.o -o main -framework Accelerate
|
| 42 |
./main -h
|
| 43 |
|
| 44 |
usage: ./main [options] file0.wav file1.wav ...
|
|
|
|
| 61 |
|
| 62 |
bash ./download-ggml-model.sh base.en
|
| 63 |
Downloading ggml model base.en ...
|
| 64 |
+
models/ggml-base.en.bin 100%[=============================================>] 141.11M 3.13MB/s in 79s
|
| 65 |
Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
|
| 66 |
You can now use it like this:
|
| 67 |
|
|
|
|
| 89 |
whisper_model_load: n_mels = 80
|
| 90 |
whisper_model_load: f16 = 1
|
| 91 |
whisper_model_load: type = 2
|
| 92 |
+
whisper_model_load: mem_required = 505.00 MB
|
| 93 |
whisper_model_load: adding 1607 extra tokens
|
| 94 |
whisper_model_load: ggml ctx size = 163.43 MB
|
| 95 |
whisper_model_load: memory size = 22.83 MB
|
|
|
|
| 100 |
[00:00.000 --> 00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
|
| 101 |
|
| 102 |
|
| 103 |
+
whisper_print_timings: load time = 87.21 ms
|
| 104 |
+
whisper_print_timings: mel time = 24.26 ms
|
| 105 |
+
whisper_print_timings: sample time = 3.87 ms
|
| 106 |
+
whisper_print_timings: encode time = 323.67 ms / 53.94 ms per layer
|
| 107 |
+
whisper_print_timings: decode time = 83.25 ms / 13.87 ms per layer
|
| 108 |
+
whisper_print_timings: total time = 522.66 ms
|
| 109 |
```
|
| 110 |
|
| 111 |
The command downloads the `base.en` model converted to custom `ggml` format and runs the inference on all `.wav` samples in the folder `samples`.
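
As a rough illustration of the "runs the inference on all `.wav` samples" step described above, the sketch below builds one `./main` invocation per `.wav` file in a folder. It is not the project's Makefile logic: the helper name `build_commands` and its defaults are assumptions made for this example. Only the `-m` and `-f` flags and the `models/ggml-base.en.bin` path are taken from the commands shown in this README.

```python
from pathlib import Path


def build_commands(samples_dir="samples",
                   model="models/ggml-base.en.bin",
                   binary="./main"):
    """Return one argument list per .wav file in samples_dir, sorted by name.

    Hypothetical helper: it only approximates the inference step of
    `make base.en` and does not download the model.
    """
    wav_files = sorted(Path(samples_dir).glob("*.wav"))
    # -m selects the model file and -f the input audio, as in the
    # `./main -m ... -f ...` example elsewhere in the README.
    return [[binary, "-m", model, "-f", str(wav)] for wav in wav_files]


if __name__ == "__main__":
    for cmd in build_commands():
        # Print the command instead of running it; pass `cmd` to
        # subprocess.run(cmd, check=True) to actually execute it.
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps file names with spaces intact if the commands are later passed to `subprocess.run`.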
|
|
|
|
| 146 |
## Another example
|
| 147 |
|
| 148 |
Here is another example of transcribing a [3:24 min speech](https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg)
|
| 149 |
+
in about half a minute on a MacBook M1 Pro, using `medium.en` model:
|
| 150 |
|
| 151 |
```java
|
| 152 |
$ ./main -m models/ggml-medium.en.bin -f samples/gb1.wav -t 8
|
|
|
|
| 164 |
whisper_model_load: n_mels = 80
|
| 165 |
whisper_model_load: f16 = 1
|
| 166 |
whisper_model_load: type = 4
|
| 167 |
+
whisper_model_load: mem_required = 2610.00 MB
|
| 168 |
whisper_model_load: adding 1607 extra tokens
|
| 169 |
whisper_model_load: ggml ctx size = 1644.97 MB
|
| 170 |
whisper_model_load: memory size = 182.62 MB
|
| 171 |
whisper_model_load: model size = 1462.12 MB
|
|
|
|
|
|
|
| 172 |
|
| 173 |
+
main: processing 'samples/gb1.wav' (3179750 samples, 198.7 sec), 8 threads, lang = en, task = transcribe, timestamps = 1 ...
|
| 174 |
|
| 175 |
[00:00.000 --> 00:08.000] My fellow Americans, this day has brought terrible news and great sadness to our country.
|
| 176 |
+
[00:08.000 --> 00:17.000] At nine o'clock this morning, Mission Control in Houston lost contact with our Space Shuttle Columbia.
|
| 177 |
+
[00:17.000 --> 00:23.000] A short time later, debris was seen falling from the skies above Texas.
|
| 178 |
+
[00:23.000 --> 00:29.000] The Columbia's lost. There are no survivors.
|
| 179 |
[00:29.000 --> 00:32.000] On board was a crew of seven.
|
| 180 |
+
[00:32.000 --> 00:39.000] Colonel Rick Husband, Lieutenant Colonel Michael Anderson, Commander Laurel Clark,
|
| 181 |
+
[00:39.000 --> 00:48.000] Captain David Brown, Commander William McCool, Dr. Kultna Shavla, and Ilan Ramon,
|
| 182 |
+
[00:48.000 --> 00:52.000] a colonel in the Israeli Air Force.
|
| 183 |
[00:52.000 --> 00:58.000] These men and women assumed great risk in the service to all humanity.
|
| 184 |
+
[00:58.000 --> 01:03.000] In an age when space flight has come to seem almost routine,
|
| 185 |
+
[01:03.000 --> 01:07.000] it is easy to overlook the dangers of travel by rocket
|
| 186 |
+
[01:07.000 --> 01:12.000] and the difficulties of navigating the fierce outer atmosphere of the Earth.
|
| 187 |
+
[01:12.000 --> 01:18.000] These astronauts knew the dangers, and they faced them willingly,
|
| 188 |
+
[01:18.000 --> 01:23.000] knowing they had a high and noble purpose in life.
|
| 189 |
+
[01:23.000 --> 01:31.000] Because of their courage and daring and idealism, we will miss them all the more.
|
| 190 |
+
[01:31.000 --> 01:36.000] All Americans today are thinking as well of the families of these men and women
|
| 191 |
+
[01:36.000 --> 01:40.000] who have been given this sudden shock and grief.
|
| 192 |
+
[01:40.000 --> 01:45.000] You're not alone. Our entire nation grieves with you,
|
| 193 |
+
[01:45.000 --> 01:52.000] and those you love will always have the respect and gratitude of this country.
|
| 194 |
[01:52.000 --> 01:56.000] The cause in which they died will continue.
|
| 195 |
+
[01:56.000 --> 02:04.000] Mankind is led into the darkness beyond our world by the inspiration of discovery
|
| 196 |
+
[02:04.000 --> 02:11.000] and the longing to understand. Our journey into space will go on.
|
| 197 |
[02:11.000 --> 02:16.000] In the skies today, we saw destruction and tragedy.
|
| 198 |
[02:16.000 --> 02:22.000] Yet farther than we can see, there is comfort and hope.
|
| 199 |
+
[02:22.000 --> 02:29.000] In the words of the prophet Isaiah, "Lift your eyes and look to the heavens
|
| 200 |
+
[02:29.000 --> 02:35.000] who created all these. He who brings out the starry hosts one by one
|
| 201 |
+
[02:35.000 --> 02:39.000] and calls them each by name."
|
| 202 |
+
[02:39.000 --> 02:46.000] Because of His great power and mighty strength, not one of them is missing.
|
| 203 |
+
[02:46.000 --> 02:55.000] The same Creator who names the stars also knows the names of the seven souls we mourn today.
|
| 204 |
+
[02:55.000 --> 03:01.000] The crew of the shuttle Columbia did not return safely to earth,
|
| 205 |
+
[03:01.000 --> 03:05.000] yet we can pray that all are safely home.
|
| 206 |
+
[03:05.000 --> 03:13.000] May God bless the grieving families, and may God continue to bless America.
|
| 207 |
+
[03:13.000 --> 03:41.000] Audio
|
| 208 |
+
|
| 209 |
+
|
| 210 |
+
whisper_print_timings: load time = 575.92 ms
|
| 211 |
+
whisper_print_timings: mel time = 230.60 ms
|
| 212 |
+
whisper_print_timings: sample time = 73.19 ms
|
| 213 |
+
whisper_print_timings: encode time = 19552.61 ms / 814.69 ms per layer
|
| 214 |
+
whisper_print_timings: decode time = 13249.96 ms / 552.08 ms per layer
|
| 215 |
+
whisper_print_timings: total time = 33686.27 ms
|
| 216 |
```
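
To relate the timings above to the "about half a minute" claim, the following sketch parses the `main: processing` line and the `whisper_print_timings` lines from the log and computes how many seconds of audio are processed per second of wall-clock time. The regular expressions and function names are assumptions based only on the log format shown here; the actual output format may differ between versions.

```python
import re

# Log excerpt copied from the output above (whitespace as shown here).
LOG = """\
main: processing 'samples/gb1.wav' (3179750 samples, 198.7 sec), 8 threads, lang = en, task = transcribe, timestamps = 1 ...
whisper_print_timings: load time = 575.92 ms
whisper_print_timings: mel time = 230.60 ms
whisper_print_timings: sample time = 73.19 ms
whisper_print_timings: encode time = 19552.61 ms / 814.69 ms per layer
whisper_print_timings: decode time = 13249.96 ms / 552.08 ms per layer
whisper_print_timings: total time = 33686.27 ms
"""


def parse_timings(log):
    """Map each timing name (load, mel, ..., total) to milliseconds."""
    pattern = r"whisper_print_timings:\s*(\w+) time\s*=\s*([\d.]+) ms"
    return {name: float(ms) for name, ms in re.findall(pattern, log)}


def audio_seconds(log):
    """Extract the audio duration reported on the 'main: processing' line."""
    match = re.search(r"\(\d+ samples, ([\d.]+) sec\)", log)
    if match is None:
        raise ValueError("no 'main: processing' line found")
    return float(match.group(1))


def speed_vs_real_time(log):
    """Seconds of audio transcribed per second of total processing time."""
    total_s = parse_timings(log)["total"] / 1000.0
    return audio_seconds(log) / total_s


if __name__ == "__main__":
    print(f"{speed_vs_real_time(LOG):.1f}x real time")
```

For this log the ratio is 198.7 s of audio over roughly 33.7 s of processing, so a little under six times faster than real time on the machine used for the run.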
|
| 217 |
|
| 218 |
## Real-time audio input example
|