title: README
emoji: 📚
colorFrom: yellow
colorTo: yellow
sdk: static
pinned: true
The Hungarians Organization
We created this organization to collect the latest usable models fine-tuned specifically for Hungarian (Whisper, BART, Llama, etc.). Feel free to join the organization and push your models.
About the models
Hungarian-language comparison test results (on google/fleurs):
Original models | WER | CER | Normalized_WER | Normalized_CER | Database | Split | Runtime |
---|---|---|---|---|---|---|---|
openai/whisper-tiny | 102.46 | 50.31 | 103.37 | 50.19 | google/fleurs | test | 60.44 |
openai/whisper-base | 89.08 | 41.3 | 93.13 | 41.56 | google/fleurs | test | 89.66 |
openai/whisper-small | 48.67 | 15.1 | 45.55 | 15.39 | google/fleurs | test | 175.03 |
openai/whisper-medium | 32.49 | 9.58 | 29.04 | 10.05 | google/fleurs | test | 393.56 |
openai/whisper-large | 28.2 | 7.77 | 24.76 | 8.31 | google/fleurs | test | 675.77 |
openai/whisper-large-v2 | 23.14 | 5.94 | 19.83 | 6.48 | google/fleurs | test | 772.64 |
openai/whisper-large-v3 | 18.88 | 4.56 | 15.48 | 5.2 | google/fleurs | test | 667.66 |
Fine-tuned models | | | | | | | |
Hungarians/whisper-small-cv17-hu | 188.94 | 75.87 | 188.21 | 77.32 | google/fleurs | test | 472.43 |
Hungarians/whisper-tiny-cv16-hu-v3 | 75.9 | 50.61 | 85.55 | 50.91 | google/fleurs | test | 65.17 |
Hungarians/whisper-tiny-cv16-hu-v2 | 72.13 | 41.71 | 71.13 | 41.45 | google/fleurs | test | 50.48 |
Hungarians/whisper-tiny-cv16-hu-final | 68.43 | 38.24 | 64.07 | 38.14 | google/fleurs | test | 41.48 |
Hungarians/whisper-tiny-cv16-hu | 64.7 | 28.02 | 60.9 | 27.7 | google/fleurs | test | 42.35 |
Hungarians/whisper-tiny-hu-cleaned | 59.67 | 26.01 | 54.72 | 25.73 | google/fleurs | test | 33.72 |
Hungarians/whisper-tiny-cv17-hu | 58.76 | 24.86 | 56.1 | 24.72 | google/fleurs | test | 39.57 |
sarpba/whisper-tiny-cv18-hu-cleaned | 52.74 | 24.02 | 50.09 | 23.91 | google/fleurs | test | 40.16 |
Hungarians/whisper-base-cv16-hu-v2 | 51.41 | 20.97 | 46.79 | 20.93 | google/fleurs | test | 70.57 |
Hungarians/whisper-base-hu-cleaned | 51.38 | 20.05 | 46.54 | 20.14 | google/fleurs | test | 70.84 |
Hungarians/whisper-base-cv16-hu | 50.06 | 17.71 | 44.83 | 17.44 | google/fleurs | test | 65.49 |
Hungarians/whisper-medium-cv16-hu | 49.77 | 24.98 | 47.79 | 25.4 | google/fleurs | test | 498.53 |
Hungarians/whisper-base-cv16-hu-final | 48.37 | 16.28 | 43.84 | 16.31 | google/fleurs | test | 67.07 |
Hungarians/whisper-base-cv17-hu | 45.61 | 14.95 | 40.79 | 14.94 | google/fleurs | test | 64.15 |
sarpba/whisper-base-cv18-hu-cleaned | 42.09 | 13.67 | 36.66 | 13.53 | google/fleurs | test | 54.7 |
Hungarians/whisper-small-cv16-hu-v2 | 41.07 | 13.16 | 36.59 | 13.21 | google/fleurs | test | 201.28 |
Hungarians/Whisper-small-hu-cleaned | 39.12 | 13.91 | 41.15 | 14.11 | google/fleurs | test | 274.09 |
Hungarians/whisper-small-cv16-hu | 37.5 | 11.31 | 32.54 | 11.35 | google/fleurs | test | 608.28 |
Hungarians/whisper-small-cv16-hu-v1.5 | 35.61 | 10.99 | 30.33 | 11.04 | google/fleurs | test | 605.69 |
Hungarians/whisper-medium-hu-cleaned | 26.26 | 6.8 | 21.97 | 7.31 | google/fleurs | test | 442.53 |
Our best models | | | | | | | |
sarpba/whisper-tiny-cv18-hu-cleaned | 52.74 | 24.02 | 50.09 | 23.91 | google/fleurs | test | 40.16 |
sarpba/whisper-base-cv18-hu-cleaned | 42.09 | 13.67 | 36.66 | 13.53 | google/fleurs | test | 54.7 |
sarpba/whisper-small-cv18-hu-cleaned | 29.75 | 9.23 | 25.19 | 9.38 | google/fleurs | test | 281.95 |
sarpba/whisper-medium-cv18-hu-cleaned | 23.89 | 6.79 | 19.81 | 7.3 | google/fleurs | test | 541.17 |
Hungarians/whisper-large-v2-hu-cleaned | 21.82 | 5.51 | 18.39 | 6.15 | google/fleurs | test | 725.31 |
Note: the last three rows are results of INT8-quantized models.
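The README does not state how the WER and CER columns were computed. Below is a minimal pure-Python sketch of the usual definitions: word- or character-level Levenshtein distance divided by the reference length, pooled over the test set and expressed in percent. The `normalize` function is an assumption (lowercasing, ASCII punctuation removal, whitespace collapsing); the actual evaluation may have used a different normalizer, such as Whisper's `BasicTextNormalizer`, so results from this sketch will not match the table exactly. The example also shows why WER can exceed 100, as it does for openai/whisper-tiny: inserted words count as errors but do not add to the reference length.

```python
import string


def edit_distance(ref, hyp):
    """Levenshtein distance: minimum number of substitutions, deletions
    and insertions (each costing 1) that turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            )
        prev = curr
    return prev[-1]


def normalize(text):
    """Assumed normalizer: lowercase, strip ASCII punctuation, collapse spaces."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def corpus_wer(refs, hyps):
    """Word error rate in percent, pooled over all utterances."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * errors / words


def corpus_cer(refs, hyps):
    """Character error rate in percent (spaces count as characters)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * errors / chars


if __name__ == "__main__":
    ref, hyp = "Jó reggelt kívánok!", "jó reggelt kívánok"
    print(round(corpus_wer([ref], [hyp]), 2))                        # 66.67
    print(corpus_wer([normalize(ref)], [normalize(hyp)]))            # 0.0
    print(round(corpus_cer([ref], [hyp]), 2))                        # 10.53
    # Two inserted words against a one-word reference: WER above 100.
    print(corpus_wer(["igen"], ["igen igen igen"]))                  # 200.0
```

The gap between the raw and normalized scores in this toy example comes only from casing and punctuation, which is the kind of difference the `Normalized_WER` and `Normalized_CER` columns are meant to factor out.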
Quantization loss example
Model / compute type | WER | CER | Normalized_WER | Normalized_CER | Database | Split | Runtime |
---|---|---|---|---|---|---|---|
Hungarians/whisper-base-cv17-hu | 45.61 | 14.95 | 40.79 | 14.94 | google/fleurs | test | 243.97 |
float16 | 50.55 | 21.01 | 46.81 | 20.99 | google/fleurs | test | 301.41 |
float32 | 49.69 | 20.77 | 47.38 | 20.74 | google/fleurs | test | 339.15 |
int8_float32 | 46.71 | 16.67 | 42.51 | 16.51 | google/fleurs | test | 246.06 |
int8_float16 | 46.5 | 17.13 | 42.23 | 16.92 | google/fleurs | test | 242.12 |
int8_bfloat16 | 45.7 | 15.06 | 41.03 | 15.04 | google/fleurs | test | 148.05 |
bfloat16 | 45.6 | 15 | 40.88 | 14.97 | google/fleurs | test | 144.87 |
int8 | 45.54 | 16.55 | 42.4 | 16.44 | google/fleurs | test | 236.97 |
As the table shows, the plain int8 quantization scores a slightly lower WER than the original model (45.54 vs. 45.61), although its CER is higher (16.55 vs. 14.95).
Lower values are better!
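To make the quantization comparison concrete, the short sketch below recomputes each compute type's difference from the unquantized Hungarians/whisper-base-cv17-hu baseline, using only the WER and CER values from the quantization table. The `BASELINE`, `RESULTS` and `deltas` names are illustrative and not part of any tool; negative differences mean a better score than the baseline.

```python
# WER / CER values copied from the quantization loss table.
BASELINE = ("Hungarians/whisper-base-cv17-hu", 45.61, 14.95)

RESULTS = {
    "float16": (50.55, 21.01),
    "float32": (49.69, 20.77),
    "int8_float32": (46.71, 16.67),
    "int8_float16": (46.5, 17.13),
    "int8_bfloat16": (45.7, 15.06),
    "bfloat16": (45.6, 15.0),
    "int8": (45.54, 16.55),
}


def deltas(results, base_wer, base_cer):
    """Return (compute_type, delta_WER, delta_CER) rows sorted by delta_WER.
    Negative values are better than the baseline (lower is better)."""
    rows = [
        (name, round(wer - base_wer, 2), round(cer - base_cer, 2))
        for name, (wer, cer) in results.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    _, base_wer, base_cer = BASELINE
    for name, d_wer, d_cer in deltas(RESULTS, base_wer, base_cer):
        print(f"{name:>14}  WER {d_wer:+.2f}  CER {d_cer:+.2f}")
```

Sorted this way, int8 (WER -0.07, CER +1.60) and bfloat16 (WER -0.01, CER +0.05) sit closest to the baseline, while the float16 and float32 rows lose roughly 4 to 5 WER points.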
For faster-whisper in Home Assistant, use the int8, fp16 or fp32 models from the subfolders.
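Outside Home Assistant, the same subfolder models can be tried directly from Python with faster-whisper. This is a minimal sketch, assuming you have already downloaded a model repository locally and that its CTranslate2 variants sit in subfolders literally named `int8`, `fp16` and `fp32` (check the actual folder names in the repository). The helper names `ct2_model_dir` and `transcribe_hu` are hypothetical. `compute_type="default"` asks CTranslate2 to keep the type the variant was converted with; CTranslate2 is documented to fall back to a supported type when the device cannot run it efficiently (for example float16 on CPU).

```python
from pathlib import Path

# Hypothetical subfolder names, as described in this README.
VARIANTS = ("int8", "fp16", "fp32")


def ct2_model_dir(repo_dir, variant):
    """Return the path of one CTranslate2 variant inside a downloaded model repo."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    return Path(repo_dir) / variant


def transcribe_hu(repo_dir, audio_path, variant="int8", device="cpu"):
    """Transcribe a Hungarian audio file with faster-whisper (pip install faster-whisper)."""
    # Imported lazily so the path helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        str(ct2_model_dir(repo_dir, variant)),
        device=device,
        compute_type="default",  # keep the variant's own quantization where supported
    )
    # transcribe() returns a lazy generator of segments plus transcription info.
    segments, _info = model.transcribe(audio_path, language="hu")
    return " ".join(segment.text.strip() for segment in segments)
```

A call such as `transcribe_hu("whisper-model", "sample.wav", variant="int8")` would load the int8 subfolder on CPU; the directory and file names here are placeholders.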
Some notes for Hungarian users:
The finished models are always here; my (sarpba) repository holds the half-finished or experimental work.
For faster-whisper in Home Assistant, the easiest option is to use the int8, fp16 and fp32 CT2-quantized models from the subfolders with cociweb's custom_whisper add-on.
Community
If you would like to join our Hungarian-language discussion group, where you can ask questions and share your experiences, or want to be part of a group of Hungarian LLM experts, join our Facebook group: Hungarian-LLM.