cociweb commited on
Commit
44aa66a
•
1 Parent(s): adb2e46

Quantized models added

fp16/README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ language:
3
+ - hu
4
+ tags:
5
+ - audio
6
+ - automatic-speech-recognition
7
+ datasets:
8
+ - mozilla-foundation/common_voice_16_0
9
+ base_model: openai/whisper-tiny
10
+ license: mit
11
+ library_name: ctranslate2
12
+ ---
13
+
14
+ # Whisper tiny model for CTranslate2
15
+
16
+ This repository contains a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) converted to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. The fine-tuning was done by [@sarpba](https://huggingface.co/sarpba) on the Mozilla Foundation's Common Voice 16 dataset.
17
+
18
+ This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/systran/faster-whisper).
19
+
20
+ ## Example
21
+
22
+ ```python
23
+ from faster_whisper import WhisperModel
24
+
25
+ model = WhisperModel("tiny")
26
+
27
+ segments, info = model.transcribe("audio.mp3")
28
+ for segment in segments:
29
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
30
+ ```
31
+
32
+ ## Conversion details
33
+
34
+ The original model was converted with the following command:
35
+
36
+ ```
37
+ ct2-transformers-converter --model Hungarians/whisper-tiny-cv16-hu-v2 --output_dir faster-whisper-tiny-cv16-v2-fp16.hu \
38
+ --quantization fp16 --low_cpu_mem_usage --copy_files tokenizer_config.json preprocessor_config.json
39
+ ```
40
+
41
+ Note that the model weights are saved in FP16. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
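As a back-of-the-envelope illustration of what each `compute_type` costs in storage (the per-weight byte counts below are illustrative; quantized CTranslate2 models also store scale tensors, so real files differ slightly):

```python
# Rough per-weight storage cost for common CTranslate2 compute types
# (illustrative figures, not an official table).
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}

def approx_weight_mb(n_params: int, compute_type: str) -> float:
    """Approximate weight storage in MB for a given compute type."""
    return n_params * BYTES_PER_WEIGHT[compute_type] / 1e6

# Whisper tiny has roughly 39M parameters, so an fp16 conversion should
# land near 78 MB -- close to the ~75.5 MB model.bin in this repository.
print(approx_weight_mb(39_000_000, "float16"))
```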
42
+
43
+ ## Hash calculation
44
+
45
+ Hashes are calculated with `md5sum` in the model directory:
46
+
47
+ ```
48
+ find ./ -maxdepth 1 -type f -not -path '*/\.*' -exec md5sum {} \; | tr -d ' ' | jq -R 'split("./") | {(.[1]): (.[0])}' | jq -s 'add' > hash.json
49
+ ```
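The same check can be replayed from Python with only the standard library; a minimal sketch, assuming the directory layout matches the file list recorded in `hash.json`:

```python
import hashlib
import json
from pathlib import Path

def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(model_dir: str) -> dict:
    """Compare each file against the digests recorded in hash.json."""
    root = Path(model_dir)
    expected = json.loads((root / "hash.json").read_text())
    # hash.json's own entry records the digest of the then-empty file
    # (d41d8cd9... is the MD5 of empty input), so skip it here.
    return {name: md5_of(root / name) == digest
            for name, digest in expected.items()
            if name != "hash.json"}
```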
50
+
51
+ ## More information
52
+
53
+ **For more information about the original model, see its [model card](https://huggingface.co/Hungarians/whisper-tiny-cv16-hu-v2).**
fp16/hash.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "config.json": "f30b4fcb198aa49c4b8b905f6d6a1ca3",
3
+ "model.bin": "c4cac28d9225c95dfa0d66ad893feb93",
4
+ "preprocessor_config.json": "15d1d7ee1cc6801b71f8ab68966aed86",
5
+ "tokenizer_config.json": "ea5ff3bfa7553fabbfbcb02846302bbc",
6
+ "vocabulary.json": "aebe7623626c8f3f61cc5208ff29c348",
7
+ "README.md": "1540e632b3c1a6456cce39ddc2312048",
8
+ "vocabulary.txt": "980d7011195d0c733bd374e31708717f",
9
+ "hash.json": "d41d8cd98f00b204e9800998ecf8427e"
10
+ }
fp16/model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:123ba49c0439c2dfcf1919d83a19be05a04d08e98e0c3f705606b1abf649b7f8
3
+ size 75538345
fp16/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
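The values above are not independent: the window and frame counts follow from the chunk length, sample rate, and hop length, as a quick derivation shows.

```python
# Derive the dependent preprocessor values from the base ones.
chunk_length = 30        # seconds of audio per window
sampling_rate = 16000    # Hz
hop_length = 160         # samples between successive mel frames

n_samples = chunk_length * sampling_rate   # samples per 30 s window
nb_max_frames = n_samples // hop_length    # mel frames per window

print(n_samples, nb_max_frames)
```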
fp16/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json → fp16/vocabulary.json RENAMED
File without changes
vocabulary.txt → fp16/vocabulary.txt RENAMED
File without changes
fp32/README.md ADDED
@@ -0,0 +1,106 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: openai/whisper-tiny
4
+ tags:
5
+ - hf-asr-leaderboard
6
+ - generated_from_trainer
7
+ datasets:
8
+ - mozilla-foundation/common_voice_16_0
9
+ language:
10
+ - hu
11
+ widget:
12
+ - example_title: Sample 1
13
+ src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample1.flac
14
+ - example_title: Sample 2
15
+ src: https://huggingface.co/datasets/Hungarians/samples/resolve/main/Sample2.flac
16
+ metrics:
17
+ - wer
18
+ pipeline_tag: automatic-speech-recognition
19
+ model-index:
20
+ - name: Whisper Tiny Hu v2
21
+ results:
22
+ - task:
23
+ name: Automatic Speech Recognition
24
+ type: automatic-speech-recognition
25
+ dataset:
26
+ name: Common Voice 16.0 - Hungarian
27
+ type: mozilla-foundation/common_voice_16_0
28
+ config: hu
29
+ split: test
30
+ args: hu
31
+ metrics:
32
+ - name: Wer
33
+ type: wer
34
+ value: 15.7367
35
+ verified: true
36
+
37
+ ---
38
+
39
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
40
+ should probably proofread and complete it, then remove this comment. -->
41
+
42
+
43
+ # Whisper Tiny Hu v2
44
+
45
+ This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the Common Voice 16.0 dataset.
46
+ It achieves the following results on the evaluation set:
47
+ - Loss: 0.1930
48
+ - Wer Ortho: 17.3040
49
+ - Wer: 15.7367
50
+
51
+ ## Model description
52
+
53
+ More information needed
54
+
55
+ ## Intended uses & limitations
56
+
57
+ More information needed
58
+
59
+ ## Training and evaluation data
60
+
61
+ More information needed
62
+
63
+ ## Training procedure
64
+
65
+ ### Training hyperparameters
66
+
67
+ The following hyperparameters were used during training:
68
+ - learning_rate: 4e-05
69
+ - train_batch_size: 8
70
+ - eval_batch_size: 8
71
+ - seed: 42
72
+ - gradient_accumulation_steps: 2
73
+ - total_train_batch_size: 16
74
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
75
+ - lr_scheduler_type: constant_with_warmup
76
+ - lr_scheduler_warmup_steps: 500
77
+ - training_steps: 15000
78
+ - mixed_precision_training: Native AMP
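Two of the listed values are derived: the effective batch size is the per-device batch size times the accumulation steps, and `constant_with_warmup` ramps the learning rate linearly over the warmup steps before holding it flat. A small sketch of both:

```python
train_batch_size = 8
gradient_accumulation_steps = 2
# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

def lr_at(step: int, base_lr: float = 4e-5, warmup_steps: int = 500) -> float:
    """constant_with_warmup: linear ramp, then constant at base_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr
```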
79
+
80
+ ### Training results
81
+
82
+ | Training Loss | Epoch | Step | Validation Loss | Wer Ortho | Wer |
83
+ |:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------:|
84
+ | 0.5487 | 0.33 | 1000 | 0.5970 | 55.5492 | 52.2206 |
85
+ | 0.3922 | 0.67 | 2000 | 0.4419 | 43.1109 | 39.9911 |
86
+ | 0.3242 | 1.0 | 3000 | 0.3662 | 37.2727 | 34.2040 |
87
+ | 0.2517 | 1.34 | 4000 | 0.3329 | 33.7890 | 30.8746 |
88
+ | 0.2455 | 1.67 | 5000 | 0.2925 | 30.6185 | 28.0196 |
89
+ | 0.1398 | 2.01 | 6000 | 0.2600 | 27.1709 | 24.5983 |
90
+ | 0.1421 | 2.34 | 7000 | 0.2491 | 26.1291 | 23.6347 |
91
+ | 0.1578 | 2.68 | 8000 | 0.2342 | 24.4761 | 22.0783 |
92
+ | 0.0732 | 3.01 | 9000 | 0.2163 | 22.1245 | 19.8547 |
93
+ | 0.0941 | 3.35 | 10000 | 0.2143 | 22.2058 | 19.8399 |
94
+ | 0.0936 | 3.68 | 11000 | 0.2094 | 20.5980 | 18.7756 |
95
+ | 0.0489 | 4.02 | 12000 | 0.2027 | 18.9630 | 17.2665 |
96
+ | 0.0548 | 4.35 | 13000 | 0.1981 | 18.4933 | 16.5491 |
97
+ | 0.0585 | 4.69 | 14000 | 0.1953 | 17.7195 | 15.7693 |
98
+ | 0.0356 | 5.02 | 15000 | 0.1930 | 17.3040 | 15.7367 |
99
+
100
+
101
+ ### Framework versions
102
+
103
+ - Transformers 4.36.2
104
+ - Pytorch 2.1.0+cu121
105
+ - Datasets 2.16.1
106
+ - Tokenizers 0.15.0
fp32/config.json ADDED
@@ -0,0 +1,154 @@
1
+ {
2
+ "_name_or_path": "openai/whisper-tiny",
3
+ "activation_dropout": 0.0,
4
+ "activation_function": "gelu",
5
+ "apply_spec_augment": false,
6
+ "architectures": [
7
+ "WhisperForConditionalGeneration"
8
+ ],
9
+ "attention_dropout": 0.0,
10
+ "begin_suppress_tokens": [
11
+ 220,
12
+ 50257
13
+ ],
14
+ "bos_token_id": 50257,
15
+ "classifier_proj_size": 256,
16
+ "d_model": 384,
17
+ "decoder_attention_heads": 6,
18
+ "decoder_ffn_dim": 1536,
19
+ "decoder_layerdrop": 0.0,
20
+ "decoder_layers": 4,
21
+ "decoder_start_token_id": 50258,
22
+ "dropout": 0.0,
23
+ "encoder_attention_heads": 6,
24
+ "encoder_ffn_dim": 1536,
25
+ "encoder_layerdrop": 0.0,
26
+ "encoder_layers": 4,
27
+ "eos_token_id": 50257,
28
+ "forced_decoder_ids": [
29
+ [
30
+ 1,
31
+ 50259
32
+ ],
33
+ [
34
+ 2,
35
+ 50359
36
+ ],
37
+ [
38
+ 3,
39
+ 50363
40
+ ]
41
+ ],
42
+ "init_std": 0.02,
43
+ "is_encoder_decoder": true,
44
+ "mask_feature_length": 10,
45
+ "mask_feature_min_masks": 0,
46
+ "mask_feature_prob": 0.0,
47
+ "mask_time_length": 10,
48
+ "mask_time_min_masks": 2,
49
+ "mask_time_prob": 0.05,
50
+ "max_length": 448,
51
+ "max_source_positions": 1500,
52
+ "max_target_positions": 448,
53
+ "median_filter_width": 7,
54
+ "model_type": "whisper",
55
+ "num_hidden_layers": 4,
56
+ "num_mel_bins": 80,
57
+ "pad_token_id": 50257,
58
+ "scale_embedding": false,
59
+ "suppress_tokens": [
60
+ 1,
61
+ 2,
62
+ 7,
63
+ 8,
64
+ 9,
65
+ 10,
66
+ 14,
67
+ 25,
68
+ 26,
69
+ 27,
70
+ 28,
71
+ 29,
72
+ 31,
73
+ 58,
74
+ 59,
75
+ 60,
76
+ 61,
77
+ 62,
78
+ 63,
79
+ 90,
80
+ 91,
81
+ 92,
82
+ 93,
83
+ 359,
84
+ 503,
85
+ 522,
86
+ 542,
87
+ 873,
88
+ 893,
89
+ 902,
90
+ 918,
91
+ 922,
92
+ 931,
93
+ 1350,
94
+ 1853,
95
+ 1982,
96
+ 2460,
97
+ 2627,
98
+ 3246,
99
+ 3253,
100
+ 3268,
101
+ 3536,
102
+ 3846,
103
+ 3961,
104
+ 4183,
105
+ 4667,
106
+ 6585,
107
+ 6647,
108
+ 7273,
109
+ 9061,
110
+ 9383,
111
+ 10428,
112
+ 10929,
113
+ 11938,
114
+ 12033,
115
+ 12331,
116
+ 12562,
117
+ 13793,
118
+ 14157,
119
+ 14635,
120
+ 15265,
121
+ 15618,
122
+ 16553,
123
+ 16604,
124
+ 18362,
125
+ 18956,
126
+ 20075,
127
+ 21675,
128
+ 22520,
129
+ 26130,
130
+ 26161,
131
+ 26435,
132
+ 28279,
133
+ 29464,
134
+ 31650,
135
+ 32302,
136
+ 32470,
137
+ 36865,
138
+ 42863,
139
+ 47425,
140
+ 49870,
141
+ 50254,
142
+ 50258,
143
+ 50358,
144
+ 50359,
145
+ 50360,
146
+ 50361,
147
+ 50362
148
+ ],
149
+ "torch_dtype": "float32",
150
+ "transformers_version": "4.36.2",
151
+ "use_cache": false,
152
+ "use_weighted_layer_sum": false,
153
+ "vocab_size": 51865
154
+ }
hash.json → fp32/hash.json RENAMED
File without changes
model.bin → fp32/model.bin RENAMED
File without changes
fp32/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
fp32/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
fp32/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
fp32/vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff
 
int8/README.md ADDED
@@ -0,0 +1,53 @@
1
+ ---
2
+ language:
3
+ - hu
4
+ tags:
5
+ - audio
6
+ - automatic-speech-recognition
7
+ datasets:
8
+ - mozilla-foundation/common_voice_16_0
9
+ base_model: openai/whisper-tiny
10
+ license: mit
11
+ library_name: ctranslate2
12
+ ---
13
+
14
+ # Whisper tiny model for CTranslate2
15
+
16
+ This repository contains a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) converted to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format. The fine-tuning was done by [@sarpba](https://huggingface.co/sarpba) on the Mozilla Foundation's Common Voice 16 dataset.
17
+
18
+ This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/systran/faster-whisper).
19
+
20
+ ## Example
21
+
22
+ ```python
23
+ from faster_whisper import WhisperModel
24
+
25
+ model = WhisperModel("tiny")
26
+
27
+ segments, info = model.transcribe("audio.mp3")
28
+ for segment in segments:
29
+ print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
30
+ ```
31
+
32
+ ## Conversion details
33
+
34
+ The original model was converted with the following command:
35
+
36
+ ```
37
+ ct2-transformers-converter --model Hungarians/whisper-tiny-cv16-hu-v2 --output_dir faster-whisper-tiny-cv16-v2-int8.hu \
38
+ --quantization int8 --low_cpu_mem_usage --copy_files tokenizer_config.json preprocessor_config.json
39
+ ```
40
+
41
+ Note that the model weights are saved in INT8. This type can be changed when the model is loaded using the [`compute_type` option in CTranslate2](https://opennmt.net/CTranslate2/quantization.html).
42
+
43
+ ## HASH calculation
44
+
45
+ Hash calculation is executed with md5hash in the directory of the model with:
46
+
47
+ ```
48
+ find ./ -maxdepth 1 -type f -not -path '*/\.*' -exec md5sum {} \; | tr -d ' ' | jq -R 'split("./") | {(.[1]): (.[0])}' | jq -s 'add' > hash.json
49
+ ```
50
+
51
+ ## More information
52
+
53
+ **For more information about the original model, see its [model card](https://huggingface.co/Hungarians/whisper-tiny-cv16-hu-v2).**
int8/hash.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "README.md": "ddaeacaba1425164423be2c16bcee6e8",
3
+ "vocabulary.txt": "980d7011195d0c733bd374e31708717f",
4
+ "config.json": "f30b4fcb198aa49c4b8b905f6d6a1ca3",
5
+ "model.bin": "ad6f669131ccc36ef33d8d1adf1610de",
6
+ "preprocessor_config.json": "15d1d7ee1cc6801b71f8ab68966aed86",
7
+ "tokenizer_config.json": "ea5ff3bfa7553fabbfbcb02846302bbc",
8
+ "vocabulary.json": "aebe7623626c8f3f61cc5208ff29c348",
9
+ "hash.json": "d41d8cd98f00b204e9800998ecf8427e"
10
+ }
int8/model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ace8dacb69d489544478bf09b835f9702b649da12dbdf1577a48b79f492a6af
3
+ size 42120441
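Comparing the two LFS pointers is a quick sanity check on the quantization: the int8 `model.bin` is a bit over half the size of the fp16 one, consistent with one byte per weight plus the extra scale tensors that int8 storage needs (sizes taken from this commit):

```python
fp16_bytes = 75_538_345  # fp16/model.bin, from its LFS pointer
int8_bytes = 42_120_441  # int8/model.bin, from its LFS pointer

ratio = int8_bytes / fp16_bytes
print(f"int8 is {ratio:.0%} of the fp16 size")
```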
int8/preprocessor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "chunk_length": 30,
3
+ "feature_extractor_type": "WhisperFeatureExtractor",
4
+ "feature_size": 80,
5
+ "hop_length": 160,
6
+ "n_fft": 400,
7
+ "n_samples": 480000,
8
+ "nb_max_frames": 3000,
9
+ "padding_side": "right",
10
+ "padding_value": 0.0,
11
+ "processor_class": "WhisperProcessor",
12
+ "return_attention_mask": false,
13
+ "sampling_rate": 16000
14
+ }
int8/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
int8/vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
int8/vocabulary.txt ADDED
The diff for this file is too large to render. See raw diff