n3wtou committed on
Commit
478fb71
1 Parent(s): 1416f7d

Training in progress epoch 0

Files changed (4):
  1. README.md +6 -78
  2. config.json +1 -1
  3. generation_config.json +1 -1
  4. tf_model.h5 +1 -1
README.md CHANGED
@@ -14,9 +14,9 @@ probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Train Loss: 0.1500
- - Validation Loss: 5.6063
- - Epoch: 72
+ - Train Loss: 5.6636
+ - Validation Loss: 2.9818
+ - Epoch: 0
 
 ## Model description
 
@@ -35,91 +35,19 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0003, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0003, 'decay_steps': 99900, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 100, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
+ - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0003, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0003, 'decay_steps': 19900, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 100, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
 - training_precision: mixed_float16
 
 ### Training results
 
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
- | 7.7378 | 4.4308 | 0 |
- | 4.9883 | 3.7424 | 1 |
- | 4.2872 | 3.4004 | 2 |
- | 3.8475 | 3.1811 | 3 |
- | 3.5692 | 3.0209 | 4 |
- | 3.3360 | 2.9025 | 5 |
- | 3.1530 | 2.8074 | 6 |
- | 3.0035 | 2.7699 | 7 |
- | 2.8622 | 2.7444 | 8 |
- | 2.7423 | 2.7162 | 9 |
- | 2.6218 | 2.7089 | 10 |
- | 2.5106 | 2.6997 | 11 |
- | 2.4090 | 2.7081 | 12 |
- | 2.3063 | 2.7278 | 13 |
- | 2.2076 | 2.7389 | 14 |
- | 2.1084 | 2.7752 | 15 |
- | 2.0043 | 2.8056 | 16 |
- | 1.9061 | 2.8248 | 17 |
- | 1.8142 | 2.8616 | 18 |
- | 1.7280 | 2.9050 | 19 |
- | 1.6480 | 2.9312 | 20 |
- | 1.5664 | 3.0067 | 21 |
- | 1.4709 | 3.0329 | 22 |
- | 1.4106 | 3.0626 | 23 |
- | 1.3306 | 3.1512 | 24 |
- | 1.2525 | 3.1912 | 25 |
- | 1.1883 | 3.2798 | 26 |
- | 1.1302 | 3.3261 | 27 |
- | 1.0607 | 3.4132 | 28 |
- | 1.0138 | 3.4018 | 29 |
- | 0.9581 | 3.4898 | 30 |
- | 0.9053 | 3.6052 | 31 |
- | 0.8553 | 3.6480 | 32 |
- | 0.8045 | 3.7776 | 33 |
- | 0.7669 | 3.7579 | 34 |
- | 0.7209 | 3.7751 | 35 |
- | 0.6860 | 3.9205 | 36 |
- | 0.6473 | 4.0297 | 37 |
- | 0.6129 | 4.0663 | 38 |
- | 0.5853 | 4.0667 | 39 |
- | 0.5518 | 4.2401 | 40 |
- | 0.5205 | 4.2675 | 41 |
- | 0.4964 | 4.2551 | 42 |
- | 0.4765 | 4.3178 | 43 |
- | 0.4589 | 4.4624 | 44 |
- | 0.4319 | 4.4997 | 45 |
- | 0.4107 | 4.5586 | 46 |
- | 0.3886 | 4.6677 | 47 |
- | 0.3755 | 4.6753 | 48 |
- | 0.3536 | 4.7340 | 49 |
- | 0.3382 | 4.8393 | 50 |
- | 0.3225 | 4.7817 | 51 |
- | 0.3140 | 4.8783 | 52 |
- | 0.2949 | 4.9444 | 53 |
- | 0.2853 | 5.0210 | 54 |
- | 0.2739 | 4.9796 | 55 |
- | 0.2646 | 5.0427 | 56 |
- | 0.2492 | 5.0848 | 57 |
- | 0.2408 | 5.2522 | 58 |
- | 0.2334 | 5.2251 | 59 |
- | 0.2233 | 5.3535 | 60 |
- | 0.2110 | 5.3478 | 61 |
- | 0.2097 | 5.2551 | 62 |
- | 0.2003 | 5.3240 | 63 |
- | 0.1914 | 5.5138 | 64 |
- | 0.1863 | 5.4430 | 65 |
- | 0.1796 | 5.4543 | 66 |
- | 0.1755 | 5.5029 | 67 |
- | 0.1673 | 5.4727 | 68 |
- | 0.1587 | 5.5600 | 69 |
- | 0.1569 | 5.5672 | 70 |
- | 0.1508 | 5.7395 | 71 |
- | 0.1500 | 5.6063 | 72 |
+ | 5.6636 | 2.9818 | 0 |
 
 
 ### Framework versions
 
- - Transformers 4.29.2
+ - Transformers 4.30.2
 - TensorFlow 2.12.0
 - Datasets 2.12.0
 - Tokenizers 0.13.3
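The serialized optimizer entry above describes a `WarmUp` wrapper around a `PolynomialDecay` schedule: the learning rate ramps linearly from 0 to 3e-4 over 100 warmup steps, then decays with `power: 1.0` (i.e. linearly) to 0 over 19,900 decay steps. A minimal plain-Python sketch of that schedule follows; the function name is made up for illustration, and the detail of offsetting the decay by the warmup steps is an assumption about how the two pieces are composed, not taken from this commit.

```python
def lr_at_step(step: int,
               init_lr: float = 3e-4,
               warmup_steps: int = 100,
               decay_steps: int = 19900,
               end_lr: float = 0.0,
               power: float = 1.0) -> float:
    """Learning rate under linear warmup followed by polynomial decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to init_lr over the warmup window.
        return init_lr * step / warmup_steps
    # Polynomial decay (power=1.0 is linear) toward end_lr; the schedule
    # is assumed to be clamped at end_lr once decay_steps are exhausted.
    decayed = min(step - warmup_steps, decay_steps)
    remaining = 1.0 - decayed / decay_steps
    return (init_lr - end_lr) * remaining ** power + end_lr
```

For example, halfway through warmup (`step=50`) this gives 1.5e-4, the peak 3e-4 is reached at step 100, and the rate falls back to 0 by step 20,000, matching `warmup_steps + decay_steps`.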
config.json CHANGED
@@ -28,7 +28,7 @@
   "relative_attention_num_buckets": 32,
   "tie_word_embeddings": false,
   "tokenizer_class": "T5Tokenizer",
-  "transformers_version": "4.29.2",
+  "transformers_version": "4.30.2",
   "use_cache": true,
   "vocab_size": 250112
 }
generation_config.json CHANGED
@@ -3,5 +3,5 @@
   "decoder_start_token_id": 0,
   "eos_token_id": 1,
   "pad_token_id": 0,
-  "transformers_version": "4.29.2"
+  "transformers_version": "4.30.2"
 }
tf_model.h5 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d33b726aa4bc201fe641c601cf53e2c3ea7f6f2f079af891380fade03fb062fe
+ oid sha256:f10143b692b9945dbdae6886abbdad14c3d14b7b666ec50d742931a545f56610
 size 2225556280