IParraMartin/impossible-llms-english-random-trigram

+---
+library_name: transformers
+tags:
+- generated_from_trainer
+model-index:
+- name: impossible-llms-english-random-trigram
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# impossible-llms-english-random-trigram
+This model is a fine-tuned version of an unspecified base model on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 4.3113
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 12
+- eval_batch_size: 8
+- seed: 0
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 384
+- total_eval_batch_size: 32
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- training_steps: 3000
+- mixed_precision_training: Native AMP
+- label_smoothing_factor: 0.1
+### Training results
+| Training Loss | Epoch   | Step | Validation Loss |
+|:-------------:|:-------:|:----:|:---------------:|
+| 14.1482       | 1.0     | 96   | 6.9646          |
+| 11.4328       | 2.0     | 192  | 5.6992          |
+| 11.1488       | 3.0     | 288  | 5.5112          |
+| 10.5646       | 4.0     | 384  | 5.2485          |
+| 10.2163       | 5.0     | 480  | 5.0376          |
+| 9.8751        | 6.0     | 576  | 4.8854          |
+| 9.6552        | 7.0     | 672  | 4.7683          |
+| 9.4312        | 8.0     | 768  | 4.6836          |
+| 9.301         | 9.0     | 864  | 4.6148          |
+| 9.2448        | 10.0    | 960  | 4.5597          |
+| 9.1271        | 11.0    | 1056 | 4.5156          |
+| 9.0854        | 12.0    | 1152 | 4.4794          |
+| 8.9255        | 13.0    | 1248 | 4.4493          |
+| 8.8784        | 14.0    | 1344 | 4.4255          |
+| 8.7833        | 15.0    | 1440 | 4.4035          |
+| 8.6755        | 16.0    | 1536 | 4.3862          |
+| 8.6895        | 17.0    | 1632 | 4.3722          |
+| 8.6269        | 18.0    | 1728 | 4.3582          |
+| 8.5067        | 19.0    | 1824 | 4.3492          |
+| 8.4444        | 20.0    | 1920 | 4.3404          |
+| 8.5608        | 21.0    | 2016 | 4.3332          |
+| 8.4592        | 22.0    | 2112 | 4.3274          |
+| 8.4261        | 23.0    | 2208 | 4.3233          |
+| 8.471         | 24.0    | 2304 | 4.3193          |
+| 8.3813        | 25.0    | 2400 | 4.3163          |
+| 8.3404        | 26.0    | 2496 | 4.3149          |
+| 8.3891        | 27.0    | 2592 | 4.3132          |
+| 8.3628        | 28.0    | 2688 | 4.3122          |
+| 8.4306        | 29.0    | 2784 | 4.3117          |
+| 8.2589        | 30.0    | 2880 | 4.3113          |
+| 8.247         | 31.0    | 2976 | 4.3113          |
+| 33.3577       | 31.2520 | 3000 | 4.3113          |
+### Framework versions
+- Transformers 4.49.0
+- Pytorch 2.4.0+cu121
+- Datasets 3.4.0
+- Tokenizers 0.21.0
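
The aggregate batch sizes in the hyperparameter list follow from the per-device values, and the warmup ratio implies a step count. The sketch below reproduces that arithmetic with only the standard library. The `lr_at` helper is hypothetical: it assumes the `cosine` scheduler means linear warmup from zero followed by a half-cosine decay to zero, which the card does not state.

```python
import math

# Values copied from the hyperparameter list in the model card.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 12      # per device
EVAL_BATCH_SIZE = 8        # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1
TRAINING_STEPS = 3000

# Sequences contributing to one optimizer update across all devices.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
# Evaluation uses no gradient accumulation, so only the device count multiplies.
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES
# Number of warmup steps implied by the ratio.
warmup_steps = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at an optimizer step.

    Assumption (not stated in the card): linear warmup from 0 to the peak
    rate, then a half-cosine decay to 0 at the final training step.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    progress = (step - warmup_steps) / (TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("total_eval_batch_size:", total_eval_batch_size)
    print("warmup_steps:", warmup_steps)
    for step in (0, 150, 300, 1650, 3000):
        print(step, lr_at(step))
```

The two computed totals match the `total_train_batch_size` and `total_eval_batch_size` entries reported in the card. Under the stated assumption, the peak rate of 0.0001 would be reached at step 300.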

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "transformers_version": "4.49.0"
+}