tcapelle/toxicity-scorer-smollm2-135m-it-freeze

@@ -4,11 +4,6 @@ license: apache-2.0
 base_model: HuggingFaceTB/SmolLM2-135M-Instruct
 tags:
 - generated_from_trainer
-metrics:
-- f1
-- accuracy
-- precision
-- recall
 model-index:
 - name: toxicity-scorer-smollm2-135m-it-freeze
   results: []
@@ -21,11 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3147
-- F1: 0.8264
-- Accuracy: 0.8745
-- Precision: 0.8384
-- Recall: 0.8745
+- Loss: 3.7842
 ## Model description
@@ -45,13 +36,9 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 3e-05
-- train_batch_size: 44
-- eval_batch_size: 44
+- train_batch_size: 16
+- eval_batch_size: 16
 - seed: 42
-- distributed_type: multi-GPU
-- num_devices: 8
-- total_train_batch_size: 352
-- total_eval_batch_size: 352
 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -59,15 +46,17 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss | F1     | Accuracy | Precision | Recall |
-|:-------------:|:------:|:----:|:---------------:|:------:|:--------:|:---------:|:------:|
-| No log        | 0      | 0    | 0.9227          | 0.6386 | 0.5685   | 0.7480    | 0.5685 |
-| 0.3196        | 1.5596 | 5000 | 0.3147          | 0.8264 | 0.8745   | 0.8384    | 0.8745 |
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| No log        | 0     | 0    | 3.8389          |
+| 3.8172        | 1.0   | 63   | 3.8037          |
+| 3.7957        | 2.0   | 126  | 3.7857          |
+| 3.7137        | 3.0   | 189  | 3.7842          |
 ### Framework versions
 - Transformers 4.46.3
-- Pytorch 2.5.1
+- Pytorch 2.5.1+cu124
 - Datasets 3.1.0
 - Tokenizers 0.20.3
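
The model card lists `learning_rate: 3e-05`, `lr_scheduler_type: cosine`, and `lr_scheduler_warmup_ratio: 0.1`. As a rough illustration of what that combination implies, the sketch below computes a linear-warmup, cosine-decay learning rate in plain Python. It rests on several assumptions that the diff does not confirm: that the run ends at step 189 (the last row of the new training table), that warmup steps are the total step count times the ratio, rounded up, and that the decay is a single half-cosine down to zero. The function name `cosine_lr_with_warmup` is hypothetical and is not taken from the training script.

```python
import math


def cosine_lr_with_warmup(step, total_steps, peak_lr=3e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Assumption: warmup lasts ceil(total_steps * warmup_ratio) steps and the
    rate decays to zero along one half-cosine over the remaining steps.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: 189 total optimizer steps, read off the last table row.
TOTAL_STEPS = 189

if __name__ == "__main__":
    for step in (0, 19, 63, 126, 189):
        print(f"step {step:3d}: lr = {cosine_lr_with_warmup(step, TOTAL_STEPS):.3e}")
```

Under these assumptions the rate starts at zero, reaches the listed peak once warmup ends, and returns to zero at the final step. The actual schedule in the run may differ if the total step count or warmup rounding was different.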

generation_config.json CHANGED

@@ -1,6 +1,7 @@
 {
   "_from_model_config": true,
-  "bos_token_id": 0,
-  "eos_token_id": 0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 2,
   "transformers_version": "4.46.3"
 }
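
To make the `generation_config.json` change easier to check, here is a minimal sketch that parses the old and new versions of the file as they appear in this diff and lists the fields that differ. It uses only the standard `json` module. The helper name `changed_fields` is hypothetical, and the two JSON strings are copied from the diff rather than read from the repository.

```python
import json

# Old and new file contents, copied from the diff above.
OLD_JSON = """
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.46.3"
}
"""

NEW_JSON = """
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 2,
  "transformers_version": "4.46.3"
}
"""


def changed_fields(old, new):
    """Map each differing key to (old value, new value); None means absent."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


old_config = json.loads(OLD_JSON)
new_config = json.loads(NEW_JSON)
changes = changed_fields(old_config, new_config)

if __name__ == "__main__":
    for key, (before, after) in changes.items():
        print(f"{key}: {before} -> {after}")
```

In the new file the padding token ID equals the end-of-sequence token ID. The diff does not state why.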