floriangardin/model

@@ -1,4 +1,5 @@
 ---
 tags:
 - generated_from_trainer
 model-index:
@@ -11,9 +12,9 @@ should probably proofread and complete it, then remove this comment. -->
 # model
-This model is a fine-tuned version of [floriangardin/musiclang_medium](https://huggingface.co/floriangardin/musiclang_medium) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5640
 ## Model description
@@ -32,64 +33,55 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 64
-- eval_batch_size: 64
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 15
-- num_epochs: 6
 ### Training results
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
-| No log        | 0.15  | 400   | 0.6429          |
-| 0.6819        | 0.3   | 800   | 0.6389          |
-| 0.6744        | 0.44  | 1200  | 0.6335          |
-| 0.6664        | 0.59  | 1600  | 0.6319          |
-| 0.659         | 0.74  | 2000  | 0.6246          |
-| 0.659         | 0.89  | 2400  | 0.6203          |
-| 0.6519        | 1.04  | 2800  | 0.6178          |
-| 0.6446        | 1.19  | 3200  | 0.6136          |
-| 0.6403        | 1.33  | 3600  | 0.6103          |
-| 0.6363        | 1.48  | 4000  | 0.6052          |
-| 0.6363        | 1.63  | 4400  | 0.6051          |
-| 0.6302        | 1.78  | 4800  | 0.6011          |
-| 0.6257        | 1.93  | 5200  | 0.5985          |
-| 0.6229        | 2.07  | 5600  | 0.5971          |
-| 0.6185        | 2.22  | 6000  | 0.5948          |
-| 0.6185        | 2.37  | 6400  | 0.5938          |
-| 0.6155        | 2.52  | 6800  | 0.5911          |
-| 0.6123        | 2.67  | 7200  | 0.5883          |
-| 0.6096        | 2.82  | 7600  | 0.5867          |
-| 0.6079        | 2.96  | 8000  | 0.5856          |
-| 0.6079        | 3.11  | 8400  | 0.5835          |
-| 0.6026        | 3.26  | 8800  | 0.5814          |
-| 0.5998        | 3.41  | 9200  | 0.5804          |
-| 0.5993        | 3.56  | 9600  | 0.5779          |
-| 0.5978        | 3.71  | 10000 | 0.5770          |
-| 0.5978        | 3.85  | 10400 | 0.5761          |
-| 0.5958        | 4.0   | 10800 | 0.5746          |
-| 0.5937        | 4.15  | 11200 | 0.5737          |
-| 0.5909        | 4.3   | 11600 | 0.5733          |
-| 0.5884        | 4.45  | 12000 | 0.5714          |
-| 0.5884        | 4.59  | 12400 | 0.5704          |
-| 0.588         | 4.74  | 12800 | 0.5690          |
-| 0.5875        | 4.89  | 13200 | 0.5685          |
-| 0.5848        | 5.04  | 13600 | 0.5679          |
-| 0.5827        | 5.19  | 14000 | 0.5668          |
-| 0.5827        | 5.34  | 14400 | 0.5663          |
-| 0.5839        | 5.48  | 14800 | 0.5658          |
-| 0.5806        | 5.63  | 15200 | 0.5650          |
-| 0.5803        | 5.78  | 15600 | 0.5644          |
-| 0.5796        | 5.93  | 16000 | 0.5640          |
 ### Framework versions
-- Transformers 4.29.2
-- Pytorch 2.0.1+cu118
-- Datasets 2.12.0
-- Tokenizers 0.13.3

 ---
+base_model: musiclang/musiclang-v2-xl
 tags:
 - generated_from_trainer
 model-index:
 # model
+This model is a fine-tuned version of [musiclang/musiclang-v2-xl](https://huggingface.co/musiclang/musiclang-v2-xl) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.2930
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 24
+- eval_batch_size: 24
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine_with_restarts
+- lr_scheduler_warmup_steps: 500
+- training_steps: 0
+- mixed_precision_training: Native AMP
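The `cosine_with_restarts` scheduler with 500 warmup steps listed above can be sketched in plain Python. This is a minimal sketch of the usual linear-warmup plus cosine-with-hard-restarts shape, not the trainer's own code. `total_steps=60000` is taken from the last step in the results table and `num_cycles=1` is an assumption, since the card itself records `training_steps: 0`.

```python
import math


def lr_at(step, base_lr=1e-4, warmup_steps=500, total_steps=60_000, num_cycles=1):
    """Learning rate at `step`: linear warmup, then cosine decay with hard restarts.

    total_steps and num_cycles are illustrative assumptions, not values from the card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cosine cycle, in [0, 1).
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))
```

With a single cycle this reduces to an ordinary cosine decay after warmup; raising `num_cycles` makes the rate jump back to `base_lr` at the start of each cycle.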
 ### Training results
 | Training Loss | Epoch | Step  | Validation Loss |
 |:-------------:|:-----:|:-----:|:---------------:|
+| 0.4098        | 0.03  | 2000  | 0.3828          |
+| 0.3901        | 0.07  | 4000  | 0.3684          |
+| 0.3737        | 0.1   | 6000  | 0.3569          |
+| 0.3652        | 0.13  | 8000  | 0.3489          |
+| 0.356         | 0.17  | 10000 | 0.3393          |
+| 0.35          | 0.2   | 12000 | 0.3342          |
+| 0.3443        | 0.23  | 14000 | 0.3282          |
+| 0.3371        | 0.26  | 16000 | 0.3235          |
+| 0.3361        | 0.3   | 18000 | 0.3201          |
+| 0.3301        | 0.33  | 20000 | 0.3160          |
+| 0.3253        | 0.36  | 22000 | 0.3131          |
+| 0.327         | 0.4   | 24000 | 0.3109          |
+| 0.3225        | 0.43  | 26000 | 0.3089          |
+| 0.3156        | 0.46  | 28000 | 0.3066          |
+| 0.3147        | 0.5   | 30000 | 0.3045          |
+| 0.3182        | 0.53  | 32000 | 0.3026          |
+| 0.3129        | 0.56  | 34000 | 0.3017          |
+| 0.3132        | 0.59  | 36000 | 0.3008          |
+| 0.3109        | 0.63  | 38000 | 0.2987          |
+| 0.3092        | 0.66  | 40000 | 0.2972          |
+| 0.3091        | 0.69  | 42000 | 0.2963          |
+| 0.3034        | 0.73  | 44000 | 0.2960          |
+| 0.3061        | 0.76  | 46000 | 0.2956          |
+| 0.3044        | 0.79  | 48000 | 0.2946          |
+| 0.3036        | 0.83  | 50000 | 0.2940          |
+| 0.3003        | 0.86  | 52000 | 0.2939          |
+| 0.303         | 0.89  | 54000 | 0.2934          |
+| 0.3007        | 0.93  | 56000 | 0.2932          |
+| 0.3009        | 0.96  | 58000 | 0.2930          |
+| 0.3           | 0.99  | 60000 | 0.2930          |
 ### Framework versions
+- Transformers 4.37.2
+- Pytorch 2.2.0+cu121
+- Datasets 2.17.0
+- Tokenizers 0.15.1

config.json CHANGED

@@ -1,21 +1,22 @@
 {
-  "_name_or_path": "floriangardin/musiclang_medium",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
   ],
   "attn_pdrop": 0.1,
-  "bos_token_id": 79,
   "embd_pdrop": 0.1,
-  "eos_token_id": 79,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-05,
   "model_type": "gpt2",
   "n_embd": 300,
-  "n_head": 12,
   "n_inner": null,
-  "n_layer": 12,
-  "n_positions": 1024,
   "reorder_and_upcast_attn": false,
   "resid_pdrop": 0.1,
   "scale_attn_by_inverse_layer_idx": false,
@@ -26,7 +27,7 @@
   "summary_type": "cls_index",
   "summary_use_proj": true,
   "torch_dtype": "float32",
-  "transformers_version": "4.29.2",
   "use_cache": true,
-  "vocab_size": 281
 }

 {
+  "_name_or_path": "musiclang/musiclang-v2-xl",
   "activation_function": "gelu_new",
   "architectures": [
     "GPT2LMHeadModel"
   ],
   "attn_pdrop": 0.1,
+  "bos_token_id": 5,
   "embd_pdrop": 0.1,
+  "eos_token_id": 374,
   "initializer_range": 0.02,
   "layer_norm_epsilon": 1e-05,
   "model_type": "gpt2",
   "n_embd": 300,
+  "n_head": 10,
   "n_inner": null,
+  "n_layer": 10,
+  "n_positions": 4096,
+  "padding_token_id": 3,
   "reorder_and_upcast_attn": false,
   "resid_pdrop": 0.1,
   "scale_attn_by_inverse_layer_idx": false,
   "summary_type": "cls_index",
   "summary_use_proj": true,
   "torch_dtype": "float32",
+  "transformers_version": "4.37.2",
   "use_cache": true,
+  "vocab_size": 374
 }
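As a rough sanity check on the new configuration, the parameter count implied by these fields can be estimated by hand. The sketch below assumes the standard GPT-2 layer layout (fused QKV projection, an MLP of width `4 * n_embd` when `n_inner` is null, biases on every linear layer) and input/output embeddings that are tied; neither assumption is stated in the config itself. The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, n_inner=None):
    """Estimate trainable parameters of a GPT-2 style decoder with tied embeddings."""
    inner = n_inner if n_inner is not None else 4 * n_embd
    # Token and position embedding tables.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Fused QKV projection plus the attention output projection (weights + biases).
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # Two-layer feed-forward block (weights + biases).
    mlp = (n_embd * inner + inner) + (inner * n_embd + n_embd)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * n_embd)
    final_norm = 2 * n_embd
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


# Values copied from the old and new config.json shown in this diff.
old_params = gpt2_param_count(vocab_size=281, n_positions=1024, n_embd=300, n_layer=12)
new_params = gpt2_param_count(vocab_size=374, n_positions=4096, n_embd=300, n_layer=10)
fp32_bytes = new_params * 4
```

Under these assumptions the new configuration comes to 12,180,600 parameters, or 48,722,400 bytes in float32. That is within about 12 KB of the 48734696-byte `model.safetensors` recorded in this commit; the remainder is plausibly the file header, though that is not verified here. Note that `n_head` does not affect the count, only how `n_embd` is split across heads.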

generation_config.json CHANGED

@@ -1,6 +1,6 @@
 {
   "_from_model_config": true,
-  "bos_token_id": 79,
-  "eos_token_id": 79,
-  "transformers_version": "4.29.2"
 }

 {
   "_from_model_config": true,
+  "bos_token_id": 5,
+  "eos_token_id": 374,
+  "transformers_version": "4.37.2"
 }

model.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0edaec61be7b684821241f0871d5e8a7b48d865f5ad1aee2495c9dc9f54290a8
+size 48734696
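The three lines added here are a Git LFS pointer, not the weights themselves: a spec version, a content hash, and the byte size of the real file. A minimal sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version URL, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the model.safetensors hunk above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0edaec61be7b684821241f0871d5e8a7b48d865f5ad1aee2495c9dc9f54290a8\n"
    "size 48734696\n"
)
```

The `size` field is what lets a reader compare the stored weights against an expected footprint without downloading the file.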

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0df02bc9ca652060ebcb1749fa422f319a2ec8cc541de2f5763933254d03693d
-size 3899

 version https://git-lfs.github.com/spec/v1
+oid sha256:ded85577c88c0610c6cc2a534d279f3cac2c97cc898df360f85bd6ae1a843d78
+size 4664