kanishka committed on
Commit 7e8a02e
1 Parent(s): 4850b55

Model save

README.md CHANGED
@@ -1,23 +1,11 @@
 ---
 tags:
 - generated_from_trainer
-datasets:
-- kanishka/counterfactual-babylm-pipps-random_removal
 metrics:
 - accuracy
 model-index:
 - name: smolm-autoreg-bpe-counterfactual-babylm-pipps-random_removal-seed_211-1e-3
-  results:
-  - task:
-      name: Causal Language Modeling
-      type: text-generation
-    dataset:
-      name: kanishka/counterfactual-babylm-pipps-random_removal
-      type: kanishka/counterfactual-babylm-pipps-random_removal
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 0.40988659662430754
+  results: []
 ---
 
@@ -25,10 +13,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # smolm-autoreg-bpe-counterfactual-babylm-pipps-random_removal-seed_211-1e-3
 
-This model was trained from scratch on the kanishka/counterfactual-babylm-pipps-random_removal dataset.
+This model was trained from scratch on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.4120
-- Accuracy: 0.4099
+- Loss: 3.3976
+- Accuracy: 0.4113
 
 ## Model description
 
@@ -50,7 +38,7 @@ The following hyperparameters were used during training:
 - learning_rate: 0.001
 - train_batch_size: 32
 - eval_batch_size: 64
-- seed: 211
+- seed: 1024
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 32000
@@ -61,26 +49,26 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step   | Validation Loss | Accuracy |
 |:-------------:|:-----:|:------:|:---------------:|:--------:|
-| 3.6072        | 1.0   | 18592  | 3.7972          | 0.3577   |
-| 3.3863        | 2.0   | 37184  | 3.5804          | 0.3804   |
-| 3.2556        | 3.0   | 55776  | 3.4730          | 0.3910   |
-| 3.1829        | 4.0   | 74368  | 3.4019          | 0.3992   |
-| 3.1264        | 5.0   | 92960  | 3.3828          | 0.4020   |
-| 3.0827        | 6.0   | 111552 | 3.3849          | 0.4031   |
-| 3.0461        | 7.0   | 130144 | 3.3728          | 0.4050   |
-| 3.0111        | 8.0   | 148736 | 3.3609          | 0.4069   |
-| 2.9857        | 9.0   | 167328 | 3.3496          | 0.4082   |
-| 2.9608        | 10.0  | 185920 | 3.3683          | 0.4075   |
-| 2.9402        | 11.0  | 204512 | 3.3728          | 0.4086   |
-| 2.9154        | 12.0  | 223104 | 3.3845          | 0.4083   |
-| 2.891         | 13.0  | 241696 | 3.3741          | 0.4098   |
-| 2.8754        | 14.0  | 260288 | 3.3674          | 0.4106   |
-| 2.8555        | 15.0  | 278880 | 3.3868          | 0.4095   |
-| 2.8368        | 16.0  | 297472 | 3.3892          | 0.4098   |
-| 2.8185        | 17.0  | 316064 | 3.3865          | 0.4106   |
-| 2.7969        | 18.0  | 334656 | 3.4006          | 0.4099   |
-| 2.7805        | 19.0  | 353248 | 3.3997          | 0.4104   |
-| 2.7623        | 20.0  | 371840 | 3.4120          | 0.4099   |
+| 3.6046        | 1.0   | 18592  | 3.7833          | 0.3594   |
+| 3.3837        | 2.0   | 37184  | 3.5854          | 0.3805   |
+| 3.26          | 3.0   | 55776  | 3.4488          | 0.3931   |
+| 3.1824        | 4.0   | 74368  | 3.4153          | 0.3986   |
+| 3.1238        | 5.0   | 92960  | 3.3853          | 0.4028   |
+| 3.0837        | 6.0   | 111552 | 3.3512          | 0.4060   |
+| 3.0442        | 7.0   | 130144 | 3.3564          | 0.4065   |
+| 3.0168        | 8.0   | 148736 | 3.3438          | 0.4083   |
+| 2.9792        | 9.0   | 167328 | 3.3495          | 0.4090   |
+| 2.9607        | 10.0  | 185920 | 3.3579          | 0.4091   |
+| 2.9363        | 11.0  | 204512 | 3.3420          | 0.4116   |
+| 2.9148        | 12.0  | 223104 | 3.3631          | 0.4106   |
+| 2.893         | 13.0  | 241696 | 3.3609          | 0.4106   |
+| 2.8729        | 14.0  | 260288 | 3.3806          | 0.4101   |
+| 2.8543        | 15.0  | 278880 | 3.3685          | 0.4112   |
+| 2.8352        | 16.0  | 297472 | 3.3734          | 0.4119   |
+| 2.8131        | 17.0  | 316064 | 3.3759          | 0.4115   |
+| 2.7949        | 18.0  | 334656 | 3.3842          | 0.4111   |
+| 2.7756        | 19.0  | 353248 | 3.3893          | 0.4115   |
+| 2.7607        | 20.0  | 371840 | 3.3976          | 0.4113   |
 
 
 ### Framework versions
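
The hyperparameters listed in the diff above (linear scheduler, 32000 warmup steps, peak learning rate 1e-3) together with the final step count from the training-log table (371840 steps over 20 epochs) fully determine the learning-rate curve. A minimal sketch of that linear warmup/decay rule — mirroring what `transformers`' `get_linear_schedule_with_warmup` computes, with all constants taken from the card:

```python
# Linear warmup + linear decay LR schedule implied by the card's hyperparameters:
# peak lr 1e-3, 32000 warmup steps, 371840 total steps (20 epochs x 18592 steps).
PEAK_LR = 1e-3
WARMUP_STEPS = 32_000
TOTAL_STEPS = 371_840  # final step in the training-log table

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup/decay."""
    if step < WARMUP_STEPS:
        # linear ramp from 0 up to the peak over the warmup phase
        return PEAK_LR * step / WARMUP_STEPS
    # linear decay from the peak down to 0 at TOTAL_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

print(lr_at(16_000))   # halfway through warmup -> 0.0005
print(lr_at(32_000))   # end of warmup, peak   -> 0.001
print(lr_at(371_840))  # end of training       -> 0.0
```

Note the long warmup relative to training length: the schedule only reaches its peak about 1.7 epochs in, then spends the remaining ~18 epochs decaying.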
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8f9c25cb97da379cb76332438fc643194f23bbd2d209c88169a7b19ad8b3ac25
+oid sha256:35bd5e90710c1fbb0d52eb18e22fed29bd1cae09cdaa3848881769d6821a1244
 size 391376736
runs/Feb18_22-02-10_phyl-ling-p01.la.utexas.edu/events.out.tfevents.1708315684.phyl-ling-p01.la.utexas.edu.4148918.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c3a28161ce7ad2b82731c41fd869d1f3f2542f5cfc6f9edd8156f72c732a846e
-size 70206
+oid sha256:b37b38de5b7d3a1e68a0d1114804426861f461d47efe270213ae4e9b27d95d4d
+size 71055