lukasdrg committed on
Commit
978b535
1 Parent(s): 700efea

End of training

Files changed (4)
  1. README.md +15 -6
  2. pytorch_model.bin +1 -1
  3. tokenizer.json +16 -2
  4. training_args.bin +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [allenai/longformer-base-4096](https://huggingface.co/allenai/longformer-base-4096) on the None dataset.
 It achieves the following results on the evaluation set:
- - Loss: 3.8141
+ - Loss: 20.5838
 
 ## Model description
 
@@ -36,10 +36,10 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 2
- - eval_batch_size: 16
+ - eval_batch_size: 4
 - seed: 42
- - gradient_accumulation_steps: 64
- - total_train_batch_size: 128
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 1500
@@ -49,8 +49,17 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
- | 1.8515 | 1.42 | 2 | 4.1959 |
- | 1.9816 | 2.84 | 4 | 3.8141 |
+ | 21.7989 | 0.36 | 4 | 22.5208 |
+ | 20.1605 | 0.71 | 8 | 22.5098 |
+ | 21.2348 | 1.07 | 12 | 22.6825 |
+ | 21.615 | 1.42 | 16 | 22.2299 |
+ | 20.4123 | 1.78 | 20 | 22.0543 |
+ | 20.8174 | 2.13 | 24 | 22.1491 |
+ | 20.7756 | 2.49 | 28 | 21.9382 |
+ | 19.8654 | 2.84 | 32 | 21.6735 |
+ | 20.6743 | 3.2 | 36 | 21.3893 |
+ | 20.3151 | 3.56 | 40 | 21.1435 |
+ | 19.6614 | 3.91 | 44 | 20.5838 |
 
 
 ### Framework versions
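For orientation, here is a minimal sketch of how the hyperparameters listed in the updated README would typically be expressed with `transformers.TrainingArguments`. Only the values named in the README come from this commit; the output directory and the omission of `num_train_epochs` are assumptions.

```python
# Sketch only: mirrors the hyperparameters in the updated README.
# "longformer-base-4096-finetuned" is an assumed output_dir, not taken from this repo.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="longformer-base-4096-finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # effective train batch size: 2 * 4 = 8
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1500,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default optimizer.
)
```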
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:776994b40ac37027e65ee158ca40c781444730396f82ac9339f3f0f344301f77
+ oid sha256:33d7a6ab97ae94c2ab33ae3d3ffdaf065ba7e61d85b6eb52a45f99be35e9f77b
 size 594941266
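The weight file is a Git LFS pointer, so only its sha256 oid changed while the size stayed identical. As a small sketch (assuming the real binary has been pulled from LFS rather than just the pointer), the new oid can be checked locally like this:

```python
# Sketch: verify a downloaded LFS object against the pointer's sha256 oid.
import hashlib

EXPECTED_OID = "33d7a6ab97ae94c2ab33ae3d3ffdaf065ba7e61d85b6eb52a45f99be35e9f77b"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

assert sha256_of("pytorch_model.bin") == EXPECTED_OID, "checksum mismatch"
```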
tokenizer.json CHANGED
@@ -1,7 +1,21 @@
 {
 "version": "1.0",
- "truncation": null,
- "padding": null,
+ "truncation": {
+   "direction": "Right",
+   "max_length": 1028,
+   "strategy": "LongestFirst",
+   "stride": 0
+ },
+ "padding": {
+   "strategy": {
+     "Fixed": 1028
+   },
+   "direction": "Right",
+   "pad_to_multiple_of": null,
+   "pad_id": 1,
+   "pad_type_id": 0,
+   "pad_token": "<pad>"
+ },
 "added_tokens": [
 {
 "id": 0,
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:afb0ffc0dc18e0133082b740d841c21a8318df481825164d2f9a7d761d17ae78
+ oid sha256:49aba76b9e4e90c8c3669d6397a822e705d137da816c65797e3b29b6eeb24a83
 size 4536
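training_args.bin is typically the serialized `TrainingArguments` object that `Trainer` writes next to the checkpoint, which is why it changes whenever the run configuration does. A sketch for inspecting it (the `weights_only=False` flag only matters on newer PyTorch releases, and unpickling needs a compatible transformers version installed):

```python
# Sketch: load and inspect the saved TrainingArguments.
import torch

args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_eval_batch_size, args.gradient_accumulation_steps)
```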