rvv-karma committed on
Commit
e1b812b
1 Parent(s): 721f212

Model save

Files changed (3)
  1. README.md +90 -0
  2. generation_config.json +8 -0
  3. model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,90 @@
+---
+license: apache-2.0
+base_model: google/flan-t5-base
+tags:
+- generated_from_trainer
+datasets:
+- tldr
+metrics:
+- rouge
+model-index:
+- name: BASH-Coder-Flan-T5-base
+  results:
+  - task:
+      name: Sequence-to-sequence Language Modeling
+      type: text2text-generation
+    dataset:
+      name: tldr
+      type: tldr
+      config: data
+      split: validation
+      args: data
+    metrics:
+    - name: Rouge1
+      type: rouge
+      value: 27.0741
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# BASH-Coder-Flan-T5-base
+
+This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the tldr dataset.
+It achieves the following results on the evaluation set:
+- Loss: 3.3608
+- Rouge1: 27.0741
+- Rouge2: 9.3824
+- Rougel: 26.133
+- Rougelsum: 26.1559
+- Gen Len: 15.5767
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
+- eval_batch_size: 16
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 100
+- num_epochs: 10
+- label_smoothing_factor: 0.1
+
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len |
+|:-------------:|:-----:|:----:|:---------------:|:-------:|:------:|:-------:|:---------:|:-------:|
+| 4.3554        | 1.0   | 802  | 3.5928          | 22.7234 | 6.7951 | 22.0647 | 22.0744   | 15.2363 |
+| 3.5335        | 2.0   | 1604 | 3.4654          | 25.7842 | 8.5847 | 24.8207 | 24.8808   | 15.168  |
+| 3.3341        | 3.0   | 2406 | 3.4078          | 25.5756 | 8.4456 | 24.706  | 24.7207   | 15.6472 |
+| 3.2011        | 4.0   | 3208 | 3.3789          | 26.0638 | 8.6853 | 25.0862 | 25.1223   | 16.2748 |
+| 3.1059        | 5.0   | 4010 | 3.3622          | 26.7254 | 9.1138 | 25.7985 | 25.8521   | 15.7366 |
+| 3.0336        | 6.0   | 4812 | 3.3662          | 26.4655 | 9.1283 | 25.4587 | 25.5112   | 16.548  |
+| 2.9727        | 7.0   | 5614 | 3.3593          | 26.8211 | 9.3045 | 25.8497 | 25.8772   | 15.5431 |
+| 2.9298        | 8.0   | 6416 | 3.3643          | 26.8932 | 9.3537 | 25.9444 | 26.0088   | 15.916  |
+| 2.9005        | 9.0   | 7218 | 3.3606          | 27.1732 | 9.5661 | 26.1198 | 26.1515   | 15.71   |
+| 2.8846        | 10.0  | 8020 | 3.3608          | 27.0741 | 9.3824 | 26.133  | 26.1559   | 15.5767 |
+
+
+### Framework versions
+
+- Transformers 4.37.0.dev0
+- Pytorch 2.1.0+cu121
+- Datasets 2.15.0
+- Tokenizers 0.15.0
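The card's headline metric is Rouge1 (unigram overlap between generated and reference summaries, here of Bash command descriptions). As a rough illustration of what that number measures, a minimal sketch of unigram ROUGE-1 F1 follows; the actual evaluation in `transformers` uses the `rouge_score` package, which adds stemming and tokenizer normalization that this simplified version omits:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified unigram ROUGE-1 F1: no stemming, whitespace tokenization only."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap: each reference token can be matched at most
    # as many times as it occurs in the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# e.g. rouge1_f1("list files", "list all files") -> 0.8
```

A score of 27.07 therefore means roughly 27% unigram F1 overlap with the references, averaged over the validation split (and scaled to 0-100 as ROUGE conventionally is).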
generation_config.json ADDED
@@ -0,0 +1,8 @@
+{
+  "bos_token_id": 2,
+  "decoder_start_token_id": 2,
+  "eos_token_id": 1,
+  "max_length": 256,
+  "pad_token_id": 0,
+  "transformers_version": "4.37.0.dev0"
+}
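These fields become the model's decoding defaults: `model.generate()` starts the decoder at `decoder_start_token_id`, stops a sequence when `eos_token_id` is produced, and never emits more than `max_length` tokens. A minimal sketch of that stopping behavior follows; `truncate_at_eos` is a hypothetical helper for illustration, not part of the `transformers` API:

```python
import json

# The generation defaults added in this commit (copied from generation_config.json above).
GENERATION_CONFIG = json.loads("""
{
  "bos_token_id": 2,
  "decoder_start_token_id": 2,
  "eos_token_id": 1,
  "max_length": 256,
  "pad_token_id": 0,
  "transformers_version": "4.37.0.dev0"
}
""")

def truncate_at_eos(token_ids, config):
    """Illustrative only: cut a decoded sequence at the first eos token
    and cap its length at max_length, as generate() effectively does."""
    out = []
    for tok in token_ids[: config["max_length"]]:
        if tok == config["eos_token_id"]:
            break
        out.append(tok)
    return out

# e.g. truncate_at_eos([5, 9, 1, 7], GENERATION_CONFIG) -> [5, 9]
```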
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cc04afbd53d6cc05b5cfd4b6dc1c951c62fd13df24a81edfcf70d08a0e78d3ea
+oid sha256:e308dfd3e63c1101832f6d3bb8ecc9df567333608538bb2d95d7f8303104164b
 size 990345064
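Only the git-lfs pointer's `oid` changed while `size` stayed identical, which is what a weights-only update looks like: same architecture and dtype, different parameter values. As a sanity check (assuming float32 storage, 4 bytes per parameter), the file size lines up with flan-t5-base's roughly 248M parameters:

```python
SAFETENSORS_SIZE = 990_345_064  # bytes, from the LFS pointer above

# Assuming float32 weights (4 bytes per parameter); the small remainder
# relative to the true parameter count is the safetensors header/metadata.
approx_params = SAFETENSORS_SIZE // 4
print(f"~{approx_params / 1e6:.1f}M parameters")  # ~247.6M, consistent with flan-t5-base
```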