1289.4068 seconds used for training.
21.49 minutes used for training.
Peak reserved memory = 9.545 GB.
Peak reserved memory for training = 4.018 GB.
Peak reserved memory % of max memory = 43.058 %.
Peak reserved memory for training % of max memory = 18.125 %.
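
For reference, figures like the ones above can be reproduced with PyTorch's CUDA memory counters; the runtime lines typically come from the metrics returned by `trainer.train()` (`train_runtime`, in seconds). The sketch below assumes a single CUDA device, and `start_gpu_memory` stands in for a snapshot taken with `torch.cuda.max_memory_reserved()` before training started (its value here is purely illustrative).

```python
import torch

# `start_gpu_memory` is an assumed pre-training snapshot of torch.cuda.max_memory_reserved();
# the value is illustrative only.
start_gpu_memory = 5.527  # GB

gpu_stats   = torch.cuda.get_device_properties(0)
max_memory  = round(gpu_stats.total_memory / 1024**3, 3)            # total VRAM (GB)
used_memory = round(torch.cuda.max_memory_reserved() / 1024**3, 3)  # peak reserved (GB)
used_for_training = round(used_memory - start_gpu_memory, 3)        # share attributable to training

print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_for_training} GB.")
print(f"Peak reserved memory % of max memory = {round(used_memory / max_memory * 100, 3)} %.")
print(f"Peak reserved memory for training % of max memory = {round(used_for_training / max_memory * 100, 3)} %.")
```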

import torch
from transformers import TrainingArguments

args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,     # Increased the number of warmup steps
        max_steps = 200,       # Increased the total number of steps
        learning_rate = 1e-4,  # Reduced the learning rate
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "outputs",
)

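For context, here is a minimal sketch of how an `args` object like this is typically consumed in an Unsloth + TRL fine-tuning run. The names `model`, `tokenizer`, and `dataset`, the `"text"` column name, and the `max_seq_length` value are assumptions (they are prepared elsewhere in the notebook and not shown here), and the `SFTTrainer` keyword set follows the TRL API used in the Unsloth example notebooks.

```python
from trl import SFTTrainer

# `model`, `tokenizer`, and `dataset` are assumed to exist (e.g. loaded earlier with
# unsloth.FastLanguageModel.from_pretrained and a formatted instruction dataset).
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",   # assumed name of the formatted-prompt column
    max_seq_length = 2048,         # assumed; should match the value used at model load time
    args = args,                   # the TrainingArguments defined above
)

trainer_stats = trainer.train()    # produces the progress readout and loss table below
```
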
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 399 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 200
 "-____-"     Number of trainable parameters = 20,971,520
 [200/200 21:17, Epoch 4/4]
Step	Training Loss
1	2.027900
2	2.008700
3	1.946100
4	1.924700
5	1.995000
6	1.999000
7	1.870100
8	1.891400
9	1.807600
10	1.723200
11	1.665100
12	1.541000
13	1.509100
14	1.416600
15	1.398600
16	1.233200
17	1.172100
18	1.272100
19	1.146000
20	1.179000
21	1.206400
22	1.095400
23	0.937300
24	1.214300
25	1.040200
26	1.183400
27	1.033900
28	0.953100
29	0.935700
30	0.962200
31	0.908900
32	0.924900
33	0.931000
34	1.011300
35	0.951900
36	0.936000
37	0.903000
38	0.906900
39	0.945700
40	0.827000
41	0.931800
42	0.919600
43	0.926900
44	0.932900
45	0.872700
46	0.795200
47	0.888700
48	0.956800
49	1.004200
50	0.859500
51	0.802500
52	0.855400
53	0.885500
54	1.026600
55	0.844100
56	0.879800
57	0.797400
58	0.885300
59	0.842800
60	0.861600
61	0.789100
62	0.861600
63	0.856700
64	0.929200
65	0.782500
66	0.713600
67	0.781000
68	0.765100
69	0.784700
70	0.869500
71	0.742900
72	0.787900
73	0.750800
74	0.931700
75	0.713000
76	0.832100
77	0.928300
78	0.777600
79	0.694000
80	0.835400
81	0.822000
82	0.754600
83	0.813400
84	0.868800
85	0.732400
86	0.803700
87	0.694400
88	0.771300
89	0.864400
90	0.646700
91	0.690800
92	0.695000
93	0.732300
94	0.766900
95	0.864100
96	0.867200
97	0.774300
98	0.797700
99	0.772100
100	0.906700
101	0.693400
102	0.685500
103	0.712200
104	0.678400
105	0.761900
106	0.705300
107	0.775700
108	0.627600
109	0.599300
110	0.615100
111	0.618200
112	0.668700
113	0.699900
114	0.577000
115	0.711600
116	0.692900
117	0.585400
118	0.646400
119	0.569200
120	0.752300
121	0.745000
122	0.690100
123	0.744700
124	0.665800
125	0.866100
126	0.707400
127	0.679300
128	0.591400
129	0.655100
130	0.734000
131	0.637900
132	0.733900
133	0.652500
134	0.685400
135	0.641300
136	0.608200
137	0.754100
138	0.753700
139	0.671000
140	0.767200
141	0.668700
142	0.630300
143	0.734700
144	0.767700
145	0.722200
146	0.694400
147	0.710100
148	0.696300
149	0.612600
150	0.670400
151	0.512900
152	0.675100
153	0.579900
154	0.622900
155	0.652500
156	0.649200
157	0.546700
158	0.521600
159	0.522200
160	0.589400
161	0.552600
162	0.630700
163	0.595600
164	0.614300
165	0.489400
166	0.634500
167	0.620800
168	0.618600
169	0.637900
170	0.553900
171	0.656000
172	0.644000
173	0.694300
174	0.608900
175	0.673000
176	0.612500
177	0.654200
178	0.639200
179	0.599100
180	0.642100
181	0.529700
182	0.614000
183	0.582900
184	0.765100
185	0.502700
186	0.564300
187	0.740200
188	0.636100
189	0.638800
190	0.560100
191	0.620000
192	0.712800
193	0.531000
194	0.591600
195	0.608600
196	0.671800
197	0.572900
198	0.600900
199	0.586800
200	0.545900
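
With a per-device batch size of 2 and 4 gradient-accumulation steps, each optimizer step consumes 8 examples, so 200 steps cover 200 × 8 = 1,600 examples, roughly four passes over the 399 training examples; this matches the Epoch 4/4 shown in the progress readout. Over those steps the training loss falls from about 2.03 to roughly 0.5–0.6.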

---
base_model: unsloth/llama-3-8b-bnb-4bit
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
---

# Uploaded model

- **Developed by:** Mathoufle13
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
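
To try the uploaded weights, a loading sketch with Unsloth is shown below. The repository id is a placeholder (substitute the actual Hub path), and the `max_seq_length` value and prompt format are assumptions not taken from this card.

```python
from unsloth import FastLanguageModel

# "Mathoufle13/<repo-name>" is a placeholder for the actual Hub repository id.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Mathoufle13/<repo-name>",
    max_seq_length = 2048,   # assumed
    load_in_4bit = True,     # matches the 4-bit base model
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

inputs = tokenizer("### Instruction:\nSay hello.\n\n### Response:\n", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```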

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)