metadata

license: apache-2.0
base_model: distilroberta-base
tags:
  - generated_from_trainer
model-index:
  - name: distilroberta-base-fineweb-edu-llama3-annotations-2048-vN
    results: []

distilroberta-base-fineweb-edu-llama3-annotations-2048-vN

This model is a fine-tuned version of distilroberta-base. The training dataset is not recorded in this card. It achieves the following results on the evaluation set (the final evaluation, at step 3400):

Loss: 0.2197
Mse: 0.2197
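
Loss and Mse are identical, which is what one would expect if the model was trained as a regressor with mean squared error as its objective. The card does not state this, so treat it as an assumption. The sketch below shows how MSE is computed and converts the reported value into a root-mean-square error, which is expressed in the same units as the target. The prediction and target values are made-up toy numbers, not outputs of this model.

```python
import math


def mean_squared_error(predictions, targets):
    """Average of squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy values, used only to illustrate the formula.
toy_predictions = [2.5, 0.0, 3.0]
toy_targets = [3.0, 0.0, 2.0]
toy_mse = mean_squared_error(toy_predictions, toy_targets)
print(f"toy MSE: {toy_mse:.4f}")

# Value reported in this card for the evaluation set.
reported_mse = 0.2197
reported_rmse = math.sqrt(reported_mse)
print(f"reported RMSE: {reported_rmse:.4f}")
```

Under that assumption, an MSE of 0.2197 corresponds to a typical error of a little under half a unit on whatever scale the targets use.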

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 90085
gradient_accumulation_steps: 8
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1.0
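
Two of these values follow from the others, and a short sketch makes the relationships explicit. total_train_batch_size is the per-step batch size multiplied by the gradient accumulation steps (and by the device count, assumed here to be 1, which is consistent with 16 x 8 = 128). The learning rate function is a hand-written approximation of a linear schedule with warmup, not the library's own code, and TOTAL_STEPS is an estimate taken from the training log (step 3400 is reported at epoch 0.98), not a value stated in the card.

```python
import math


def effective_batch_size(per_step_batch, grad_accum_steps, num_devices=1):
    """Number of examples contributing to one optimizer update."""
    return per_step_batch * grad_accum_steps * num_devices


def linear_schedule_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.05):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: about 3470 optimizer steps in the single epoch.
TOTAL_STEPS = 3470

print("effective batch size:", effective_batch_size(16, 8))
for step in (0, 100, 174, 1000, 2000, 3470):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.3e}")
```

With a warmup ratio of 0.05, the peak learning rate of 1e-05 would be reached after roughly the first 5% of the run and then decay linearly to zero by the end of the epoch.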

Training results

Training Loss	Epoch	Step	Validation Loss	Mse
0.5276	0.0288	100	0.5012	0.5012
0.3307	0.0576	200	0.3467	0.3467
0.2994	0.0865	300	0.2948	0.2948
0.2813	0.1153	400	0.2799	0.2799
0.2707	0.1441	500	0.3017	0.3017
0.2506	0.1729	600	0.2699	0.2699
0.2584	0.2018	700	0.2633	0.2633
0.2603	0.2306	800	0.2434	0.2434
0.2973	0.2594	900	0.2394	0.2394
0.2541	0.2882	1000	0.2356	0.2356
0.2837	0.3171	1100	0.2437	0.2437
0.2420	0.3459	1200	0.2379	0.2379
0.2379	0.3747	1300	0.2270	0.2270
0.2300	0.4035	1400	0.2357	0.2357
0.2345	0.4324	1500	0.2417	0.2417
0.2574	0.4612	1600	0.2556	0.2556
0.2640	0.4900	1700	0.2452	0.2452
0.2596	0.5188	1800	0.2215	0.2215
0.2440	0.5477	1900	0.2269	0.2269
0.2225	0.5765	2000	0.2342	0.2342
0.2475	0.6053	2100	0.2403	0.2403
0.2530	0.6341	2200	0.2326	0.2326
0.2435	0.6630	2300	0.2161	0.2161
0.2865	0.6918	2400	0.2265	0.2265
0.2351	0.7206	2500	0.2343	0.2343
0.2582	0.7494	2600	0.2342	0.2342
0.2167	0.7783	2700	0.2337	0.2337
0.2495	0.8071	2800	0.2273	0.2273
0.2364	0.8359	2900	0.2298	0.2298
0.2236	0.8647	3000	0.2170	0.2170
0.2310	0.8936	3100	0.2234	0.2234
0.2474	0.9224	3200	0.2227	0.2227
0.2333	0.9512	3300	0.2241	0.2241
0.2265	0.9800	3400	0.2197	0.2197
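
The log above can be summarised with a few lines of Python. The sketch below copies the Validation Loss column (one entry per 100 steps), finds the evaluation with the lowest loss, and estimates the length of the epoch from the last row. The dataset size estimate assumes each optimizer step consumes total_train_batch_size = 128 examples; it is an approximation derived from the log, not a figure reported by the authors.

```python
# Validation loss at steps 100, 200, ..., 3400, copied from the table.
validation_loss = [
    0.5012, 0.3467, 0.2948, 0.2799, 0.3017, 0.2699, 0.2633, 0.2434,
    0.2394, 0.2356, 0.2437, 0.2379, 0.2270, 0.2357, 0.2417, 0.2556,
    0.2452, 0.2215, 0.2269, 0.2342, 0.2403, 0.2326, 0.2161, 0.2265,
    0.2343, 0.2342, 0.2337, 0.2273, 0.2298, 0.2170, 0.2234, 0.2227,
    0.2241, 0.2197,
]
EVAL_INTERVAL = 100  # steps between evaluations, as in the Step column

# Locate the evaluation with the lowest validation loss.
best_index = min(range(len(validation_loss)), key=validation_loss.__getitem__)
best_step = (best_index + 1) * EVAL_INTERVAL
best_loss = validation_loss[best_index]

final_step = len(validation_loss) * EVAL_INTERVAL
final_loss = validation_loss[-1]

# The last row reports step 3400 at epoch 0.9800.
steps_per_epoch = round(final_step / 0.9800)
# Assumption: 128 examples per optimizer step (total_train_batch_size).
approx_train_examples = steps_per_epoch * 128

print(f"best:  step {best_step}, loss {best_loss:.4f}")
print(f"final: step {final_step}, loss {final_loss:.4f}")
print(f"estimated steps per epoch: {steps_per_epoch}")
print(f"estimated training examples: {approx_train_examples}")
```

On these numbers the lowest validation loss is 0.2161 at step 2300, slightly below the final 0.2197 at step 3400, and most of the improvement happens within the first 1000 steps.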

Framework versions

Transformers 4.42.3
PyTorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1