pszemraj/distilbert-base-uncased-edu-classifier

	---
	license: apache-2.0
	base_model: distilbert-base-uncased
	tags:
	- generated_from_trainer
	model-index:
	- name: distilbert-base-uncased-fineweb-edu-llama3-annotations-512-vN
	results: []
	---


	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/pszemraj/eduscore-regression/runs/k6z0kenz)
	# distilbert-base-uncased-fineweb-edu-llama3-annotations-512-vN

	This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the HuggingFaceFW/fineweb-edu-llama3-annotations dataset.
	It achieves the following results on the evaluation set (final evaluation, step 3400):
	- Loss: 0.2324
	- MSE: 0.2324
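
The loss and the MSE metric are identical, which is what one would expect if the model is trained as a single-output regressor with a mean-squared-error objective. That reading is an inference from the numbers, not something the card states. A minimal sketch of the metric, using made-up predictions and labels purely for illustration:

```python
def mean_squared_error(predictions, labels):
    """Average of squared differences between predictions and labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(predictions)


# Hypothetical values, not taken from the evaluation set.
example_preds = [1.5, 2.0, 0.5, 3.0]
example_labels = [1.0, 2.0, 1.0, 4.0]
example_mse = mean_squared_error(example_preds, example_labels)

# The reported MSE of 0.2324 corresponds to a root-mean-square error
# of roughly 0.48 in the units of the label.
reported_rmse = 0.2324 ** 0.5
```

Taking the square root puts the error back on the same scale as the labels, which is often easier to interpret than the squared value.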

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
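
Until the author documents intended usage, the following is only a sketch of how a checkpoint like this might be queried with the `transformers` library. It rests on several assumptions that the card does not confirm: that the repository id is `pszemraj/distilbert-base-uncased-edu-classifier`, that the head has a single regression output, that inputs are truncated to 512 tokens (suggested by the `512` in the model name), and that scores follow a 0 to 5 scale. The helper names `predict_scores` and `to_int_score` are hypothetical.

```python
def to_int_score(raw_score, low=0, high=5):
    """Clamp a raw regression output to [low, high] and round to an integer.

    The 0-5 range is an assumption about the annotation scale.
    """
    return int(round(max(low, min(raw_score, high))))


def predict_scores(texts, model_id="pszemraj/distilbert-base-uncased-edu-classifier"):
    """Return one raw float score per input text (downloads the model)."""
    # Imported lazily so the pure helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,  # assumption based on the model name
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits  # assumed shape: (batch, 1)
    return logits.squeeze(-1).tolist()
```

Calling `predict_scores(["Photosynthesis converts light into chemical energy."])` would return a list with one float, which `to_int_score` can then map to a discrete score if one is wanted.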

	## Training and evaluation data

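	Training and evaluation used the HuggingFaceFW/fineweb-edu-llama3-annotations dataset. More information needed on splits and preprocessing.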

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 90085
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-09
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.05
	- num_epochs: 1.0
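
Two of the values above follow from the others. The total train batch size is the per-device batch size times the gradient accumulation steps (16 × 8 = 128, assuming a single device), and the warmup length follows from the warmup ratio and the total number of optimizer steps. The sketch below works through both, together with a plain-Python version of a linear warmup and decay schedule. The figure of 3470 total steps is an estimate inferred from the Step and Epoch columns of the training results, and the function names are illustrative rather than the Trainer's own API.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_schedule_lr(step, total_steps, peak_lr=1e-05, warmup_ratio=0.05):
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 3470  # assumption: estimated from the training results table

batch = effective_batch_size(16, 8)          # 128
warmup = math.ceil(TOTAL_STEPS * 0.05)       # 174 warmup steps
lr_mid = linear_schedule_lr(2000, TOTAL_STEPS)
```

With these assumptions the learning rate peaks at 1e-05 around step 174 and then falls linearly to zero by the end of the single epoch.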

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Mse |
	|:-------------:|:------:|:----:|:---------------:|:------:|
	| 0.5361 | 0.0288 | 100 | 0.4934 | 0.4934 |
	| 0.3483 | 0.0576 | 200 | 0.3525 | 0.3525 |
	| 0.3238 | 0.0865 | 300 | 0.2931 | 0.2931 |
	| 0.2734 | 0.1153 | 400 | 0.3130 | 0.3130 |
	| 0.2891 | 0.1441 | 500 | 0.3298 | 0.3298 |
	| 0.2807 | 0.1729 | 600 | 0.2659 | 0.2659 |
	| 0.2727 | 0.2018 | 700 | 0.2690 | 0.2690 |
	| 0.2701 | 0.2306 | 800 | 0.2555 | 0.2555 |
	| 0.2954 | 0.2594 | 900 | 0.2501 | 0.2501 |
	| 0.2618 | 0.2882 | 1000 | 0.2483 | 0.2483 |
	| 0.3081 | 0.3171 | 1100 | 0.2456 | 0.2456 |
	| 0.2544 | 0.3459 | 1200 | 0.2370 | 0.2370 |
	| 0.2593 | 0.3747 | 1300 | 0.2349 | 0.2349 |
	| 0.2361 | 0.4035 | 1400 | 0.2406 | 0.2406 |
	| 0.2536 | 0.4324 | 1500 | 0.2453 | 0.2453 |
	| 0.26 | 0.4612 | 1600 | 0.2568 | 0.2568 |
	| 0.2897 | 0.4900 | 1700 | 0.2568 | 0.2568 |
	| 0.2597 | 0.5188 | 1800 | 0.2359 | 0.2359 |
	| 0.2489 | 0.5477 | 1900 | 0.2413 | 0.2413 |
	| 0.2376 | 0.5765 | 2000 | 0.2416 | 0.2416 |
	| 0.2424 | 0.6053 | 2100 | 0.2418 | 0.2418 |
	| 0.2798 | 0.6341 | 2200 | 0.2462 | 0.2462 |
	| 0.2523 | 0.6630 | 2300 | 0.2322 | 0.2322 |
	| 0.286 | 0.6918 | 2400 | 0.2432 | 0.2432 |
	| 0.247 | 0.7206 | 2500 | 0.2383 | 0.2383 |
	| 0.2856 | 0.7494 | 2600 | 0.2375 | 0.2375 |
	| 0.2216 | 0.7783 | 2700 | 0.2383 | 0.2383 |
	| 0.255 | 0.8071 | 2800 | 0.2367 | 0.2367 |
	| 0.2406 | 0.8359 | 2900 | 0.2345 | 0.2345 |
	| 0.2388 | 0.8647 | 3000 | 0.2282 | 0.2282 |
	| 0.2571 | 0.8936 | 3100 | 0.2331 | 0.2331 |
	| 0.2672 | 0.9224 | 3200 | 0.2336 | 0.2336 |
	| 0.2375 | 0.9512 | 3300 | 0.2337 | 0.2337 |
	| 0.2423 | 0.9800 | 3400 | 0.2324 | 0.2324 |
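
Validation loss does not fall monotonically, so the final evaluation is not necessarily the best one. The sketch below copies the Step and Validation Loss columns from the table and picks out the lowest value. It also estimates the number of optimizer steps per epoch from the last row. Whether the published weights correspond to the last or the best evaluation is not stated in the card.

```python
# (step, validation loss) pairs copied from the training results table.
EVAL_HISTORY = [
    (100, 0.4934), (200, 0.3525), (300, 0.2931), (400, 0.3130),
    (500, 0.3298), (600, 0.2659), (700, 0.2690), (800, 0.2555),
    (900, 0.2501), (1000, 0.2483), (1100, 0.2456), (1200, 0.2370),
    (1300, 0.2349), (1400, 0.2406), (1500, 0.2453), (1600, 0.2568),
    (1700, 0.2568), (1800, 0.2359), (1900, 0.2413), (2000, 0.2416),
    (2100, 0.2418), (2200, 0.2462), (2300, 0.2322), (2400, 0.2432),
    (2500, 0.2383), (2600, 0.2375), (2700, 0.2383), (2800, 0.2367),
    (2900, 0.2345), (3000, 0.2282), (3100, 0.2331), (3200, 0.2336),
    (3300, 0.2337), (3400, 0.2324),
]

# Evaluation with the lowest validation loss, and the final one.
best_step, best_loss = min(EVAL_HISTORY, key=lambda row: row[1])
final_step, final_loss = EVAL_HISTORY[-1]

# The last row reports epoch 0.9800 at step 3400, which gives a rough
# estimate of optimizer steps in one full epoch.
steps_per_epoch = final_step / 0.9800

print(f"best:  step {best_step}, loss {best_loss:.4f}")
print(f"final: step {final_step}, loss {final_loss:.4f}")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
```

On these numbers the lowest validation loss (0.2282) occurs at step 3000, slightly below the final value of 0.2324 at step 3400.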


	### Framework versions

	- Transformers 4.42.3
	- Pytorch 2.3.1+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1
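
To compare a local environment against the versions listed above, something like the following could be used. It relies only on the standard library's `importlib.metadata`. The PyPI distribution names (`torch` for Pytorch, and so on) are the usual ones but are stated here as an assumption, and `compare_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "transformers": "4.42.3",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        report[name] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load. The check only shows where an environment differs from the one used for training.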