pszemraj/mega-ar-350m-v0.13

	---
	license: apache-2.0
	base_model: pszemraj/mega-ar-350m-v0.12-napierone_epub
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN
	  results: []
	---


	# mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN

	This model is a fine-tuned version of [pszemraj/mega-ar-350m-v0.12-napierone_epub](https://huggingface.co/pszemraj/mega-ar-350m-v0.12-napierone_epub) on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.9926
	- Accuracy: 0.5885
	- Num Input Tokens Seen: 3468165120
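
The evaluation loss can be restated as perplexity, which is often easier to compare across runs. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (natural logarithm); the card does not state this explicitly.

```python
import math

eval_loss = 1.9926  # final evaluation loss reported above

# Assumption: the loss is mean cross-entropy per token, measured in nats.
perplexity = math.exp(eval_loss)

# The same quantity expressed in bits per token (base-2 logarithm).
bits_per_token = eval_loss / math.log(2)

print(f"perplexity     ~ {perplexity:.2f}")
print(f"bits per token ~ {bits_per_token:.3f}")
```

Under that assumption the perplexity comes out at roughly 7.3.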

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
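
Until this section is filled in, the following is a minimal usage sketch rather than documented behavior. It assumes the checkpoint loads through the standard `transformers` `text-generation` pipeline under the repository name `pszemraj/mega-ar-350m-v0.13`; the helper name `generate` and the default of 64 new tokens are illustrative choices, not part of the original card.

```python
# Assumption: the repository name below resolves to a causal language model
# that the transformers pipeline API can load.
MODEL_ID = "pszemraj/mega-ar-350m-v0.13"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the prompt followed by a greedy continuation (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    return outputs[0]["generated_text"]
```

Calling `generate("Photosynthesis is the process by which")` downloads the weights on first use and returns the prompt with its continuation appended.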

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 80085
	- distributed_type: multi-GPU
	- num_devices: 3
	- gradient_accumulation_steps: 32
	- total_train_batch_size: 96
	- total_eval_batch_size: 3
	- optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.05
	- num_epochs: 1.0
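
As a cross-check on the list above, the effective batch size and the learning-rate schedule can be sketched in a few lines. This is an illustration of a generic linear schedule with warmup, not the Trainer's own code. The function name `linear_lr` is hypothetical, and the total of 8820 optimizer steps is an assumption estimated from the training results (step 8800 at epoch 0.9977).

```python
import math

# Values copied from the hyperparameter list above.
per_device_batch = 1
num_devices = 3
grad_accum_steps = 32
base_lr = 5e-05
warmup_ratio = 0.05

# Sequences contributing to each optimizer step.
total_train_batch = per_device_batch * num_devices * grad_accum_steps

# Assumption: total optimizer steps in the single epoch, estimated from the
# logged step/epoch values rather than stated in the card.
TOTAL_STEPS = 8820


def linear_lr(step, total_steps=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", total_train_batch)
for step in (0, 400, 4400, 8800):
    print(step, linear_lr(step))
```

With these numbers the warmup covers roughly the first 441 steps, so the first logged evaluation at step 400 falls just before the peak learning rate is reached.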

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
	| 2.2374        | 0.0454 | 400  | 2.1871          | 0.5588   | 157286400         |
	| 2.143         | 0.0907 | 800  | 2.1336          | 0.5665   | 314572800         |
	| 2.1272        | 0.1361 | 1200 | 2.1092          | 0.5698   | 471859200         |
	| 2.1243        | 0.1814 | 1600 | 2.0929          | 0.5725   | 629145600         |
	| 2.1021        | 0.2268 | 2000 | 2.0794          | 0.5747   | 786432000         |
	| 2.0794        | 0.2721 | 2400 | 2.0687          | 0.5762   | 943718400         |
	| 2.0843        | 0.3175 | 2800 | 2.0592          | 0.5776   | 1101004800        |
	| 2.0571        | 0.3628 | 3200 | 2.0507          | 0.5793   | 1258291200        |
	| 2.0841        | 0.4082 | 3600 | 2.0435          | 0.5802   | 1415577600        |
	| 2.0484        | 0.4535 | 4000 | 2.0363          | 0.5813   | 1572864000        |
	| 2.0199        | 0.4989 | 4400 | 2.0315          | 0.5820   | 1730150400        |
	| 2.0361        | 0.5442 | 4800 | 2.0261          | 0.5829   | 1887436800        |
	| 2.057         | 0.5896 | 5200 | 2.0207          | 0.5838   | 2044723200        |
	| 2.0234        | 0.6349 | 5600 | 2.0163          | 0.5845   | 2202009600        |
	| 2.073         | 0.6803 | 6000 | 2.0120          | 0.5850   | 2359296000        |
	| 2.058         | 0.7256 | 6400 | 2.0074          | 0.5862   | 2516582400        |
	| 2.0253        | 0.7710 | 6800 | 2.0041          | 0.5866   | 2673868800        |
	| 1.995         | 0.8163 | 7200 | 2.0010          | 0.5872   | 2831155200        |
	| 1.9735        | 0.8617 | 7600 | 1.9987          | 0.5875   | 2988441600        |
	| 1.9799        | 0.9070 | 8000 | 1.9960          | 0.5880   | 3145728000        |
	| 2.0056        | 0.9524 | 8400 | 1.9942          | 0.5882   | 3303014400        |
	| 1.9961        | 0.9977 | 8800 | 1.9926          | 0.5884   | 3460300800        |
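
The step and token columns allow a quick consistency check. The sketch below derives the tokens consumed per optimizer step and, from that, an implied sequence length and total step count. The sequence length is an inference that assumes every sequence has the same length with no padding; the card itself does not state a context length.

```python
# (step, input tokens seen) pairs copied from the first and last logged rows.
first_step, first_tokens = 400, 157286400
last_step, last_tokens = 8800, 3460300800

final_tokens = 3468165120  # "Num Input Tokens Seen" from the evaluation summary
total_train_batch = 96     # total_train_batch_size from the hyperparameters

# Tokens consumed per optimizer step, from the first logged row.
tokens_per_step = first_tokens // first_step

# The last logged row should follow the same constant rate.
rate_is_constant = last_tokens == last_step * tokens_per_step

# Assumption: equal-length sequences, so this is the tokens per sequence.
tokens_per_sequence = tokens_per_step // total_train_batch

# Total optimizer steps implied by the final token count.
total_steps = final_tokens // tokens_per_step

print(tokens_per_step, rate_is_constant, tokens_per_sequence, total_steps)
```

On these figures each step consumes 393216 tokens, which works out to 4096 tokens per sequence and 8820 steps for the full epoch. The final evaluation was therefore taken 20 steps after the last logged row, which would account for the small differences between the summary values (accuracy 0.5885, 3468165120 tokens) and the last table row.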


	### Framework versions

	- Transformers 4.40.2
	- Pytorch 2.2.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1