---
tags:
- generated_from_trainer
model-index:
- name: vicuna_13b_stage1
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

# vicuna_13b_stage1

This model was trained from scratch on an unspecified dataset (recorded as `None` by the training script). It achieves the following results on the evaluation set:
- Loss: 1.2017

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2 (per device)
- eval_batch_size: 2 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16 (2 per device × 2 devices × 4 accumulation steps)
- total_eval_batch_size: 4 (2 per device × 2 devices)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.9535        | 0.02  | 40   | 1.9456          |
| 1.8556        | 0.04  | 80   | 1.7714          |
| 1.791         | 0.06  | 120  | 1.7425          |
| 1.6622        | 0.08  | 160  | 1.7164          |
| 1.8169        | 0.1   | 200  | 1.7154          |
| 1.7356        | 0.12  | 240  | 1.7026          |
| 1.6051        | 0.14  | 280  | 1.7104          |
| 1.7925        | 0.16  | 320  | 1.7127          |
| 1.8257        | 0.18  | 360  | 1.7055          |
| 1.7057        | 0.2   | 400  | 1.6906          |
| 1.9282        | 0.22  | 440  | 1.6746          |
| 1.668         | 0.24  | 480  | 1.7052          |
| 1.6273        | 0.26  | 520  | 1.6620          |
| 1.6136        | 0.28  | 560  | 1.6616          |
| 1.4754        | 0.3   | 600  | 1.6389          |
| 1.4024        | 0.32  | 640  | 1.6038          |
| 1.6773        | 0.34  | 680  | 1.5743          |
| 1.6008        | 0.36  | 720  | 1.5607          |
| 1.568         | 0.39  | 760  | 1.5236          |
| 1.4922        | 0.41  | 800  | 1.5158          |
| 1.4667        | 0.43  | 840  | 1.4938          |
| 1.5653        | 0.45  | 880  | 1.4692          |
| 1.331         | 0.47  | 920  | 1.4581          |
| 1.4019        | 0.49  | 960  | 1.4290          |
| 1.4925        | 0.51  | 1000 | 1.4087          |
| 1.4772        | 0.53  | 1040 | 1.3961          |
| 1.4728        | 0.55  | 1080 | 1.3817          |
| 1.4555        | 0.57  | 1120 | 1.3559          |
| 1.5487        | 0.59  | 1160 | 1.3399          |
| 1.3888        | 0.61  | 1200 | 1.3212          |
| 1.2544        | 0.63  | 1240 | 1.3099          |
| 1.2657        | 0.65  | 1280 | 1.2972          |
| 1.3641        | 0.67  | 1320 | 1.2815          |
| 1.2915        | 0.69  | 1360 | 1.2687          |
| 1.4182        | 0.71  | 1400 | 1.2541          |
| 1.2515        | 0.73  | 1440 | 1.2427          |
| 1.2287        | 0.75  | 1480 | 1.2352          |
| 1.1886        | 0.77  | 1520 | 1.2285          |
| 1.2651        | 0.79  | 1560 | 1.2219          |
| 1.3341        | 0.81  | 1600 | 1.2145          |
| 1.2357        | 0.83  | 1640 | 1.2107          |
| 1.0767        | 0.85  | 1680 | 1.2080          |
| 1.2158        | 0.87  | 1720 | 1.2051          |
| 1.2042        | 0.89  | 1760 | 1.2034          |
| 1.1887        | 0.91  | 1800 | 1.2023          |
| 1.2662        | 0.93  | 1840 | 1.2018          |
| 1.1866        | 0.95  | 1880 | 1.2017          |
| 1.1798        | 0.97  | 1920 | 1.2017          |
| 1.336         | 0.99  | 1960 | 1.2017          |

### Framework versions

- Transformers 4.34.1
- PyTorch 2.3.1+cu121
- Datasets 2.14.7
- Tokenizers 0.14.1
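The hyperparameters above map naturally onto the Hugging Face `TrainingArguments` API. The sketch below is illustrative only, not the original run's code: `output_dir` is a placeholder, and the `evaluation_strategy`/`eval_steps` values are inferred from the every-40-steps validation log in the results table.

```python
# A minimal sketch of how the listed hyperparameters translate into
# transformers.TrainingArguments (Transformers 4.34.x).
# output_dir and the evaluation settings are assumptions, not taken
# from the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vicuna_13b_stage1",    # hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=2,     # train_batch_size above is per device
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,     # 2 per device x 2 GPUs x 4 = 16 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_steps=40,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="steps",       # inferred: validation logged every 40 steps
    eval_steps=40,
)
```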
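Until the card's usage section is filled in, here is a minimal, hedged loading sketch. It assumes the checkpoint is a standard causal language model loadable via `AutoModelForCausalLM`; the repo id is a placeholder, not the model's real path.

```python
# A minimal usage sketch, assuming a standard causal LM checkpoint.
# "your-org/vicuna_13b_stage1" is hypothetical; replace with the real repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/vicuna_13b_stage1"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # a 13B model generally needs half precision to fit
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```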