stojchet/lr_sft1

+---
+base_model: deepseek-ai/deepseek-coder-1.3b-base
+datasets:
+- generator
+library_name: peft
+license: other
+tags:
+- trl
+- sft
+- generated_from_trainer
+model-index:
+- name: lr_sft1
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/stojchets/huggingface/runs/lr_sft1)
+# lr_sft1
+This model is a PEFT adapter for [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), trained with TRL supervised fine-tuning (SFT) on the `generator` dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.1672
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.00141
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 128
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 1
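The batch-size and scheduler entries above can be checked with a short calculation. The sketch below is illustrative and not part of the training code: it assumes a single device (the card does not state the device count) and no warmup steps (none are listed), and it takes the 78 total optimizer steps from the results table. The helper names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list in this card.
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
LEARNING_RATE = 0.00141
TOTAL_STEPS = 78  # last step in the training results table


def effective_batch_size(per_device, accumulation_steps, devices=1):
    """Examples consumed per optimizer step (devices=1 is an assumption)."""
    return per_device * accumulation_steps * devices


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 39, 78):
        print(step, linear_lr(step, TOTAL_STEPS, LEARNING_RATE))
```

Under these assumptions the product of the per-device batch size and the accumulation steps matches the listed `total_train_batch_size` of 128, and the learning rate falls from its initial value to zero over the single epoch.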
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.2662        | 0.0128 | 1    | 1.2175          |
+| 1.2021        | 0.0256 | 2    | 1.1878          |
+| 1.2099        | 0.0384 | 3    | 1.1839          |
+| 1.1084        | 0.0512 | 4    | 1.1874          |
+| 1.1652        | 0.064  | 5    | 1.1817          |
+| 1.1503        | 0.0768 | 6    | 1.1817          |
+| 1.1545        | 0.0896 | 7    | 1.1776          |
+| 1.2043        | 0.1024 | 8    | 1.1785          |
+| 1.1557        | 0.1152 | 9    | 1.1759          |
+| 1.1748        | 0.128  | 10   | 1.1749          |
+| 1.2061        | 0.1408 | 11   | 1.1757          |
+| 1.1357        | 0.1536 | 12   | 1.1757          |
+| 1.1039        | 0.1664 | 13   | 1.1753          |
+| 1.2229        | 0.1792 | 14   | 1.1755          |
+| 1.148         | 0.192  | 15   | 1.1750          |
+| 1.1819        | 0.2048 | 16   | 1.1746          |
+| 1.1758        | 0.2176 | 17   | 1.1745          |
+| 1.1895        | 0.2304 | 18   | 1.1742          |
+| 1.1277        | 0.2432 | 19   | 1.1741          |
+| 1.1258        | 0.256  | 20   | 1.1739          |
+| 1.1493        | 0.2688 | 21   | 1.1733          |
+| 1.1295        | 0.2816 | 22   | 1.1733          |
+| 1.1768        | 0.2944 | 23   | 1.1736          |
+| 1.206         | 0.3072 | 24   | 1.1735          |
+| 1.1397        | 0.32   | 25   | 1.1732          |
+| 1.1736        | 0.3328 | 26   | 1.1734          |
+| 1.1412        | 0.3456 | 27   | 1.1740          |
+| 1.1383        | 0.3584 | 28   | 1.1745          |
+| 1.1216        | 0.3712 | 29   | 1.1742          |
+| 1.1127        | 0.384  | 30   | 1.1731          |
+| 1.1234        | 0.3968 | 31   | 1.1724          |
+| 1.1406        | 0.4096 | 32   | 1.1724          |
+| 1.186         | 0.4224 | 33   | 1.1723          |
+| 1.154         | 0.4352 | 34   | 1.1721          |
+| 1.114         | 0.448  | 35   | 1.1724          |
+| 1.1148        | 0.4608 | 36   | 1.1728          |
+| 1.1422        | 0.4736 | 37   | 1.1726          |
+| 1.1561        | 0.4864 | 38   | 1.1721          |
+| 1.1964        | 0.4992 | 39   | 1.1716          |
+| 1.1288        | 0.512  | 40   | 1.1714          |
+| 1.142         | 0.5248 | 41   | 1.1713          |
+| 1.149         | 0.5376 | 42   | 1.1711          |
+| 1.1104        | 0.5504 | 43   | 1.1710          |
+| 1.12          | 0.5632 | 44   | 1.1709          |
+| 1.1256        | 0.576  | 45   | 1.1710          |
+| 1.162         | 0.5888 | 46   | 1.1710          |
+| 1.0982        | 0.6016 | 47   | 1.1710          |
+| 1.1383        | 0.6144 | 48   | 1.1710          |
+| 1.1394        | 0.6272 | 49   | 1.1708          |
+| 1.1196        | 0.64   | 50   | 1.1707          |
+| 1.156         | 0.6528 | 51   | 1.1705          |
+| 1.105         | 0.6656 | 52   | 1.1703          |
+| 1.1455        | 0.6784 | 53   | 1.1701          |
+| 1.1266        | 0.6912 | 54   | 1.1698          |
+| 1.1063        | 0.704  | 55   | 1.1695          |
+| 1.127         | 0.7168 | 56   | 1.1693          |
+| 1.1501        | 0.7296 | 57   | 1.1690          |
+| 1.1383        | 0.7424 | 58   | 1.1688          |
+| 1.1174        | 0.7552 | 59   | 1.1686          |
+| 1.1413        | 0.768  | 60   | 1.1685          |
+| 1.1871        | 0.7808 | 61   | 1.1684          |
+| 1.1796        | 0.7936 | 62   | 1.1683          |
+| 1.123         | 0.8064 | 63   | 1.1683          |
+| 1.1645        | 0.8192 | 64   | 1.1682          |
+| 1.1165        | 0.832  | 65   | 1.1681          |
+| 1.0805        | 0.8448 | 66   | 1.1680          |
+| 1.2018        | 0.8576 | 67   | 1.1678          |
+| 1.0869        | 0.8704 | 68   | 1.1677          |
+| 1.1286        | 0.8832 | 69   | 1.1676          |
+| 1.0889        | 0.896  | 70   | 1.1676          |
+| 1.1395        | 0.9088 | 71   | 1.1675          |
+| 1.1756        | 0.9216 | 72   | 1.1674          |
+| 1.1575        | 0.9344 | 73   | 1.1674          |
+| 1.1073        | 0.9472 | 74   | 1.1673          |
+| 1.163         | 0.96   | 75   | 1.1673          |
+| 1.1789        | 0.9728 | 76   | 1.1673          |
+| 1.1267        | 0.9856 | 77   | 1.1673          |
+| 1.1416        | 0.9984 | 78   | 1.1672          |
+### Framework versions
+- PEFT 0.10.0
+- Transformers 4.43.0.dev0
+- PyTorch 2.2.2+cu121
+- Datasets 2.19.2
+- Tokenizers 0.19.1