yhavinga
/

Boreas-7B-chat

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

yhavinga commited on Apr 30

Commit

076a17b

•

1 Parent(s): 0723934

Update README.md

Files changed (1) hide show

README.md +25 -7

README.md CHANGED Viewed

@@ -106,12 +106,12 @@ bijvoorbeeld nooit een fragment krijgen dat begint met een paar tokens van het e
 ## Pre-training
-Boreas was pre-trained with the [EasyDeL JAX framework](https://github.com/erfanzar/EasyDel) on a tpu-v4-32
 kindly supplied by the Google [TPU Research Cloud](https://sites.research.google/trc/about/).
-Batch size 96
-Using flash attention, block size of 512
-Max sequence length of 2048
-LION optimizer, triangle learning rate schedule with max lr 3e-6, gradient clipping to 1.0
 ![img_3.png](images/img_3.png)
@@ -119,7 +119,7 @@ LION optimizer, triangle learning rate schedule with max lr 3e-6, gradient clipp
 ![img_5.png](images/img_5.png)
-Meer info [https://wandb.ai/yepster/EasyDeL-MistralBoreas/runs/ozw55qaq/workspace?nw=nwuseryepster](WandB Boreas 7B pre-train)
 ## Boreas-7B-chat
@@ -161,4 +161,22 @@ het Nederlands geschreven zijn door een persoon. Dit zijn de Nederlandse wiki q
 chat datasets. Hierdoor wordt er zoveel mogelijk voor gezorgd dat bij bijvoorbeeld educatie-achtige q en a, de in onze
 regio gebruikelijke termen en eenheden voorkomen in de chat database, tenminste voor de Nederlandstalige chats.
-Bij alle chat datasets is er alleen getraind op de assistant-completion tokens.

 ## Pre-training
+* Boreas was pre-trained with the [EasyDeL JAX framework](https://github.com/erfanzar/EasyDel) on a tpu-v4-32
 kindly supplied by the Google [TPU Research Cloud](https://sites.research.google/trc/about/).
+* Batch size 96, gradient accumulation steps 2
+* Using flash attention, block size of 512
+* Max sequence length of 2048
+* LION optimizer, triangle learning rate schedule with max lr 3e-6, gradient clipping to 1.0
 ![img_3.png](images/img_3.png)
 ![img_5.png](images/img_5.png)
+<!-- [https://wandb.ai/yepster/EasyDeL-MistralBoreas/runs/ozw55qaq/workspace?nw=nwuseryepster](WandB Boreas 7B pre-train) -->
 ## Boreas-7B-chat
 chat datasets. Hierdoor wordt er zoveel mogelijk voor gezorgd dat bij bijvoorbeeld educatie-achtige q en a, de in onze
 regio gebruikelijke termen en eenheden voorkomen in de chat database, tenminste voor de Nederlandstalige chats.
+Bij alle chat datasets is er alleen getraind op de assistant-completion tokens.
+## Fine-tuning
+* Boreas was fine-tuned with the [EasyDeL JAX framework](https://github.com/erfanzar/EasyDel) on a tpu-v4-32
+kindly supplied by the Google [TPU Research Cloud](https://sites.research.google/trc/about/).
+* Batch size 96, gradient accumulation 2,
+* Using flash attention, block size of 512
+* Max sequence length of 2048
+* LION optimizer, triangle learning rate schedule with max lr 2e-6, gradient clipping to 1.0 (NB: the schedule was not finished due to an error at the end of the dataset epoch. Since the loss had plateaued I decided then to not resume for another epoch)
+![loss finetune](images/loss_finetune.png)
+![accuracy finetune](images/accuracy_finetune.png)
+![learning rate finetune](images/lr_finetune.png)
+<!-- [https://wandb.ai/yepster/EasyDeL-MistralBoreas/runs/ynkl2jtx?nw=nwuseryepster](WandB Boreas 7B chat finetune) -->