robinsmits
/

Qwen1.5-7B-Dutch-Chat-Sft

Text Generation

Generated from Trainer

robinsmits committed on Mar 30, 2024

Commit

d5433c6

1 Parent(s): 83ca45d

Update README.md

Files changed (1)

README.md +11 -10

@@ -19,26 +19,27 @@ pipeline_tag: text-generation
 inference: false
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # Qwen1.5-7B-Dutch-Chat-Sft
-This model is a fine-tuned version of [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.1756
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure

 inference: false
 ---
 # Qwen1.5-7B-Dutch-Chat-Sft
 ## Model description
+This finetuned model is an adapter model based on [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat).
+Finetuning was performed on the Dutch [BramVanroy/ultrachat_200k_dutch](https://huggingface.co/datasets/BramVanroy/ultrachat_200k_dutch) dataset.
 ## Intended uses & limitations
+As with all LLMs, this model can exhibit bias and hallucinations. Regardless of how you use this model, always perform the necessary testing and validation.
+The dataset used does not allow commercial use.
 ## Training and evaluation data
+The training notebook is available at the following link: [Qwen1_5_7B_Dutch_Chat_SFT](https://github.com/RobinSmits/Dutch-LLMs/blob/main/Qwen1_5_7B_Dutch_Chat_SFT.ipynb)
+Training was performed with Google Colab PRO on an A100 - 40GB.
+Because training on the full dataset would take longer than the maximum 24-hour session that Google Colab PRO allows, I split the dataset into 2 equal parts. Training for each part lasted around 14 hours. For the second part I enabled 'resume_from_checkpoint' to continue the training.
 ## Training procedure
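
The split-and-resume approach described in the added README text can be sketched roughly as follows. This is a minimal illustration, not the code from the linked notebook: the `SessionPlan` class, the `plan_two_sessions` helper, and the row count of 200,000 are hypothetical names and values chosen for the example, and the exact data handling in the notebook may differ. Only the 24-hour session limit, the roughly 14 hours per part, and the use of `resume_from_checkpoint` come from the README.

```python
from dataclasses import dataclass

SESSION_LIMIT_HOURS = 24  # maximum Colab PRO session length stated in the README
HOURS_PER_PART = 14       # approximate training time per part stated in the README


@dataclass
class SessionPlan:
    """Hypothetical description of one training session."""
    start: int                    # first row index (inclusive)
    stop: int                     # last row index (exclusive)
    resume_from_checkpoint: bool  # continue from a previously saved checkpoint?


def plan_two_sessions(num_rows: int) -> list[SessionPlan]:
    """Split num_rows into two contiguous halves; the second half resumes."""
    midpoint = num_rows // 2
    return [
        SessionPlan(0, midpoint, resume_from_checkpoint=False),
        SessionPlan(midpoint, num_rows, resume_from_checkpoint=True),
    ]


plans = plan_two_sessions(200_000)  # hypothetical row count, for illustration only
for plan in plans:
    # With the Hugging Face libraries, each session would roughly map to:
    #   part = dataset.select(range(plan.start, plan.stop))
    #   trainer.train(resume_from_checkpoint=plan.resume_from_checkpoint)
    print(plan)

# One part fits inside a single session, while both parts together would not.
assert HOURS_PER_PART <= SESSION_LIMIT_HOURS < 2 * HOURS_PER_PART
```

Keeping the plan as plain data makes the constraint easy to check: a single 14-hour part fits in a 24-hour session, whereas the combined run of about 28 hours would not.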