BramVanroy committed on
Commit b960df1
1 Parent(s): af713d1

Update README.md

Files changed (1)
README.md +34 -12
README.md CHANGED
@@ -1,38 +1,60 @@
  ---
- license: apache-2.0
  base_model: BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny
  tags:
  - generated_from_trainer
  datasets:
  - BramVanroy/dutch_chat_datasets
  model-index:
- - name: 2e-4lr+64tbs+32a+4r
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # 2e-4lr+64tbs+32a+4r

- This model is a fine-tuned version of [BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny](https://huggingface.co/BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny) on the BramVanroy/dutch_chat_datasets dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.0848

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
 
  ---
+ license: cc-by-nc-sa-4.0
  base_model: BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny
  tags:
  - generated_from_trainer
+ - llama
+ - lora
+ - adapters
  datasets:
  - BramVanroy/dutch_chat_datasets
  model-index:
+ - name: Llama-2-13b-chat-dutch
  results: []
+ language:
+ - nl
  ---

+ # Llama-2-13b-chat-dutch

+ This model is a fine-tuned version of [BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny](https://huggingface.co/BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny)
+ on the [BramVanroy/dutch_chat_datasets](https://huggingface.co/datasets/BramVanroy/dutch_chat_datasets) dataset, with a context length of 4096 tokens.
+ See the original [meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) model card for more information on the base model, its intended uses, and its biases.

  ## Model description

+ I could not get Llama 2 13B to produce much Dutch, even though the Llama 2 paper indicates that it was trained on a (small) portion of Dutch data. I therefore
+ continued training the original Llama 2 13B checkpoint on Dutch data [in regular CLM](https://huggingface.co/BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny). In a second
+ step I finetuned that model on a collection of synthetic (translated) instruction and chat datasets that I have [collected](https://huggingface.co/datasets/BramVanroy/dutch_chat_datasets). See their pages for licensing, usage, creation, and citation information.
+
+ - https://huggingface.co/datasets/BramVanroy/dolly-15k-dutch
+ - https://huggingface.co/datasets/BramVanroy/alpaca-cleaned-dutch-baize
+ - https://huggingface.co/datasets/BramVanroy/stackoverflow-chat-dutch
+ - https://huggingface.co/datasets/BramVanroy/quora-chat-dutch
+

  ## Intended uses & limitations

+ Depending on the prompt, the model can return good results considering that it is only 13B in size and was only marginally pretrained on Dutch. That being said, the
+ model was not trained on human feedback and contains no safeguards, so it may produce unexpected and even offensive content depending on the query. The only attempt
+ at a safeguard is the default prompt that it was trained on, which was:
+
+ > Je bent een behulpzame, respectvolle en eerlijke assistent. Antwoord altijd zo behulpzaam mogelijk. Je antwoorden mogen geen schadelijke, onethische, racistische, seksistische, gevaarlijke of illegale inhoud bevatten. Zorg ervoor dat je antwoorden sociaal onbevooroordeeld en positief van aard zijn.\n\nAls een vraag nergens op slaat of feitelijk niet coherent is, leg dan uit waarom in plaats van iets niet correct te antwoorden. Als je het antwoord op een vraag niet weet, deel dan geen onjuiste informatie.
+
+ (In English: "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. Your answers must not contain harmful, unethical, racist, sexist, dangerous or illegal content. Make sure your answers are socially unbiased and positive in nature. If a question does not make sense or is not factually coherent, explain why instead of answering something incorrect. If you do not know the answer to a question, do not share false information.")
+
+ Use with caution and at your own risk!

+ Because the model was trained on synthetic data, translated with OpenAI's API, you cannot use this model to create a product that competes with OpenAI's services.
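
A minimal inference sketch is given below. Note that the repository id (`BramVanroy/Llama-2-13b-chat-dutch`) and the use of the standard Llama 2 chat prompt format (`[INST] <<SYS>> ... <</SYS>> ... [/INST]`) are assumptions and are not stated explicitly in this card; only the default system prompt above is.

```python
# Hypothetical usage sketch; the repo id and prompt format are assumed,
# not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/Llama-2-13b-chat-dutch"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The default (Dutch) system prompt quoted above.
system_prompt = (
    "Je bent een behulpzame, respectvolle en eerlijke assistent. Antwoord altijd zo behulpzaam mogelijk. "
    "Je antwoorden mogen geen schadelijke, onethische, racistische, seksistische, gevaarlijke of illegale "
    "inhoud bevatten. Zorg ervoor dat je antwoorden sociaal onbevooroordeeld en positief van aard zijn.\n\n"
    "Als een vraag nergens op slaat of feitelijk niet coherent is, leg dan uit waarom in plaats van iets "
    "niet correct te antwoorden. Als je het antwoord op een vraag niet weet, deel dan geen onjuiste informatie."
)
user_prompt = "Wat is de hoofdstad van Nederland?"  # "What is the capital of the Netherlands?"

# Standard Llama 2 chat formatting (assumed); the tokenizer adds the BOS token itself.
prompt = f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```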

  ## Training procedure

+ Trained with a context length of 4096 tokens. The dataset was preprocessed so that as many dialogs as possible were packed into a single batch without disrupting
+ them. In other words, a dialog was never split up over different sequences or batches. During training, the human prompts were ignored in backpropagation, i.e. masked out of the loss.
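
The exact preprocessing code is not included in this card; the sketch below only illustrates the general idea of masking the human turns out of the loss (label id -100 is ignored by the cross-entropy loss in `transformers`), using the tokenizer of the base model as a stand-in.

```python
# Illustrative sketch of loss masking for a single human/assistant turn; this is
# a sketch of the general technique, not the author's preprocessing code.
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # label id that transformers' cross-entropy loss ignores
tokenizer = AutoTokenizer.from_pretrained("BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny")

def encode_turn(human_prompt: str, assistant_reply: str) -> dict:
    prompt_ids = tokenizer(human_prompt, add_special_tokens=False)["input_ids"]
    reply_ids = tokenizer(assistant_reply, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + reply_ids
    # Human-prompt positions get IGNORE_INDEX, so only the assistant reply
    # contributes to the training loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(reply_ids)
    return {"input_ids": input_ids, "labels": labels}
```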
+
+ Trained with LoRA targeting ["q_proj", "v_proj"] in 4-bit and merged back into the base model before upload. Trained with Flash Attention as borrowed from [here](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/llama_patch.py).
+
+ The adapters are in the `adapters` branch.
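
A rough sketch of that LoRA setup with `peft` and `bitsandbytes` is shown below; the target modules and 4-bit loading follow the description above, while the rank, alpha, dropout, and quantization options are illustrative placeholders rather than the values actually used.

```python
# Sketch of the described LoRA + 4-bit setup; r, lora_alpha, lora_dropout and the
# quantization details are placeholders, not the actual training configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # as stated above
    r=4,              # placeholder
    lora_alpha=32,    # placeholder
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# After training, the adapter weights were merged into the base model before
# upload (e.g. via PeftModel.merge_and_unload() on a non-quantized reload).
```

The standalone adapters in the `adapters` branch can presumably be loaded by passing `revision="adapters"` when loading from the Hub, although that is an assumption and not documented here.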
+
  ### Training hyperparameters

  The following hyperparameters were used during training: