jordiclive committed
Commit f791de6
1 Parent(s): fac9893

Update README.md

Files changed (1)
  1. README.md +80 -13
README.md CHANGED
@@ -34,6 +34,18 @@ model = AutoModelForCausalLM.from_pretrained(
  ```
 
 
+ ## Model Details
+
+
+ - **Developed** as part of the OpenAssistant Project
+ - **Model type:** LoRA (PEFT)
+ - **Language:** English, German, Spanish, and French (with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish)
+ - **Finetuned from:** [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
+ - **Base model type:** Causal decoder-only transformer language model
+ - **Weights & Biases:** [Training log 1](https://wandb.ai/open-assistant/public-sft/runs/q0q9lce4), [Training log 2](https://wandb.ai/open-assistant/public-sft/runs/qqok9ru2?workspace=user-jordanclive)
+
+
 
  # LoRA Adapter for Falcon 40B trained on oasst-top1
 
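Given the details above (a LoRA adapter in PEFT format on top of tiiuae/falcon-40b), loading would look roughly like the sketch below. The adapter repo id is a placeholder, and the dtype and device settings are assumptions rather than values from the card.

```python
# Minimal sketch: load the frozen Falcon 40B base and apply this LoRA adapter.
# "<adapter-repo-id>" is a placeholder for this repository's Hub id; the dtype
# and device_map choices are assumptions, not settings from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon shipped with custom modeling code
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "<adapter-repo-id>")
model.eval()
```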
 
@@ -42,7 +54,8 @@ This repo contains a **Falcon 40B** LoRA fine-tuned model and the low-rank adapt
 
  This version of the weights was trained with the following hyperparameters:
 
- - Epochs: 8
+ SFT 1
+ - Epochs: 2
  - Batch size: 128
  - Max Length: 2048
  - Learning rate: 1e-4
@@ -50,25 +63,79 @@ This version of the weights was trained with the following hyperparameters:
  - Lora Alpha: 16
  - Lora target modules: ["dense_4h_to_h", "dense", "query_key_value", "dense_h_to_4h"]
 
- These are recommended from the QLoRA paper. The model was trained with flash attention and gradient checkpointing and deepspeed stage 3 on 8 x A100 80gb
+ SFT 2
+ - Epochs: 10
+ - Batch size: 128
+
+ The model was trained with flash attention, gradient checkpointing, and DeepSpeed ZeRO stage 3 on 8x A100 80GB GPUs.
 
 
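The hyperparameters above map onto a peft `LoraConfig` roughly as sketched below; `lora_alpha` and `target_modules` are taken from the list, while the rank and dropout are not shown in this excerpt and are assumptions.

```python
# Sketch of the LoRA configuration implied by the hyperparameters above.
# r and lora_dropout are assumptions (not listed in this excerpt); lora_alpha
# and target_modules come from the card.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)

config = LoraConfig(
    r=64,               # assumption: the rank is not shown in this excerpt
    lora_alpha=16,      # "Lora Alpha: 16"
    target_modules=["dense_4h_to_h", "dense", "query_key_value", "dense_h_to_4h"],
    lora_dropout=0.05,  # assumption: dropout is not shown in this excerpt
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.gradient_checkpointing_enable()  # gradient checkpointing, per the card
model.print_trainable_parameters()     # only the LoRA matrices are trainable
```

Flash attention and DeepSpeed ZeRO stage 3 are configured at the trainer and launcher level rather than in this adapter config.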
  Dataset:
+ SFT 1:
+ ```
+ - oa_leet10k:
+     val_split: 0.05
+     max_val_set: 250
+ - cmu_wiki_qa:
+     val_split: 0.05
+ - joke:
+     val_split: 0.05
+ - webgpt:
+     val_split: 0.05
+     max_val_set: 250
+ - alpaca_gpt4:
+     val_split: 0.025
+     max_val_set: 250
+ - gpteacher_roleplay:
+     val_split: 0.05
+ - wizardlm_70k:
+     val_split: 0.05
+     max_val_set: 500
+ - poem_instructions:
+     val_split: 0.025
+ - tell_a_joke:
+     val_split: 0.05
+     max_val_set: 250
+ - gpt4all:
+     val_split: 0.01
+     max_val_set: 1000
+ - minimath:
+     val_split: 0.05
+ - humaneval_mbpp_codegen_qa:
+     val_split: 0.05
+ - humaneval_mbpp_testgen_qa:
+     val_split: 0.05
+ - dolly15k:
+     val_split: 0.05
+     max_val_set: 300
+ - recipes:
+     val_split: 0.05
+ - code_alpaca:
+     val_split: 0.05
+     max_val_set: 250
+ - vicuna:
+     fraction: 0.5
+     val_split: 0.025
+     max_val_set: 250
+ - oa_wiki_qa_bart_10000row:
+     val_split: 0.05
+     max_val_set: 250
+ - grade_school_math_instructions:
+     val_split: 0.05
+ ```
+ SFT 2:
  ```
- oasst-top1:
-   datasets:
-     - oasst_export:
-         lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
-         input_file_path: 2023-05-06_OASST_labels.jsonl.gz
-         val_split: 0.05
-         top_k: 1
+ - oasst_export:
+     lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
+     input_file_path: 2023-05-06_OASST_labels.jsonl.gz
+     val_split: 0.05
+     top_k: 1
+ - lima:
+     val_split: 0.05
+     max_val_set: 50
  ```
 
- ## Model Details
-
- - **Developed** as part of the OpenAssistant Project
- - **Model type:** PEFT Adapter for frozen Falcon
- - **Language:** English
 
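In these configs, `val_split` holds out a fraction of each dataset for validation and `max_val_set` caps the size of that held-out set; `top_k: 1` in the `oasst_export` entry appears to keep only the top-ranked reply at each turn of the conversation trees, matching the oasst-top1 name. The sketch below illustrates the split semantics only; it is not the Open-Assistant trainer's actual code.

```python
# Illustrative sketch of val_split / max_val_set semantics; this is not the
# Open-Assistant trainer's actual implementation.
import random

def train_val_split(examples, val_split, max_val_set=None, seed=42):
    """Hold out a `val_split` fraction for validation, capped at `max_val_set`."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_split)
    if max_val_set is not None:
        n_val = min(n_val, max_val_set)
    return shuffled[n_val:], shuffled[:n_val]  # (train, val)

train, val = train_val_split(range(10_000), val_split=0.05, max_val_set=250)
print(len(train), len(val))  # 9750 250
```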
  ## Prompting
 
 
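The diff context ends at the Prompting header, so the template itself is not shown. OpenAssistant-style models typically use `<|prompter|>` and `<|assistant|>` turn markers separated by `<|endoftext|>`; the sketch below assumes that layout and should be checked against the full model card.

```python
# Assumed OpenAssistant-style prompt layout; verify the exact special tokens
# against the full model card, since the Prompting section is cut off above.
prompt = (
    "<|prompter|>What is a meme, and what's the history behind this word?"
    "<|endoftext|><|assistant|>"
)
# With `tokenizer` and `model` from the loading sketch earlier:
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0]))
```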