isaacchung
/

llama3-8B-hotpotqa-raft

@@ -1,13 +1,12 @@
 ---
 library_name: transformers
-tags: []
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
@@ -17,13 +16,13 @@ tags: []
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
@@ -70,6 +69,13 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 ## How to Get Started with the Model
 Use the code below to get started with the model.
 [More Information Needed]
@@ -78,6 +84,7 @@ Use the code below to get started with the model.
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 [More Information Needed]
@@ -94,6 +101,65 @@ Use the code below to get started with the model.
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

 ---
 library_name: transformers
+license: apache-2.0
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
+Finetuned Llama3-8B-Instruct model on https://huggingface.co/datasets/isaacchung/hotpotqa-dev-raft-subset.
 ## Model Details
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [Isaac Chung](https://huggingface.co/isaacchung)
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [English]
+- **License:** [Apache 2.0]
+- **Finetuned from model [optional]:** [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
 ### Model Sources [optional]
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+```python
+# Load model directly
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("isaacchung/llama3-8B-hotpotqa")
+model = AutoModelForCausalLM.from_pretrained("isaacchung/llama3-8B-hotpotqa")
+```
 [More Information Needed]
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+https://huggingface.co/datasets/isaacchung/hotpotqa-dev-raft-subset
 [More Information Needed]
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+Model loaded:
+```python
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="auto",
+    attn_implementation="flash_attention_2",
+    torch_dtype=torch.bfloat16,
+    quantization_config=bnb_config
+)
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+tokenizer.padding_side = 'right' # to prevent warnings
+```
+Training params:
+```python
+# LoRA config based on QLoRA paper & Sebastian Raschka experiment
+peft_config = LoraConfig(
+        lora_alpha=128,
+        lora_dropout=0.05,
+        r=256,
+        bias="none",
+        target_modules="all-linear",
+        task_type="CAUSAL_LM",
+)
+args = TrainingArguments(
+    num_train_epochs=3,                     # number of training epochs
+    per_device_train_batch_size=3,          # batch size per device during training
+    gradient_accumulation_steps=2,          # number of steps before performing a backward/update pass
+    gradient_checkpointing=True,            # use gradient checkpointing to save memory
+    optim="adamw_torch_fused",              # use fused adamw optimizer
+    logging_steps=10,                       # log every 10 steps
+    save_strategy="epoch",                  # save checkpoint every epoch
+    learning_rate=2e-4,                     # learning rate, based on QLoRA paper
+    bf16=True,                              # use bfloat16 precision
+    tf32=True,                              # use tf32 precision
+    max_grad_norm=0.3,                      # max gradient norm based on QLoRA paper
+    warmup_ratio=0.03,                      # warmup ratio based on QLoRA paper
+    lr_scheduler_type="constant",           # use constant learning rate scheduler
+)
+max_seq_length = 3072 # max sequence length for model and packing of the dataset
+trainer = SFTTrainer(
+    model=model,
+    args=args,
+    train_dataset=dataset,
+    peft_config=peft_config,
+    max_seq_length=max_seq_length,
+    tokenizer=tokenizer,
+    packing=True,
+    dataset_kwargs={
+        "add_special_tokens": False,  # We template with special tokens
+        "append_concat_token": False, # No need to add additional separator token
+    }
+)
+```
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->