datasets:
- SkunkworksAI/reasoning-0.01
---

# Uploaded model

- **Developed by:** colesmcintosh
- **License:** apache-2.0

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

---

## Model Details:
- **Base Model**: unsloth/llama-3.2-1b-instruct-bnb-4bit
- **Training Dataset**: [SkunkworksAI/reasoning-0.01](https://huggingface.co/datasets/SkunkworksAI/reasoning-0.01) – a chain-of-thought reasoning dataset with 29.9k examples, used to improve the model's ability to solve reasoning problems step by step.
- **Techniques** (see the sketch after this list):
  - **LoRA (Low-Rank Adaptation)**: Parameter-efficient fine-tuning that trains small low-rank adapter matrices instead of the full weights, enhancing the model's abilities without overfitting.
  - **QLoRA (4-bit Quantization)**: The base model is loaded in 4-bit precision, reducing memory use and speeding up training and inference with little loss in accuracy.
  - **RoPE Scaling**: Extends the context window to handle long input sequences (up to 64k tokens).
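
For reference, here is a minimal sketch of how this combination is typically assembled with Unsloth. The exact LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`) are not recorded on this card, so the values below are illustrative rather than the ones actually used:

```python
from unsloth import FastLanguageModel

max_seq_length = 64000  # Unsloth applies RoPE scaling automatically for long contexts

# QLoRA-style loading: the base model arrives in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3.2-1b-instruct-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,         # auto-detect: float16 on T4/V100, bfloat16 on Ampere+
    load_in_4bit = True,
)

# LoRA: attach small trainable low-rank adapters to the frozen base model.
# These hyperparameters are illustrative, not taken from this card.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)
```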

---

## Training Details:

Fine-tuning was performed with the `SFTTrainer` class from the `trl` library, which implements supervised fine-tuning for transformer models. The run was structured as follows:

### Training Configuration:
```python
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Packing can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
```

### Key Training Parameters:
- **Batch Size**: `2` per device.
- **Gradient Accumulation Steps**: `4`, accumulating gradients over multiple forward passes to get an effectively larger batch from small per-device batches (see the arithmetic after this list).
- **Learning Rate**: `2e-4` with a linear decay schedule.
- **Warmup Steps**: `5` steps to gradually ramp the learning rate up at the start of training.
- **Max Training Steps**: `60` total optimizer steps.
- **FP16/BF16 Precision**: FP16 is used unless BF16 is supported, in which case training switches to BF16 on GPUs that support it.
- **Optimizer**: `adamw_8bit` – AdamW with 8-bit optimizer states, reducing GPU memory usage during training.
- **Weight Decay**: `0.01` for regularization, helping prevent overfitting.
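
Putting the batch settings together (single-GPU training is assumed here, since the card does not state the hardware):

```python
# Effective batch size = per-device batch size x gradient accumulation steps x GPUs
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1  # assumption: hardware is not stated on this card

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 2 * 4 * 1 = 8 examples per optimizer step

# With max_steps = 60, the run sees roughly 60 * 8 = 480 training examples
```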

### Dataset:
- **Dataset Used for Training**: [SkunkworksAI/reasoning-0.01](https://huggingface.co/datasets/SkunkworksAI/reasoning-0.01) – contains **29.9k examples** of chain-of-thought reasoning instruction/output pairs (formatting sketch below).
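
Since the trainer reads a single `text` column (`dataset_text_field = "text"`), the raw instruction/output pairs have to be rendered into that column first. A minimal sketch using the tokenizer loaded earlier; the `instruction`/`output` column names are an assumption, so check the dataset card for the exact schema:

```python
from datasets import load_dataset

dataset = load_dataset("SkunkworksAI/reasoning-0.01", split = "train")

# Render each pair into the "text" column expected by SFTTrainer.
# NOTE: the "instruction"/"output" column names are assumed, not
# confirmed by this card - verify against the dataset's schema.
def to_text(example):
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize = False)}

dataset = dataset.map(to_text)
```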

### Collation Strategy:
- **Data Collator**: `DataCollatorForSeq2Seq` handles padding efficiently, bringing every sequence in a batch to a common length during training (illustrated below).
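
As a toy illustration of what the collator does (not part of the training script), padding two tokenized examples of different lengths into one rectangular batch:

```python
from transformers import DataCollatorForSeq2Seq

collator = DataCollatorForSeq2Seq(tokenizer = tokenizer)

# Two tokenized examples of different lengths. The collator pads input_ids
# with the tokenizer's pad token and labels with -100, so padded positions
# are ignored by the loss.
features = [
    {"input_ids": [101, 42, 7], "labels": [101, 42, 7]},
    {"input_ids": [101, 42, 7, 13, 99], "labels": [101, 42, 7, 13, 99]},
]
batch = collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 5]) - padded to the longest example
```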

---

## Inference Example:

To run inference with the fine-tuned model, use the following snippet:
```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="colesmcintosh/Llama-3.1-8B-Instruct-Mango",
    max_seq_length=64000,
    dtype=None,  # None for auto detection; float16 for Tesla T4/V100, bfloat16 for Ampere+
    load_in_4bit=True,
)

# Enable native 2x faster inference
FastLanguageModel.for_inference(model)

# Prepare the input message
messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",
).to("cuda")

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Generate a response from the model
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=128,
    use_cache=True,
    temperature=1.5,
    min_p=0.1,
)
```

This snippet generates a response to the input message, running inference through the `FastLanguageModel` class with the optimizations included in the Unsloth framework.

You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have Unsloth installed; it can be hopelessly slow, since 4-bit model downloading is not supported and Unsloth's inference is 2x faster.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "colesmcintosh/Llama-3.1-8B-Instruct-Mango",
    load_in_4bit=True,  # load the base weights in 4-bit precision
)
tokenizer = AutoTokenizer.from_pretrained("colesmcintosh/Llama-3.1-8B-Instruct-Mango")
```
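
If you do go this route, generation then works through the standard Transformers API; a minimal sketch mirroring the example above:

```python
from transformers import TextStreamer

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stream tokens as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids=inputs, streamer=streamer, max_new_tokens=128)
```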

Again, I highly do NOT recommend this route; use Unsloth if possible!

## Additional Information
If you have any questions, feedback, or would like to collaborate on projects using this model, feel free to reach out to me on [LinkedIn](https://www.linkedin.com/in/cole-mcintosh) or visit my website at [colemcintosh.io](https://colemcintosh.io/). I'm always open to discussing AI, model development, and innovative solutions!