colesmcintosh committed
Commit acc3558 · Parent(s): 2877eea
Update README.md

README.md CHANGED

---

## Model Details
- **Base Model**: unsloth/llama-3.2-1b-instruct-bnb-4bit
- **Training Dataset**: [SkunkworksAI/reasoning-0.01](https://huggingface.co/datasets/SkunkworksAI/reasoning-0.01) – a chain-of-thought reasoning dataset with 29.9k examples, used to improve the model's ability to solve reasoning problems step by step.
- **Techniques**:
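
The items under **Techniques** fall outside the diff hunks captured on this page, so only the heading is visible. As context for the entries above, here is a minimal sketch of loading the named base model and dataset with Unsloth; the `max_seq_length` value is an assumption, not something stated in the card:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset

# Load the 4-bit base model named in the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-1b-instruct-bnb-4bit",
    max_seq_length=2048,  # assumed; the card does not state a sequence length
    load_in_4bit=True,
)

# Load the chain-of-thought reasoning dataset (29.9k instruction/output pairs).
dataset = load_dataset("SkunkworksAI/reasoning-0.01", split="train")
print(dataset)
```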

---

## Training Details

The fine-tuning was run with the `SFTTrainer` class from the `trl` library, which streamlines supervised fine-tuning of transformer language models. The training process was structured as follows:

### Training Configuration
```python
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq

# ... (the configuration lines between the imports and the closing call
#      fall outside the hunks of this diff)
trainer = SFTTrainer(
    # ...
)
```
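
Only the imports and the closing parenthesis of the configuration block appear in this diff, so the sketch below is a plausible reconstruction rather than the card's actual code. It fills in only the values documented under Key Training Parameters, assumes the older `trl` API that takes a `tokenizer` and `TrainingArguments` directly, and uses placeholders for everything else (`dataset_text_field`, epochs, logging, `output_dir`):

```python
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq

# `model`, `tokenizer`, and `dataset` as loaded in the earlier sketch.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",      # placeholder field name after chat formatting
    max_seq_length=2048,            # assumed, matching the load step above
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    args=TrainingArguments(
        per_device_train_batch_size=2,   # documented: batch size 2 per device
        gradient_accumulation_steps=4,   # documented: 4 accumulation steps
        learning_rate=2e-4,              # documented
        lr_scheduler_type="linear",      # documented: linear decay schedule
        optim="adamw_8bit",              # documented: 8-bit AdamW
        weight_decay=0.01,               # documented
        num_train_epochs=1,              # placeholder; not stated in the card
        logging_steps=10,                # placeholder
        output_dir="outputs",            # placeholder
    ),
)

trainer.train()
```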

### Key Training Parameters
- **Batch Size**: `2` per device
- **Gradient Accumulation Steps**: `4`, accumulating gradients over multiple forward passes for an effective batch size of 8 per device while keeping memory usage low.
- **Learning Rate**: `2e-4` with a linear decay schedule.
- **Optimizer**: `adamw_8bit` – AdamW with 8-bit optimizer states, which reduces GPU memory usage during training.
- **Weight Decay**: `0.01` for regularization, helping to prevent overfitting.

### Dataset
- **Dataset Used for Training**: [SkunkworksAI/reasoning-0.01](https://huggingface.co/datasets/SkunkworksAI/reasoning-0.01) – contains **29.9k** chain-of-thought reasoning instruction/output pairs.

### Collation Strategy
- **Data Collator**: `DataCollatorForSeq2Seq` handles padding efficiently, so every sequence in a batch reaches a common length (with padded label positions ignored by the loss) during training.
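
As a quick illustration of what the collator does (this is not code from the card), the snippet below pads two pre-tokenized examples of different lengths into a single batch; the pad-token fallback is a defensive assumption:

```python
from transformers import AutoTokenizer, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.2-1b-instruct-bnb-4bit")
if tokenizer.pad_token is None:           # defensive: make sure a pad token exists
    tokenizer.pad_token = tokenizer.eos_token

collator = DataCollatorForSeq2Seq(tokenizer=tokenizer)

# Two pre-tokenized examples of different lengths.
features = [
    {"input_ids": [1, 2, 3, 4], "labels": [1, 2, 3, 4]},
    {"input_ids": [5, 6], "labels": [5, 6]},
]

# input_ids/attention_mask are padded with the pad token; labels are padded
# with -100 so the padded positions are ignored by the loss.
batch = collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 4])
```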

---

## Inference Example

To run inference using the fine-tuned model, follow this code snippet:
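
The snippet itself lies outside the diff hunks captured here. As a stand-in, this is a minimal sketch of Unsloth-style chat inference; the repository id is a hypothetical placeholder for this model's actual Hub name, and the prompt is purely illustrative:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/llama-3.2-1b-reasoning",  # hypothetical repo id
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

messages = [
    {"role": "user",
     "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```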