### Gemma-2-2B-IT Fine-Tuning with LoRA

This project fine-tunes the `Gemma-2-2B-IT` model using **LoRA (Low-Rank Adaptation)** for question answering tasks, leveraging the `Wikitext-2` dataset.
The fine-tuning process is optimized for limited GPU memory: most model parameters are frozen, and LoRA adapters are applied only to specific layers.
### Project Overview

- **Model**: `Gemma-2-2B-IT`, a causal language model.
- **Dataset**: `Wikitext-2` for text generation and causal language modeling (loading example after this list).
- **Training Strategy**: LoRA adaptation for low-resource fine-tuning.
- **Frameworks**: Hugging Face `transformers`, `peft`, and `datasets`.
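
For reference, `Wikitext-2` can be loaded with the `datasets` library. This is a minimal sketch; the config name `wikitext-2-raw-v1` is an assumption, and the training script may use a different variant:

```python
from datasets import load_dataset

# "wikitext-2-raw-v1" is one common Wikitext-2 config on the Hub;
# the actual script may use "wikitext-2-v1" instead.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset["train"][0]["text"][:200])  # peek at a raw training example
```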

### Key Features

- **LoRA Configuration** (see the configuration sketch after this list):
  - LoRA is applied to the following projection layers: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.
  - LoRA hyperparameters:
    - Rank (`r`): 4
    - LoRA alpha: 8
    - Dropout: 0.1
- **Training Configuration**:
  - Mixed precision (`fp16`) enabled for faster and more memory-efficient training.
  - Gradient accumulation over `32` steps, giving a larger effective batch size on small GPUs.
  - Per-device batch size of `1` due to GPU memory constraints.
  - Learning rate `5e-5` with weight decay `0.01`.
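
The settings above map onto `peft` and `transformers` roughly as follows. This is a minimal sketch, not the project's actual script; the model id and `output_dir` are assumptions:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA settings from the list above: rank 4, alpha 8, dropout 0.1,
# applied to the attention and MLP projection layers.
lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Assumed Hub id for Gemma-2-2B-IT.
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = get_peft_model(model, lora_config)  # base weights stay frozen
model.print_trainable_parameters()

# Training settings from the list above.
training_args = TrainingArguments(
    output_dir="gemma2-2b-it-lora",  # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    learning_rate=5e-5,
    weight_decay=0.01,
    fp16=True,
)
```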

### System Requirements

- **GPU**: Required for efficient training; this script was tested with CUDA-enabled GPUs.
- **Python Packages**: Install dependencies with:

```bash
pip install -r requirements.txt
```

### Notes

- This fine-tuned model leverages LoRA to adapt the large `Gemma-2-2B-IT` model with minimal trainable parameters, allowing fine-tuning even on hardware with limited memory.
- The fine-tuned model can be reused for downstream tasks such as question answering, and it is well suited to resource-efficient deployment; a loading sketch follows.
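
A minimal sketch of loading the adapter back onto the base model for inference; the adapter path `gemma2-2b-it-lora` and the prompt are illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the frozen base model, then attach the LoRA adapter weights.
base = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base, "gemma2-2b-it-lora")  # hypothetical adapter path

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
inputs = tokenizer("Question: What is LoRA?\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```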

### Memory Usage

- The training script prints CUDA memory summaries before and after training to monitor GPU memory consumption, as sketched below.
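
A minimal sketch of how such summaries can be produced with PyTorch; the exact call sites in the training script may differ:

```python
import torch

if torch.cuda.is_available():
    # Human-readable breakdown of allocated/reserved CUDA memory before training.
    print(torch.cuda.memory_summary(device=0, abbreviated=True))

# ... run training ...

if torch.cuda.is_available():
    # Same summary after training, for comparison.
    print(torch.cuda.memory_summary(device=0, abbreviated=True))
```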