junipark committed on
Commit
12bd83a
·
verified ·
1 Parent(s): 9b2921f

Update README.md

Files changed (1)
  1. README.md +35 -4
README.md CHANGED
@@ -1,10 +1,41 @@
1
  ---
2
  library_name: peft
3
  ---
4
- ## Training procedure
5
 
6
- ### Framework versions
7
 
8
- - PEFT 0.4.0
9
 
10
- - PEFT 0.4.0
 
1
  ---
2
  library_name: peft
3
  ---
 
4
 
5
+ ### README for Gemma-2-2B-IT Fine-Tuning with LoRA
6
 
7
+ This project fine-tunes the `Gemma-2-2B-IT` model using **LoRA (Low-Rank Adaptation)**, training on the `Wikitext-2` dataset with a causal language modeling objective and with Question Answering as the intended downstream use. To fit in limited GPU memory, the base model weights are frozen and only the LoRA adapter weights on selected projection layers are trained.
8
+
9
+ ### Project Overview
10
+ - **Model**: `Gemma-2-2B-IT`, a causal language model.
11
+ - **Dataset**: `Wikitext-2` for text generation and causal language modeling.
12
+ - **Training Strategy**: LoRA adaptation for low-resource fine-tuning.
13
+ - **Frameworks**: Hugging Face `transformers`, `peft`, and `datasets`.
14
+
15
+ ### Key Features
16
+ - **LoRA Configuration**:
17
+   - LoRA is applied to the following projection layers: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.
18
+   - LoRA hyperparameters:
19
+     - Rank (`r`): 4
20
+     - LoRA Alpha: 8
21
+     - Dropout: 0.1
22
+ - **Training Configuration**:
23
+   - Mixed precision (`fp16`) enabled for faster and more memory-efficient training.
24
+   - Gradient accumulation over `32` steps, so the optimizer updates once every 32 batches while per-step memory stays low on small GPUs.
25
+   - Per-device batch size of 1 due to GPU memory constraints, giving an effective batch size of 32 on a single GPU.
26
+   - Learning rate: `5e-5` with weight decay: `0.01`.
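
The settings listed above could be collected as in the sketch below. This is an illustration, not the project's training script: the dictionary and function names are hypothetical, the keys are chosen to match keyword arguments of `peft.LoraConfig` and `transformers.TrainingArguments`, and `task_type` is an assumption based on the README describing the model as a causal language model.

```python
# Hyperparameters as stated in this README. The keys mirror keyword arguments
# of peft.LoraConfig; the dictionary name itself is hypothetical.
LORA_KWARGS = {
    "r": 4,
    "lora_alpha": 8,
    "lora_dropout": 0.1,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated explicitly in the README
}

# Keys mirror keyword arguments of transformers.TrainingArguments.
# An output directory would also be needed; the README does not name one.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 32,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "fp16": True,
}


def effective_batch_size(kwargs, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def lora_scaling(kwargs):
    """Standard LoRA scaling factor applied to the low-rank update (alpha / r)."""
    return kwargs["lora_alpha"] / kwargs["r"]


def build_lora_config(kwargs):
    """Return a peft.LoraConfig if peft is installed, otherwise None."""
    try:
        from peft import LoraConfig
    except ImportError:
        return None
    return LoraConfig(**kwargs)


print("effective batch size:", effective_batch_size(TRAINING_KWARGS))
print("LoRA scaling (alpha / r):", lora_scaling(LORA_KWARGS))
```

With a rank of 4 and an alpha of 8, the low-rank update is scaled by 2 before being added to the frozen weights.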
27
+
28
+ ### System Requirements
29
+ - **GPU**: Required for efficient training. This script was tested with CUDA-enabled GPUs.
30
+ - **Python Packages**: Install dependencies with:
31
+ ```bash
32
+ pip install -r requirements.txt
33
+ ```
34
+
35
+ ### Notes
36
+ - This fine-tuned model leverages LoRA to adapt the large `Gemma-2-2B-IT` model with minimal trainable parameters, allowing fine-tuning even on hardware with limited memory.
37
+ - The fine-tuned model can be used for downstream tasks such as Question Answering. Because only the LoRA adapter weights are trained, the adapter adds little storage on top of the base model.
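
To make "minimal trainable parameters" concrete, the sketch below counts the LoRA weights added at rank 4. Each adapted linear layer with input size `d_in` and output size `d_out` gains two matrices, of shapes `r x d_in` and `d_out x r`. The layer dimensions in `ASSUMED_DIMS` are assumptions about the Gemma 2 2B architecture and are not taken from this README; they should be checked against the model's own config before relying on the result.

```python
RANK = 4  # LoRA rank stated in this README

# Assumed layer sizes (not from the README); verify against the model config.
ASSUMED_DIMS = {
    "hidden": 2304,        # hidden size
    "intermediate": 9216,  # MLP intermediate size
    "q_out": 2048,         # query projection output size
    "kv_out": 1024,        # key/value projection output size
    "layers": 26,          # number of decoder layers
}


def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def trainable_params(dims, r):
    """Total LoRA parameters across the seven target projections in every layer."""
    h, m = dims["hidden"], dims["intermediate"]
    shapes = {
        "q_proj": (h, dims["q_out"]),
        "k_proj": (h, dims["kv_out"]),
        "v_proj": (h, dims["kv_out"]),
        "o_proj": (dims["q_out"], h),
        "gate_proj": (h, m),
        "up_proj": (h, m),
        "down_proj": (m, h),
    }
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
    return per_layer * dims["layers"]


print(f"LoRA parameters under the assumed dimensions: {trainable_params(ASSUMED_DIMS, RANK):,}")
```

Under these assumed dimensions the count is a few million parameters, which is why the adapter is cheap to train and store relative to the frozen base model.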
38
+
39
+ ### Memory Usage
40
+ - The training script includes CUDA memory summaries before and after the training process to monitor GPU memory consumption.
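
The before/after memory reporting could look something like the following. This is a minimal sketch rather than the actual training script: the helper name `cuda_memory_report` is hypothetical, and the training call is left as a comment because the script itself is not shown here. It relies on `torch.cuda.memory_summary()` and falls back to a short message when PyTorch or a CUDA device is unavailable.

```python
def cuda_memory_report(label):
    """Return a labeled CUDA memory summary, or a note if none is available."""
    try:
        import torch
    except ImportError:
        return f"[{label}] torch is not installed; no CUDA memory summary available"
    if not torch.cuda.is_available():
        return f"[{label}] CUDA is not available; no memory summary to report"
    # memory_summary() returns a human-readable table of allocator statistics.
    return f"[{label}]\n{torch.cuda.memory_summary()}"


print(cuda_memory_report("before training"))
# The training step would run here (for example, a Trainer's train() call).
print(cuda_memory_report("after training"))
```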
41