aiqwe committed
Commit fe7da37 • 1 Parent(s): fc3a358

Update README.md

Files changed (1)
  1. README.md +111 -45
README.md CHANGED
@@ -9,51 +9,117 @@ base_model: google/gemma-1.1-2b-it
  model-index:
  - name: gemma-2b-it-example-v1
    results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
 
- # gemma-2b-it-example-v1
-
- This model is a fine-tuned version of [google/gemma-1.1-2b-it](https://huggingface.co/google/gemma-1.1-2b-it) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.05
- - num_epochs: 5
-
- ### Training results
-
-
- ### Framework versions
-
- - PEFT 0.10.0
- - Transformers 4.40.2
- - Pytorch 2.2.1+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
 
  model-index:
  - name: gemma-2b-it-example-v1
    results: []
+ language:
+ - ko
  ---
 
+ ## Model Description
+ **GitHub**: [https://github.com/aiqwe/instruction-tuning-with-rag-example](https://github.com/aiqwe/instruction-tuning-with-rag-example)
+ This model was trained as a worked example of instruction tuning.
+ It is based on the [gemma-2b-it](https://huggingface.co/google/gemma-2b-it) model and was fine-tuned on roughly 10,000 real-estate-related instruction examples.
+ Please refer to the GitHub repository above for the training code.
+
+ ## Usage
+ ### Inference on GPU example
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
+ model = AutoModelForCausalLM.from_pretrained(
+     "aiqwe/gemma-2b-it-example-v1",
+     device_map="cuda",
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2"
+ )
+
+ input_text = "아파트 재건축에 대해 알려줘."  # "Tell me about apartment reconstruction."
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+
+ outputs = model.generate(**input_ids, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0]))
+ ```
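+
+ Note: `attn_implementation="flash_attention_2"` requires the `flash-attn` package and a supported GPU. If flash attention is unavailable in your environment, PyTorch's built-in SDPA backend is a reasonable fallback (a sketch, not part of the original card):
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ # Fallback load without flash-attn: use PyTorch scaled-dot-product attention.
+ model = AutoModelForCausalLM.from_pretrained(
+     "aiqwe/gemma-2b-it-example-v1",
+     device_map="cuda",
+     torch_dtype=torch.bfloat16,
+     attn_implementation="sdpa"
+ )
+ ```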
+
+ ### Inference on CPU example
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
+ model = AutoModelForCausalLM.from_pretrained(
+     "aiqwe/gemma-2b-it-example-v1",
+     device_map="cpu",
+     torch_dtype=torch.bfloat16
+ )
+
+ input_text = "아파트 재건축에 대해 알려줘."  # "Tell me about apartment reconstruction."
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cpu")
+
+ outputs = model.generate(**input_ids, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ ### Inference on GPU with embedded function example
+ A built-in helper function provides RAG support through the Naver Search API.
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from google.colab import userdata  # Colab secrets; use your own credential store outside Colab
+ from utils import generate  # helper from the GitHub repository above
+
+ tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
+ model = AutoModelForCausalLM.from_pretrained(
+     "aiqwe/gemma-2b-it-example-v1",
+     device_map="cuda",
+     torch_dtype=torch.bfloat16,
+     attn_implementation="flash_attention_2"
+ )
+
+ query = "아파트 재건축에 대해 알려줘."  # "Tell me about apartment reconstruction."
+ rag_config = {
+     "api_client_id": userdata.get('NAVER_API_ID'),
+     "api_client_secret": userdata.get('NAVER_API_SECRET')
+ }
+ completion = generate(
+     model=model,
+     tokenizer=tokenizer,
+     query=query,
+     max_new_tokens=512,
+     rag=True,
+     rag_config=rag_config
+ )
+ print(completion)
+ ```
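+
+ For reference, here is a minimal sketch of what a Naver-search RAG helper along these lines might do. The function name `naver_rag_generate`, the blog-search endpoint, and the prompt format are assumptions for illustration; the actual `generate` implementation lives in the GitHub repository above.
+ ```python
+ import requests
+
+ def naver_rag_generate(model, tokenizer, query, rag_config, max_new_tokens=512):
+     # 1) Retrieve snippets from the Naver Open API (blog search endpoint assumed).
+     resp = requests.get(
+         "https://openapi.naver.com/v1/search/blog.json",
+         params={"query": query, "display": 5},
+         headers={
+             "X-Naver-Client-Id": rag_config["api_client_id"],
+             "X-Naver-Client-Secret": rag_config["api_client_secret"],
+         },
+     )
+     snippets = [item["description"] for item in resp.json().get("items", [])]
+
+     # 2) Prepend the retrieved snippets to the user query.
+     prompt = "참고 자료:\n" + "\n".join(snippets) + "\n\n질문: " + query
+     input_ids = tokenizer.apply_chat_template(
+         conversation=[{"role": "user", "content": prompt}],
+         add_generation_prompt=True,
+         return_tensors="pt",
+     ).to(model.device)
+
+     # 3) Generate conditioned on the augmented prompt; return only the new tokens.
+     outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)
+     return tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+ ```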
91
+
92
+ ## Chat Template
93
+ Gemma ๋ชจ๋ธ์˜ Chat Template์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
94
+ [gemma-2b-it Chat Template](https://huggingface.co/google/gemma-2b-it#chat-template)
95
+ ```python
96
+ input_text = "์•„ํŒŒํŠธ ์žฌ๊ฑด์ถ•์— ๋Œ€ํ•ด ์•Œ๋ ค์ค˜."
97
+
98
+ input_text = tokenizer.apply_chat_template(
99
+ conversation=[
100
+ {"role": "user", "content": input_text}
101
+ ],
102
+ add_generate_prompt=True,
103
+ return_tensors="pt"
104
+ ).to(model.device)
105
+
106
+ outputs = model.generate(input_text, max_new_tokens=512, repetition_penalty = 1.5)
107
+ print(tokenizer.decode(outputs[0], skip_special_tokens=False))
108
+ ```
+
+ ## Training information
+ Training was done on a single L4 GPU on Google Colab.
+
+ | Item                        | Value            |
+ |-----------------------------|------------------|
+ | Environment                 | Google Colab     |
+ | GPU                         | L4 (22.5GB)      |
+ | VRAM used                   | ~13.8GB          |
+ | dtype                       | bfloat16         |
+ | Attention                   | flash attention 2 |
+ | Tuning                      | LoRA (r=4, alpha=32) |
+ | Learning rate               | 1e-4             |
+ | LR scheduler                | cosine           |
+ | Optimizer                   | adamw_torch_fused |
+ | batch_size                  | 4                |
+ | gradient_accumulation_steps | 2                |
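+
+ The table maps fairly directly onto a PEFT + Transformers configuration. A minimal sketch under that assumption (a hypothetical reconstruction, not the actual training script; see the GitHub repository for the real code):
+ ```python
+ from peft import LoraConfig
+ from transformers import TrainingArguments
+
+ # LoRA adapter settings from the table above.
+ lora_config = LoraConfig(r=4, lora_alpha=32, task_type="CAUSAL_LM")
+
+ # Trainer settings from the table; warmup ratio and epoch count come from the
+ # hyperparameter list in the previous revision of this card.
+ training_args = TrainingArguments(
+     output_dir="gemma-2b-it-example-v1",
+     learning_rate=1e-4,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.05,
+     optim="adamw_torch_fused",
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=2,  # effective batch size 8
+     num_train_epochs=5,
+     bf16=True,
+ )
+ ```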