oxygen65 committed on
Commit
e876caa
1 Parent(s): 4f146e5

Add documentation for inference.

Files changed (1): README.md +198 -5
README.md CHANGED
@@ -1,22 +1,215 @@
  ---
- base_model: oxygen65/llm-jp-3-13b-finetune-2
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
- license: apache-2.0
  language:
- - en
  ---

  # Uploaded model

  - **Developed by:** oxygen65
  - **License:** apache-2.0
- - **Finetuned from model:** oxygen65/llm-jp-3-13b-finetune-2

  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model:
+ - oxygen65/llm-jp-3-13b-finetune-2
+ - llm-jp/llm-jp-3-13b
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - llama
  - trl
+ license: cc-by-nc-sa-4.0
  language:
+ - ja
+ datasets:
+ - elyza/ELYZA-tasks-100
  ---
+ # How to Use
+
+ ## 1. Load the model and tokenizer
+ ```python
+ from transformers import (
+     AutoModelForCausalLM,
+     AutoTokenizer,
+     BitsAndBytesConfig,
+ )
+ import torch
+ from tqdm import tqdm
+ import json
+
+ model_name = "oxygen65/llm-jp-3-13b-finetune-3"
+
+ # QLoRA config (4-bit NF4 quantization)
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+     bnb_4bit_use_double_quant=False,
+ )
+
+ # Load model
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ ```
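+
+ A quick way to sanity-check the 4-bit load before running the full pipeline; a minimal sketch, with an illustrative prompt in the same `### 指示` / `### 回答` format used later in this card:
+
+ ```python
+ # Hypothetical smoke test, not part of the original card: greedy-decode a
+ # short reply to confirm the quantized model and tokenizer work together.
+ text = "### 指示\n日本で一番高い山は?\n### 回答\n"
+ ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)
+ with torch.no_grad():
+     out = model.generate(ids, max_new_tokens=32, do_sample=False)[0]
+ print(tokenizer.decode(out[ids.size(1):], skip_special_tokens=True))
+ ```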
+
+ ## 2. Load the evaluation tasks
+ ```python
+ tasks = []
+ with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
+     item = ""
+     for line in f:
+         # Accumulate lines until a complete JSON object has been read
+         # (entries in this file may span multiple lines)
+         line = line.strip()
+         item += line
+         if item.endswith("}"):
+             tasks.append(json.loads(item))
+             item = ""
+ ```
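+
+ The retrieval helpers in the next step also read from a `sample_tasks` table with `"input"` and `"output"` columns, which this card never shows being loaded. A minimal sketch, assuming it is the elyza/ELYZA-tasks-100 dataset declared in the metadata above:
+
+ ```python
+ # Assumption: sample_tasks is the elyza/ELYZA-tasks-100 dataset listed in the
+ # model card metadata; the original snippet does not show how it is loaded.
+ from datasets import load_dataset
+
+ sample_tasks = load_dataset("elyza/ELYZA-tasks-100", split="test")
+ ```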
+
+ ## 3. Set up retrievers
+ If the `rank_bm25` package is not available in your environment, install it first:
+
+ ```bash
+ pip install rank_bm25
+ ```
+
+ ```python
+ from rank_bm25 import BM25Okapi
+ from nltk.tokenize import word_tokenize
+ import nltk
+ import numpy as np
+
+ # Download the required NLTK data (first run only)
+ nltk.download('punkt')
+ nltk.download('punkt_tab')
+
+ def search_similar_documents_bm25(query, sample_tasks):
+     # Tokenize (BM25 expects tokenized documents)
+     tokenized_documents = [word_tokenize(doc) for doc in sample_tasks['input']]
+
+     # Build the BM25 index
+     bm25 = BM25Okapi(tokenized_documents)
+
+     tokenized_query = word_tokenize(query)
+     # Score every document against the query
+     doc_scores = bm25.get_scores(tokenized_query)
+     # Sort indices by descending score
+     sorted_indexes = np.argsort(doc_scores)[::-1]
+
+     # Keep only documents whose BM25 score clears the threshold
+     indexes = []
+     for i in range(len(doc_scores)):
+         if doc_scores[sorted_indexes[i]] < 20.0:
+             break
+         else:
+             indexes.append(sorted_indexes[i])
+
+     return indexes
+
+ from sentence_transformers import SentenceTransformer
+ from sklearn.metrics.pairwise import cosine_similarity
+
+ SentTF = SentenceTransformer('all-MiniLM-L6-v2')
+
+ def search_similar_documents_neural_retriever(query, sample_tasks):
+     global SentTF
+     emb1 = SentTF.encode([query])
+     emb2 = SentTF.encode(sample_tasks['input'])
+     # Compute cosine similarity for every query/document pair
+     # (slow, so emb2 is better precomputed once)
+     similarity_matrix = cosine_similarity(emb1, emb2)
+     # Sort indices by descending similarity
+     sorted_indexes = np.argsort(similarity_matrix[0])[::-1]
+
+     # Keep only documents whose similarity clears the threshold
+     indexes = []
+     for i in range(len(sample_tasks['input'])):
+         if similarity_matrix[0][sorted_indexes[i]] < 0.75:
+             break
+         else:
+             indexes.append(sorted_indexes[i])
+
+     return indexes
+
+ def create_icl_prompt(input, sample_tasks, task_id):
+     # Merge the hits from both retrievers
+     indexes_bm25 = search_similar_documents_bm25(input, sample_tasks)
+     indexes_neu = search_similar_documents_neural_retriever(input, sample_tasks)
+     indexes = list(set(indexes_bm25 + indexes_neu))
+     if indexes == []:
+         return ""
+
+     # Few-shot examples ("## 例題" = worked examples)
+     icl_prompt = f"""## 例題\n"""
+     for i in range(len(indexes)):
+         icl_prompt += f"""### 指示
+ {sample_tasks["input"][indexes[i]]}
+ ### 回答
+ {sample_tasks["output"][indexes[i]]}
+ """
+     # Main-task header ("## 本題" = answer the following instruction, step by step)
+     icl_prompt += f"""
+ ## 本題: 以下の指示に従って回答してください。step by stepで回答してください。
+ """
+     return icl_prompt
+
+ # Example: build an ICL prompt for the third task
+ create_icl_prompt(tasks[2]["input"], sample_tasks, 0)
+ ```
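+
+ Combining a lexical retriever (BM25) with an embedding-based one catches both keyword overlap and paraphrases; the score thresholds (20.0 for BM25, 0.75 cosine similarity) mean that a task with no sufficiently similar example simply falls back to a plain zero-shot prompt.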
+
+ ## 4. Inference
+ ```python
+ import re
+ # A single line starting with "以下" usually means the model produced a
+ # preamble instead of an answer, so such outputs are retried below.
+ pattern = r"^以下.*$"
+
+ # Build the prompt
+ sys_prompt = ""
+ icl_prompt = ""
+ results = []
+ for data in tqdm(tasks):
+     task_id = data["task_id"]
+     # Only tasks 66 and 72 are processed here; drop this filter to run every task.
+     if task_id != 66 and task_id != 72:
+         continue
+     input = data["input"]
+     # Prompt for in-context learning
+     icl_prompt = create_icl_prompt(input, sample_tasks, task_id)
+
+     prompt = f"""{sys_prompt}{icl_prompt}### 指示
+ {input}
+ ### 回答
+ """
+     # First attempt: greedy decoding
+     tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
+     with torch.no_grad():
+         outputs = model.generate(
+             tokenized_input,
+             max_new_tokens=512,
+             do_sample=False,
+             repetition_penalty=1.2,
+             eos_token_id=tokenizer.eos_token_id,
+         )[0]
+     output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)
+
+     # Retry with sampling while the output is a one-line "以下..." preamble
+     while True:
+         line = output.splitlines()
+         if len(line) == 1 and re.match(pattern, line[0]):
+             print(f"#========================= Unexpected answer =========================#\n {line}")
+             outputs = model.generate(
+                 tokenized_input,
+                 max_new_tokens=512,
+                 do_sample=True,
+                 temperature=0.4,
+                 repetition_penalty=1.2
+             )[0]
+             output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)
+         else:
+             break
+
+     results.append({"task_id": data["task_id"], "input": input, "output": output})
+
+     print(f"task_id: {data['task_id']}, prompt: {prompt}, output: {output}")
+ ```
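+
+ The snippet stops after collecting `results`; a minimal sketch for persisting them as JSONL (the file name is illustrative, not from the original card):
+
+ ```python
+ # Assumption: one JSON object per line; "results.jsonl" is a placeholder name.
+ with open("results.jsonl", "w", encoding="utf-8") as f:
+     for r in results:
+         f.write(json.dumps(r, ensure_ascii=False) + "\n")
+ ```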

  # Uploaded model

  - **Developed by:** oxygen65
  - **License:** apache-2.0
+ - **Finetuned from model:** oxygen65/llm-jp-3-13b-finetune-2 (the original is llm-jp/llm-jp-3-13b)

  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)