Kiet Bui committed on
Commit • a957073
Parent(s): c9712ee

initial commit

- README.md +171 -46
- added_tokens.json +3 -0
- config.json +25 -0
- generation_config.json +9 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +298 -0
- special_tokens_map.json +24 -0
- tokenizer.model +3 -0
- tokenizer_config.json +36 -0
README.md
CHANGED
@@ -6,55 +6,86 @@ language:

---

<p align="center"> <img src="vbd_logo.png" width="600" /> </p>

VBD-LLaMA2-Chat - a Conversationally-tuned

Model weights:

- VBD-LLaMA2-7B-50b-Chat: a snapshot demonstrating the efficacy of the proposed methodology. This base model is pretrained on 50B Vietnamese tokens and 20B English tokens and SFT on XXXX samples.

<blockquote style="color:red"> <p><strong style="color: red">Terms of Use and License</strong>: By using our released weights, you agree to and comply with the terms and conditions specified in Meta's LLaMA-2 license.</blockquote>

Disclaimer:

Evaluation

We evaluated our model via peer comparison on multiple publicly available datasets using
<a href="https://github.com/hieunguyen1053/lm-evaluation-harness"> @hieunguyen1053 fork of lm-evaluation-harness </a>. The models are benchmarked on different tasks and metrics. The results are below:

| VLSP | hoa-7b | ~7B | 0,2855 | 0,4329 | 22,6466 | 0,2536 | 25,5126 | 0,4542 | 0,2705 | | 0,4509 |
| VBD | VBD-LLaMA2-7B-50b | ~7B | 0,3222 | 0,5195 | 13,033 | 0,2964 | | 0,4614 | 0,3197 | | 0,4764 |
| VBD | VBD-LLaMA2-7B-50b-Chat | ~7B | 0,3585 | 0,5207 | 13,419 | 0,3444 | 24,1 | 0,5179 | 0,3299 | 0,5792 | 0,4772 |
| AISingapore | Sealion7b | ~7B | 0,2692 | 0,483 | 16,4388 | 0,267 | | 0,4275 | 0,2725 | | 0,4277 |
| BK Lab | LLaMa-2-BK | ~7B | 0,2966 | 0,4402 | 25,613 | 0,3402 | | 0,4528 | 0,2971 | | 0,4655 |
| Meta | LLaMa-2 | ~7B | 0,3034 | 0,4287 | | 0,3067 | | | | | |
| BigScience | Bloom | ~7B | 0,337 | 0,483 | | 0,281 | | | | | |
| FPT | FPT GenAI | | 0,3581 | 0,5055 | | 0,3143 | | | | | |
| VinAI | PhoGPT SFT | ~7B | 0,2684 | 0,4109 | 55,509 | 0,2499 | | 0,478 | 0,2643 | | 0,4198 |

| Organization | Model | Model size | ARC (ACC) | HellaSwag (ACC) | LAMBADA (perplexity) | MMLU (ACC) |
| ------------ | ------------------ | ---------- | --------- | --------------- | -------------------- | ---------- |
@@ -67,39 +98,133 @@ We evaluated our model via peer comparison on multiple publicly available datase
| Meta | LLaMA-2-Chat | ~7B | 0,442 | 0,7547 | 3,968 | 0,4832 |
| AISingapore | Sealion7b | ~7B | 0,3422 | 0,6705 | 6,715 | 0,268 |
| VBD | VBD-LLaMA2-7B-50b-Chat | ~7B | 0,4556 | 0,7384 | 4,645 | 0,4558 |

TO_BE_FILLED

TO_BE_FILLED

TO_BE_FILLED

If you find our project useful, we hope you would kindly star our repo and cite our work as follows: Corresponding Author: v.quangph3@vinbigdata.com, v.kietbs@vinbigdata.com, v.minhtt32@vinbigdata.com

---

<p align="center"> <img src="vbd_logo.png" width="600" /> </p>

<h1>VBD-LLaMA2-Chat - a Conversationally-tuned LLaMA2 for Vietnamese</h1>

(*Disclaimer 1: The VBD-LLaMA family is an effort by VinBigData to support and promote LLM research in Vietnam. This model is not related to ViGPT/ViViChat or any other product operated by VinBigData.*)

We release VBD-LLaMA2-7B-Chat, a model finetuned from Meta's LLaMA2-7B specifically for the Vietnamese 🇻🇳 language. This is part of our effort to support the community in building Vietnamese Large Language Models (LLMs). The pretrained weights for this model were obtained through continued self-supervised learning (SSL) after extending LLaMA2's vocabulary, on a corpus of 100 billion Vietnamese 🇻🇳 tokens and 40 billion English 🇬🇧 tokens. This approach aims to leverage the full potential of existing language models and adapt them to lower-resource languages, thereby reducing the hardware, time, and data costs of building LLMs for these languages. Subsequent supervised finetuning (SFT) was conducted on our internal SFT dataset, which consists of 2 million Vietnamese samples.
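
The vocabulary extension mentioned above can be illustrated with a toy sketch that treats the vocabulary as a plain token-to-ID mapping (the real LLaMA-2 SentencePiece/BPE tokenizer is more involved). The function `extend_vocab` and the sample tokens are hypothetical, not the authors' code. The property it shows is that every original token keeps its ID, so the pretrained embedding rows stay aligned, while new Vietnamese syllables are appended after them. For reference, the released `config.json` lists a `vocab_size` of 49381, compared with 32,000 for the original LLaMA-2 tokenizer.

```python
def extend_vocab(original: dict[str, int], new_tokens: list[str]) -> dict[str, int]:
    """Return a vocab that keeps every original token ID and appends unseen tokens."""
    vocab = dict(original)  # copy: original IDs are preserved unchanged
    next_id = max(vocab.values()) + 1 if vocab else 0
    for tok in new_tokens:
        if tok not in vocab:  # skip syllables the base vocab already covers
            vocab[tok] = next_id
            next_id += 1
    return vocab


# Hypothetical toy base vocabulary ("▁" marks a word boundary, as in SentencePiece).
base = {"<unk>": 0, "<s>": 1, "</s>": 2, "▁the": 3, "▁a": 4}
# Hypothetical Vietnamese syllables; "▁a" already exists and is not duplicated.
extended = extend_vocab(base, ["▁xin", "▁chào", "▁a", "▁bạn"])
print(extended["▁chào"], len(extended))  # 6 8
```

Because new IDs start right after the largest existing ID, the embedding matrix only needs new rows appended at the end; none of the pretrained rows move.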

For this release, we are only including the pretrained weights and the SFT weights of our model's checkpoint that was trained on 40b Vietnamese and 16b English tokens (56b tokens total).

<h3>Model weights:</h3>

- VBD-LLaMA2-7B-50b: the snapshot of the pretrained model after 40b Vietnamese tokens and 16b English tokens (~50b tokens total)
- VBD-LLaMA2-7B-50b-Chat: a snapshot demonstrating the efficacy of the proposed methodology. Its base model was pretrained on 40b Vietnamese tokens and 16b English tokens and then supervised fine-tuned (SFT) on 2 million samples.

<blockquote style="color:red"> <p><strong style="color: red">Terms of Use and License</strong>: By using our released weights, you agree to and comply with the terms and conditions specified in Meta's LLaMA-2 license.</blockquote>

**Disclaimer 2: While we have made considerable efforts to minimize misleading, inaccurate, and harmful content generation, our released model still carries inherent risks. We strongly recommend using this model only within a closely supervised environment and/or conducting additional testing, red teaming, and alignment procedures. Use of this model must comply with local laws and regulations. The authors of this model shall not be held liable for any claims, damages, or other liabilities arising from the use of the released weights.**

<h3>Pre-training Proposal</h3>

We propose continued pretraining of 3B/7B/13B-parameter large language models (LLaMA, Bloom, MPT, Falcon, etc.) on Vietnamese and English.

We propose experiments to enhance the model's conversational capabilities in Vietnamese while retaining its abilities in English, by transferring knowledge from the English latent space to the Vietnamese latent space.

Instead of training a Vietnamese LLM from scratch, we want to leverage the full potential of existing (English) language models and adapt them to Vietnamese. We aim to reduce the hardware, time, and data costs of building language models for Vietnamese.

We intend to augment the original latent space of the LLaMA/Bloom LLM with a Vietnamese latent space. We will then transfer knowledge between these two spaces and continue training the model with self-supervised learning (SSL) on both English and Vietnamese unlabeled corpora.

With this model, we expect to make a significant contribution to the development of large language models in Vietnam, making it easier for Vietnamese people to access large language models in-house. It should also provide a recipe that other low-resource languages can follow.

**Vietnamese language, methods, and research objectives**

We experiment with adding Vietnamese to large language models that do not originally support it. Our hypothesis is that it is feasible to transfer knowledge between languages by exploiting the cross-lingual capabilities of large models, and thereby develop a Vietnamese large language model (LLM) quickly, with less training time, data, and computational resources.

**Our proposed methods:**

1. We will start with an English/multilingual large language model:
   + https://huggingface.co/meta-llama/Llama-2-7b-hf
2. We will rebuild the BPE-based tokenizer by preserving the original tokens and incorporating Vietnamese syllables.
3. We will transfer knowledge in the latent space by fine-tuning the added latent space while freezing the original latent space. This step uses En-Vi and Vi-En translation tasks.
4. Using the new latent space (original latent space + added latent space), we will continue self-supervised learning (SSL) on unlabeled corpora of 40B English tokens and 100B Vietnamese tokens (cf. the roughly 1-1.5T tokens used to train recent well-performing LLaMA models).
   + In this step, we use a special strategy called hybrid training. It gives the model better zero-shot/few-shot capabilities even before SFT, and improves its ability to understand prompts when only limited SFT is available.
5. The training time for the 3B model is roughly 8k GPU hours (roughly 44 days on a DGX with 8 A100 40GB GPUs), and 16k GPU hours for the 7B model (roughly 84 days on the same DGX).
6. We will evaluate the model periodically to observe improvements and/or the possibility of finishing training early.
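
Step 3 (training the added latent space while the original one stays frozen) can be pictured as a gradient step that skips the original rows. This sketch assumes that the "added latent space" corresponds to the embedding rows of the newly added tokens, which the text does not state explicitly; `masked_sgd_step` and the toy numbers are hypothetical. In a deep-learning framework the same effect is typically obtained by zeroing the gradients of the frozen rows before the optimizer update.

```python
def masked_sgd_step(weights, grads, n_frozen, lr=0.1):
    """One SGD step that leaves the first `n_frozen` rows (original tokens) untouched."""
    updated = []
    for row_id, (w_row, g_row) in enumerate(zip(weights, grads)):
        if row_id < n_frozen:
            updated.append(list(w_row))  # original latent space: frozen, copied as-is
        else:
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])  # added rows: trained
    return updated


# Toy 2-d embeddings: rows 0-1 stand for original tokens, row 2 for a newly added token.
weights = [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
grads = [[0.5, 0.5], [0.5, 0.5], [1.0, -1.0]]
print(masked_sgd_step(weights, grads, n_frozen=2))
# [[1.0, 1.0], [2.0, 2.0], [-0.1, 0.1]]
```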

<h3>Supervised Fine-Tuning (SFT)</h3>

We believe that conversational AI will be a significant interface for human-machine interaction in the next few years. Therefore, VBD-LLaMA2-7B-50b-Chat is finetuned on 2 million conversational samples, in the hope that LLMs will see more applications in conversational systems in the near future.

In the following section, we document some benchmarks of the released weights.

<h3>Evaluation</h3>

We evaluated our model via peer comparison on multiple publicly available datasets using the
<a href="https://github.com/hieunguyen1053/lm-evaluation-harness"> @hieunguyen1053 fork of lm-evaluation-harness </a>, and combined the results with those provided by the authors of VinaLLaMA. The results are below:

| Model | Model size | arc_vi (acc) | hellaswag_vi (acc) | mmlu_vi (acc) | truthfulqa_vi (acc) | Average |
| ---------------------- | ---------- | ------------ | ------------------ | ------------- | ------------------- | ------- |
| URA-LLaMA-13B | | 0,3752 | 0,4830 | 0,3973 | 0,4574 | 0,4282 |
| BLOOMZ-7B | | 0,3205 | 0,4930 | 0,3975 | 0,4523 | 0,4158 |
| PhoGPT-7B5-Instruct | | 0,2470 | 0,2578 | 0,2413 | 0,4759 | 0,3055 |
| SeaLLM-7B-chat | | 0,3607 | 0,5112 | 0,3339 | 0,4948 | 0,4252 |
| Vietcuna-7b-v3 | | 0,3419 | 0,4939 | 0,3354 | 0,4807 | 0,4130 |
| VinaLLaMA-2.7B-chat | | 0,3273 | 0,4814 | 0,3051 | 0,4972 | 0,4028 |
| VinaLLaMA-7B-chat | | 0,4239 | 0,5407 | 0,3932 | 0,5251 | 0,4707 |
| VBD-LLaMA2-7B-50b | | 0,3222 | 0,5195 | 0,2964 | 0,4614 | 0,3999 |
| VBD-LLaMA2-7B-50b-Chat | | 0,3585 | 0,5207 | 0,3444 | 0,5179 | 0,4354 |

<p align="center"> Table 1. Benchmarks on Vietnamese datasets </p>
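
The Average column in Table 1 appears to be the unweighted mean of the four task accuracies. The short sketch below recomputes it for three rows as a quick consistency check; the recomputed values (0.4707, 0.3999, 0.4354) match the table. Note that the table uses a comma as the decimal separator, so cells are converted before averaging; `parse_decimal_comma` is a hypothetical helper name.

```python
def parse_decimal_comma(cell):
    """Convert a table cell such as '0,3585' (comma decimal separator) to a float."""
    return float(cell.replace(",", "."))


# arc_vi, hellaswag_vi, mmlu_vi, truthfulqa_vi scores copied from Table 1.
table_rows = {
    "VinaLLaMA-7B-chat": ["0,4239", "0,5407", "0,3932", "0,5251"],
    "VBD-LLaMA2-7B-50b": ["0,3222", "0,5195", "0,2964", "0,4614"],
    "VBD-LLaMA2-7B-50b-Chat": ["0,3585", "0,5207", "0,3444", "0,5179"],
}

averages = {}
for model, cells in table_rows.items():
    scores = [parse_decimal_comma(c) for c in cells]
    averages[model] = sum(scores) / len(scores)  # unweighted mean of the four tasks
    print(f"{model}: {averages[model]:.4f}")
```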

| Organization | Model | Model size | ARC (ACC) | HellaSwag (ACC) | LAMBADA (perplexity) | MMLU (ACC) |
| ------------ | ------------------ | ---------- | --------- | --------------- | -------------------- | ---------- |
| Meta | LLaMA-2-Chat | ~7B | 0,442 | 0,7547 | 3,968 | 0,4832 |
| AISingapore | Sealion7b | ~7B | 0,3422 | 0,6705 | 6,715 | 0,268 |
| VBD | VBD-LLaMA2-7B-50b-Chat | ~7B | 0,4556 | 0,7384 | 4,645 | 0,4558 |

<p align="center"> Table 2. Benchmarks on English datasets </p>

Based on these results, our model performs on par with or better than most of the compared models on Vietnamese tasks (VBD-LLaMA2-7B-50b-Chat has the second-highest average in Table 1, after VinaLLaMA-7B-chat), which suggests that this approach is promising.

Pretraining loss:

<p align="left"> <img src="loss.png" width="500" /> </p>

<h3> Run the model </h3>

<h4> With Hugging Face's transformers </h4>

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "LR-AI-Labs/vbd-llama2-7B-50b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # load_in_8bit=True,  # optional 8-bit loading; requires bitsandbytes
)
model.eval()

SYS_PROMPT = "A chat between a curious user and an artificial intelligence assistant. "\
    "The assistant gives helpful, detailed, and polite answers to the user's questions."

def response_generate(input_prompt):
    # Tokenize and move the tensors to the device holding the model's first layers
    # (works on GPU or CPU, unlike a hard-coded "cuda").
    inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        inputs=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        max_new_tokens=1024,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    # The decoded text also contains the prompt; keep only what follows the last "ASSISTANT:".
    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    response = response.split("ASSISTANT:")[-1].strip()
    return response

# "Hello, who are you?"
print(response_generate(f"{SYS_PROMPT} USER: Xin chào, bạn là ai? ASSISTANT:"))

# Xin chào, ViVi là một trợ lý trí tuệ nhân tạo có thể trả lời câu hỏi của bạn và trò chuyện với bạn.
# (Hello, ViVi is an AI assistant that can answer your questions and chat with you.)
```

<h5>For single-turn chat:</h5>

```python
# "How to cook delicious pho"
print(response_generate(f"{SYS_PROMPT} USER: Cách để nấu món phở ngon ASSISTANT:"))
```

```python
# "Write me an email asking my boss for a raise"
print(response_generate(f"{SYS_PROMPT} USER: Viết cho tôi một email xin sếp tăng lương ASSISTANT:"))
```

```python
# Placeholders (hypothetical values): replace with your question and five retrieved passages.
question = "..."
chunk_0 = chunk_1 = chunk_2 = chunk_3 = chunk_4 = "..."

# Instruction, translated: "Answer the question based on information from the following passages.
# If there is no information, answer 'Unfortunately, Vivi does not know the answer to your question.'
# If there is an answer, also return the source containing it in the form '[Passage k].'"
print(response_generate(f'''{SYS_PROMPT} USER: Trả lời câu hỏi dựa vào thông tin lấy từ các đoạn văn sau. Nếu không có thông tin trả lời "Thật tiếc, câu hỏi của bạn Vivi không biết." Nếu có câu trả lời, yêu cầu trả ra cả nguồn chứa câu trả lời dạng "[Đoạn k]."
Câu hỏi: {question}
Đoạn 0: {chunk_0}
Đoạn 1: {chunk_1}
Đoạn 2: {chunk_2}
Đoạn 3: {chunk_3}
Đoạn 4: {chunk_4} ASSISTANT:'''))
```
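
The retrieval-style example above hard-codes exactly five passages. A small helper that accepts any number of passages could look like the sketch below; `build_rag_prompt` is a hypothetical name, the layout mirrors the f-string above (passages labelled "Đoạn 0", "Đoạn 1", and so on), and how well the model handles more or fewer than five passages is not reported here.

```python
def build_rag_prompt(sys_prompt, instruction, question, chunks):
    """Assemble a single-turn RAG prompt; passages are labelled Đoạn 0, Đoạn 1, and so on."""
    lines = [f"{sys_prompt} USER: {instruction}", f"Câu hỏi: {question}"]
    lines += [f"Đoạn {i}: {chunk}" for i, chunk in enumerate(chunks)]
    return "\n".join(lines) + " ASSISTANT:"


prompt = build_rag_prompt("SYSTEM", "INSTRUCTION", "Q?", ["first passage", "second passage"])
print(prompt)
```

With the README's own objects, the result would be passed as `response_generate(build_rag_prompt(SYS_PROMPT, instruction, question, chunks))`.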

<h5>For conversation chat:</h5>

```python
turns = []

# Interactive loop; the whole history is sent back on every turn. Stop with Ctrl+C.
while True:
    user_input = input("USER: ")
    turns.append("USER: " + user_input)
    bot_output = "ASSISTANT: " + response_generate("%s %s ASSISTANT:" % (SYS_PROMPT, " ".join(turns)))
    print(bot_output)
    turns.append(bot_output)

# USER: Xin chào
#   (Hello)
# ASSISTANT: Xin chào. Làm thế nào ViVi có thể giúp bạn?
#   (Hello. How can ViVi help you?)
# USER: Anh bị đau đầu quá
#   (I have a really bad headache)
# ASSISTANT: Bạn có thể nói với ViVi nhiều hơn về cơn đau đầu của bạn?
#   (Can you tell ViVi more about your headache?)
# USER: anh bị đau nửa đầu về đêm, sáng dậy không tỉnh táo
#   (I get a migraine at night and wake up groggy in the morning)
# ASSISTANT: Điều quan trọng là bạn nhận thức được rằng cơn đau đầu của bạn không phải là điều hiếm gặp và nó có thể có nhiều nguyên nhân. Một số nguyên nhân phổ biến của cơn đau đầu là căng thẳng, thiếu ngủ, căng thẳng, chế độ ăn uống kém và mất nước.
#
# Nếu cơn đau đầu của bạn không giảm đi sau khi nghỉ ngơi và nghỉ ngơi, điều quan trọng là bạn nói chuyện với bác sĩ của bạn. Bác sĩ của bạn có thể giúp bạn xác định nguyên nhân của cơn đau đầu của bạn và đề xuất các phương pháp điều trị thích hợp.
#   (It is important to recognize that your headaches are not uncommon and can have many causes. Some common causes of
#   headaches are stress, lack of sleep, stress, poor diet, and dehydration. If your headache does not ease after rest
#   and rest, it is important to talk to your doctor, who can help identify the cause and suggest appropriate treatments.)
# USER: có cách nào giảm đau tại nhà không?
#   (Is there any way to relieve the pain at home?)
# ASSISTANT: Có, có một số cách bạn có thể thử giảm đau đầu tại nhà:
#
# - Uống nhiều nước để giảm đau đầu do mất nước.
# - Nghỉ ngơi và ngủ nhiều hơn.
# - Áp dụng băng lạnh hoặc nóng vào khu vực đau.
# - Áp dụng thuốc giảm đau không kê đơn như ibuprofen hoặc paracetamol.
# - Thực hiện các kỹ thuật thư giãn như thiền, thở sâu hoặc yoga.
# - Massage khu vực bị ảnh hưởng.
#
# Điều quan trọng cần nhớ là trong khi các biện pháp khắc phục tại nhà có thể giúp giảm đau đầu, chúng không thay thế cho lời khuyên y tế chuyên nghiệp. Nếu cơn đau đầu của bạn vẫn tồn tại hoặc trở nên tồi tệ hơn, điều quan trọng là bạn nói chuyện với bác sĩ của bạn.
#   (Yes, there are some ways you can try to relieve headaches at home: drink plenty of water to relieve
#   dehydration headaches; rest and sleep more; apply a cold or hot compress to the painful area; take
#   over-the-counter painkillers such as ibuprofen or paracetamol; practice relaxation techniques such as
#   meditation, deep breathing, or yoga; massage the affected area. While home remedies can help relieve
#   headaches, they are not a substitute for professional medical advice; if your headache persists or
#   gets worse, it is important to talk to your doctor.)
```
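
The conversation loop above resends the entire history on every turn, so the prompt keeps growing, while `config.json` sets `max_position_embeddings` to 4096. One simple mitigation, sketched below under the assumption that dropping old turns is acceptable for your use case, is to keep only the most recent turns. `build_chat_prompt`, `extract_reply`, and the default window of 5 turns are hypothetical choices; counting turns is only a rough proxy, and a more careful version would measure the prompt length in tokens with the tokenizer.

```python
def build_chat_prompt(sys_prompt, turns, max_turns=5):
    """Join the system prompt with the most recent turns and open a new ASSISTANT turn."""
    # An odd window keeps the prompt starting with a USER turn when the history ends with one.
    recent = turns[-max_turns:] if max_turns > 0 else []  # turns[-0:] would keep everything
    return "%s %s ASSISTANT:" % (sys_prompt, " ".join(recent))


def extract_reply(decoded):
    """Keep only the text after the last 'ASSISTANT:' marker, as response_generate does."""
    return decoded.split("ASSISTANT:")[-1].strip()


history = ["USER: hi", "ASSISTANT: hello", "USER: q1", "ASSISTANT: a1", "USER: q2"]
print(build_chat_prompt("SYSTEM", history, max_turns=3))
# SYSTEM USER: q1 ASSISTANT: a1 USER: q2 ASSISTANT:
```

In the loop, the argument to `response_generate` would then become `build_chat_prompt(SYS_PROMPT, turns)`.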

***Modify the parameters "temperature", "top_k", and "top_p" to suit your use case.***
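
To build intuition for these three parameters, here is a toy, pure-Python version of temperature scaling followed by top-k and top-p (nucleus) filtering, applied in that order. It is a simplified sketch over a hypothetical three-token vocabulary, not the transformers implementation, and `sample_next_token` is a hypothetical name. Lower `temperature` or `top_p` makes the output more deterministic; higher values make it more varied. Note that `generation_config.json` in this commit ships different defaults (`temperature` 0.9, `top_p` 0.6); values passed explicitly to `model.generate`, as in the example above, take precedence over those defaults.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Toy temperature / top-k / top-p sampling over a {token: logit} dict."""
    # Temperature: divide the logits; values below 1 sharpen the distribution.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # Softmax, subtracting the max logit for numerical stability.
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    # Top-k: keep the k most probable tokens, then renormalise.
    ranked = ranked[:top_k]
    norm = sum(p for p, _ in ranked)
    ranked = [(p / norm, tok) for p, tok in ranked]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, tok in ranked:
        kept.append((p, tok))
        cumulative += p
        if cumulative >= top_p:
            break
    # Draw one token from the survivors, proportionally to their probability.
    r = rng.random() * sum(p for p, _ in kept)
    for p, tok in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][1]  # guard against floating-point leftovers


toy_logits = {"xin": 2.0, "chào": 1.5, "bạn": 0.1}
print(sample_next_token(toy_logits, rng=random.Random(0)))
```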

<h3>Limitations and Future Research</h3>

The published model has certain limitations. For example, it performs poorly on tasks involving reasoning, coding, or mathematics. In addition, the model will occasionally produce harmful or biased responses, or answer unsafe questions. Users should be cautious when interacting with VBD-LLaMA2-7B-50b-Chat and should verify important information taken from the model's outputs, because such information can be factually incorrect.

This model has been trained on, and shows decent capability in, Vietnamese tasks, especially conversational ones. However, the model still struggles with questions related to Vietnamese history, culture, and society. We recommend some approaches to further improve this model:

+ Data Distillation: Construct a small dataset of local/in-domain knowledge to continue training the model. You might also find good ideas by searching the topic of domain adaptation ;)
+ Merging/Combining/Ensembling Models: Numerous models have been developed based on Meta's LLaMA, so another approach might be a training process similar to knowledge distillation, where the teacher consists of combinations of previously trained models.
+ RLHF/Alignment: The model has not been trained with RLHF or alignment techniques such as DPO.
+ Retrieval Augmented Generation (RAG): Combine the model with external knowledge sources.
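
As a starting point for the RAG direction, the passages that fill the "Đoạn k" slots in the single-turn example could come from a retriever. The sketch below uses plain word overlap as a stand-in for a proper retriever (for example BM25 or embedding search); `tokenize`, `retrieve`, and the toy documents are hypothetical, and nothing here reflects how the authors plan to implement RAG. The word pattern is Unicode-aware in Python, so Vietnamese letters are kept as part of words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens (Python's word pattern also matches Vietnamese letters)."""
    return re.findall(r"\w+", text.lower())


def retrieve(question, passages, k=5):
    """Return the k passages sharing the most words with the question (ties keep input order)."""
    query = Counter(tokenize(question))

    def overlap(passage):
        words = Counter(tokenize(passage))
        return sum(min(count, words[w]) for w, count in query.items())

    return sorted(passages, key=overlap, reverse=True)[:k]  # sorted() is stable


docs = [
    "Pho is a Vietnamese noodle soup.",
    "Hanoi is the capital of Vietnam.",
    "The weather is nice today.",
]
print(retrieve("What is pho?", docs, k=1))  # ['Pho is a Vietnamese noodle soup.']
```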

<h3>Acknowledgements:</h3>

We would like to express our gratitude to the Virtual Assistant Technology Center at VinBigData JSC, led by Dr. <a href="https://scholar.google.com.vn/citations?user=z3IDeu0AAAAJ&hl=vi"> Kim Anh Nguyen </a>, for providing us with the necessary resources to deliver this project. We are also greatly indebted to our colleagues at the Natural Language Processing Department at VinBigData, whose feedback and expertise have been of great help.

<h3>Citation</h3>

If you find our project useful, we hope you would kindly star our repo and cite our work as follows:

Corresponding Authors:
+ v.quangph3@vinbigdata.com ([QuangPH](https://samsonph.github.io/))
+ v.kietbs@vinbigdata.com ([KietBS](https://github.com/ntdas/))
+ v.minhtt32@vinbigdata.com ([MinhTT](https://github.com/tanminhtran168/))

added_tokens.json
ADDED
@@ -0,0 +1,3 @@
{
  "<pad>": 49380
}

config.json
ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.28.1",
  "use_cache": false,
  "vocab_size": 49381
}
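
As a consistency check on this config, the parameter count can be derived from its fields, assuming the standard LLaMA layer layout (no bias terms, full multi-head attention since `num_key_value_heads` equals `num_attention_heads`, and a separate `lm_head` because `tie_word_embeddings` is false). Multiplying by 2 bytes per bfloat16 value gives 13761601536 bytes, which matches the `total_size` recorded in model.safetensors.index.json.

```python
# Values copied from config.json.
vocab_size = 49381
hidden = 4096
intermediate = 11008
layers = 32

embed = vocab_size * hidden        # model.embed_tokens.weight
lm_head = vocab_size * hidden      # separate output head (tie_word_embeddings is false)
attn = 4 * hidden * hidden         # q_proj, k_proj, v_proj, o_proj
mlp = 3 * hidden * intermediate    # gate_proj, up_proj, down_proj
norms = 2 * hidden                 # input_layernorm, post_attention_layernorm
per_layer = attn + mlp + norms
final_norm = hidden                # final RMSNorm before lm_head

total_params = embed + lm_head + layers * per_layer + final_norm
total_bytes = total_params * 2     # bfloat16 = 2 bytes per parameter
print(total_params, total_bytes)   # 6880800768 13761601536
```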
generation_config.json
ADDED
@@ -0,0 +1,9 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 49380,
  "temperature": 0.9,
  "top_p": 0.6,
  "transformers_version": "4.28.1"
}

model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:26e3e9c053a15d8edca50534cd4937c8bede6f6aa5702ea6a3404beeb97f07b2
size 4991192856

model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4c83c378688e7223cad5f86c43480fa5ada5ce8cfab68fbd8f5a74456ffdc763
size 4947390888

model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df6bff780b430de99d1bdbdc76e7f2b5d0d0fa0d0f8efc63cead51af51dcc148
size 3823051624
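
The three `.safetensors` entries above are Git LFS pointer files (spec v1): the repository itself stores only the object's SHA-256 `oid` and its `size` in bytes, while the weights live in LFS storage. The sketch below parses such pointers and sums the shard sizes; `parse_lfs_pointer` is a hypothetical helper. The sum is slightly larger than the `total_size` (13761601536) listed in model.safetensors.index.json; the 33832-byte difference is presumably the per-file safetensors headers, since `total_size` appears to count tensor data only.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a {key: value} dict (one 'key value' pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the three model-0000X-of-00003.safetensors entries.
pointers = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:26e3e9c053a15d8edca50534cd4937c8bede6f6aa5702ea6a3404beeb97f07b2\n"
    "size 4991192856",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4c83c378688e7223cad5f86c43480fa5ada5ce8cfab68fbd8f5a74456ffdc763\n"
    "size 4947390888",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:df6bff780b430de99d1bdbdc76e7f2b5d0d0fa0d0f8efc63cead51af51dcc148\n"
    "size 3823051624",
]

total_on_disk = sum(int(parse_lfs_pointer(p)["size"]) for p in pointers)
print(total_on_disk)                # 13761635368
print(total_on_disk - 13761601536)  # 33832
```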
model.safetensors.index.json
ADDED
@@ -0,0 +1,298 @@
{
  "metadata": {
    "total_size": 13761601536
  },
  "weight_map": {
    "lm_head.weight": "model-00003-of-00003.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
122 |
+
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
123 |
+
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
124 |
+
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
125 |
+
"model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
126 |
+
"model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
127 |
+
"model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
128 |
+
"model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
129 |
+
"model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
130 |
+
"model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
131 |
+
"model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
132 |
+
"model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
133 |
+
"model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
134 |
+
"model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
135 |
+
"model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
136 |
+
"model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
137 |
+
"model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
138 |
+
"model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
139 |
+
"model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
140 |
+
"model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
141 |
+
"model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
142 |
+
"model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
143 |
+
"model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
|
144 |
+
"model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
145 |
+
"model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
146 |
+
"model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
147 |
+
"model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
148 |
+
"model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
149 |
+
"model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
150 |
+
"model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
151 |
+
"model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
152 |
+
"model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
153 |
+
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
154 |
+
"model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
155 |
+
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
156 |
+
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
157 |
+
"model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
|
158 |
+
"model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
159 |
+
"model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
|
160 |
+
"model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
|
161 |
+
"model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
162 |
+
"model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
163 |
+
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
164 |
+
"model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
165 |
+
"model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
166 |
+
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
167 |
+
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
168 |
+
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
169 |
+
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
170 |
+
"model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
171 |
+
"model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
172 |
+
"model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
173 |
+
"model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
174 |
+
"model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
175 |
+
"model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
176 |
+
"model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
177 |
+
"model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
178 |
+
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
179 |
+
"model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
180 |
+
"model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
181 |
+
"model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
182 |
+
"model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
183 |
+
"model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
184 |
+
"model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
185 |
+
"model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
186 |
+
"model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
187 |
+
"model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
188 |
+
"model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
189 |
+
"model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
190 |
+
"model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
191 |
+
"model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
192 |
+
"model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
193 |
+
"model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
194 |
+
"model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
195 |
+
"model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
196 |
+
"model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
197 |
+
"model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
198 |
+
"model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
199 |
+
"model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
200 |
+
"model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
201 |
+
"model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
202 |
+
"model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
203 |
+
"model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
204 |
+
"model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
205 |
+
"model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
206 |
+
"model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
207 |
+
"model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
208 |
+
"model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
209 |
+
"model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
210 |
+
"model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
211 |
+
"model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
212 |
+
"model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
213 |
+
"model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
214 |
+
"model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
215 |
+
"model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
216 |
+
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
217 |
+
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
218 |
+
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
219 |
+
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
220 |
+
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
221 |
+
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
222 |
+
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
223 |
+
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
224 |
+
"model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
225 |
+
"model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
226 |
+
"model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
227 |
+
"model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
228 |
+
"model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
229 |
+
"model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
230 |
+
"model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
231 |
+
"model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
232 |
+
"model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
233 |
+
"model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
|
234 |
+
"model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
|
235 |
+
"model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
|
236 |
+
"model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
|
237 |
+
"model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
|
238 |
+
"model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
|
239 |
+
"model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
|
240 |
+
"model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
|
241 |
+
"model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
|
242 |
+
"model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
243 |
+
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
244 |
+
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
245 |
+
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
246 |
+
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
247 |
+
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
248 |
+
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
249 |
+
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
250 |
+
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
251 |
+
"model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
252 |
+
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
253 |
+
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
254 |
+
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
255 |
+
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
256 |
+
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
257 |
+
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
258 |
+
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
259 |
+
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
260 |
+
"model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
261 |
+
"model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
262 |
+
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
263 |
+
"model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
264 |
+
"model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
265 |
+
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
266 |
+
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
267 |
+
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
268 |
+
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
269 |
+
"model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
270 |
+
"model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
271 |
+
"model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
272 |
+
"model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
273 |
+
"model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
274 |
+
"model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
275 |
+
"model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
276 |
+
"model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
277 |
+
"model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
278 |
+
"model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
279 |
+
"model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
280 |
+
"model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
281 |
+
"model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
282 |
+
"model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
283 |
+
"model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
284 |
+
"model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
285 |
+
"model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
286 |
+
"model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
287 |
+
"model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
|
288 |
+
"model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
289 |
+
"model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
290 |
+
"model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
291 |
+
"model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
292 |
+
"model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
293 |
+
"model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
294 |
+
"model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
295 |
+
"model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
296 |
+
"model.norm.weight": "model-00003-of-00003.safetensors"
|
297 |
+
}
|
298 |
+
}
|
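The `weight_map` above tells a loader which shard file holds each tensor. It also shows that a single decoder layer can straddle two shards: layer 23 keeps its attention projections and `gate_proj` in shard 2 but its other MLP weights and norms in shard 3. Below is a minimal, stdlib-only sketch of reading that mapping. It uses a hand-copied excerpt of the entries rather than the full index, and the helper names are illustrative, not part of any library:

```python
from collections import defaultdict

# Hand-copied excerpt of the weight_map above (not the full index).
weight_map = {
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors",
}


def tensors_by_shard(wmap):
    """Invert a tensor-name -> shard-file map into shard-file -> sorted tensor names."""
    shards = defaultdict(list)
    for name, shard in wmap.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_layer(wmap, layer):
    """Return the set of shard files holding any tensor of the given decoder layer.

    The trailing dot in the prefix keeps layer 2 from matching layers 20-29.
    """
    prefix = f"model.layers.{layer}."
    return {shard for name, shard in wmap.items() if name.startswith(prefix)}


if __name__ == "__main__":
    for shard, names in sorted(tensors_by_shard(weight_map).items()):
        print(shard, len(names))
    print(sorted(shards_for_layer(weight_map, 23)))
```

In normal use, `transformers` reads this index itself when a sharded checkpoint is loaded with `from_pretrained`. The sketch only makes the mapping visible.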
special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<unk>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
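In `special_tokens_map.json`, three tokens use different storage forms. `bos_token`, `eos_token` and `unk_token` are stored as token objects with a `content` field, while `pad_token` is a bare string. The sketch below reads either form using only the standard library. The helper name `token_content` is hypothetical. The sketch also shows that the padding token is the same string as the unknown token, `<unk>`.

```python
import json

# The special_tokens_map.json content above, compacted onto fewer lines.
SPECIAL_TOKENS_MAP = """
{
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false},
  "eos_token": {"content": "</s>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false},
  "pad_token": "<unk>",
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}
}
"""


def token_content(entry):
    """Return the token string whether the entry is a bare string or an object with "content"."""
    if isinstance(entry, str):
        return entry
    return entry["content"]


special = {key: token_content(value) for key, value in json.loads(SPECIAL_TOKENS_MAP).items()}
print(special)
```

Because padding reuses `<unk>`, padded positions have to be excluded through the attention mask rather than by a dedicated pad id.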
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:450f5e8ed9b73ec6e9c31822ac667f93451914e8f80a5ef5a3be71916e44c506
size 794744
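`tokenizer.model` is committed as a Git LFS pointer, not as the SentencePiece model itself. Each line is a `key value` pair:

* the pointer spec version;
* the object ID, given as `sha256:` plus a hex digest;
* the size in bytes.

The sketch below parses the pointer and checks a downloaded file against it. It uses only `hashlib`. The helper names are illustrative, and the check assumes the `sha256` object-ID scheme shown above.

```python
import hashlib

# The pointer text from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:450f5e8ed9b73ec6e9c31822ac667f93451914e8f80a5ef5a3be71916e44c506
size 794744
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split(" ", 1)
            fields[key] = value
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are never fully loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```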
tokenizer_config.json
ADDED
@@ -0,0 +1,36 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "legacy": false,
  "model_max_length": 4096,
  "pad_token": null,
  "padding_side": "right",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "use_fast": true
}
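Four fields in `tokenizer_config.json` shape each encoded sequence:

* `add_bos_token: true` prepends `<s>`;
* `add_eos_token: false` leaves `</s>` off;
* `model_max_length` caps the length at 4096;
* `padding_side: "right"` puts padding after the text.

The stdlib-only sketch below emulates that behaviour on already-split pieces. It is an illustration, not the `LlamaTokenizer` implementation. It assumes the effective pad token is `<unk>` from `special_tokens_map.json`, because this file leaves `pad_token` as `null`. The pieces `piece_1` and `piece_2` are hypothetical placeholders, not real SentencePiece output.

```python
import json

# Subset of the tokenizer_config.json fields above.
CONFIG = json.loads("""
{"add_bos_token": true, "add_eos_token": false,
 "model_max_length": 4096, "padding_side": "right"}
""")

BOS, EOS = "<s>", "</s>"
PAD = "<unk>"  # assumption: pad_token taken from special_tokens_map.json


def wrap_and_pad(pieces, cfg, target_len):
    """Add BOS/EOS per the config flags, truncate to model_max_length, then pad to target_len.

    Returns the padded token list and a matching attention mask (1 = real token, 0 = padding).
    Truncation is applied unconditionally here for simplicity.
    """
    seq = list(pieces)
    if cfg["add_bos_token"]:
        seq.insert(0, BOS)
    if cfg["add_eos_token"]:
        seq.append(EOS)
    seq = seq[: cfg["model_max_length"]]
    mask = [1] * len(seq)
    n_pad = max(0, target_len - len(seq))
    if cfg["padding_side"] == "right":
        seq = seq + [PAD] * n_pad
        mask = mask + [0] * n_pad
    else:
        seq = [PAD] * n_pad + seq
        mask = [0] * n_pad + mask
    return seq, mask


tokens, attention_mask = wrap_and_pad(["piece_1", "piece_2"], CONFIG, 5)
print(tokens, attention_mask)
```

With right padding, the real tokens stay left-aligned and the mask marks the trailing `<unk>` positions as padding.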