Kiet Bui committed
Commit a957073
1 Parent(s): c9712ee

initial commit
README.md CHANGED
@@ -6,55 +6,86 @@ language:
 ---
 
 <p align="center"> <img src="vbd_logo.png" width="600" /> </p>
- VBD-LLaMA2-Chat - a Conversationally-tuned LLaMA2 for Vietnamese
 
- We release VBD-LLaMA2-7B-Chat, a finetuned model based on Meta's LLaMA2-7B specifically for the Vietnamese πŸ‡»πŸ‡³ language, in an effort to support the community in building Vietnamese Large Language Models (LLMs). The pretrained weights for this model were trained with continuous self-supervised learning (SSL) by extending LLaMA2's vocab on a corpus consisting of 100 billion Vietnamese πŸ‡»πŸ‡³ tokens and 40 billion English πŸ‡¬πŸ‡§ tokens. This approach attempts to leverage the full potential of existing language models, adapt them to lower-resource languages, and, in the process, reduce the hardware, time, and data cost of building LLMs for these languages. The subsequent supervised finetuning (SFT) was conducted on our internal SFT dataset consisting of 2 million samples.
 
- For this release we are only including the SFT weights based on a checkpoint pretrained on 50B Vietnamese and 20B English tokens.
 
- Model weights:
- - VBD-LLaMA2-7B-50b-Chat: a snapshot demonstrating the efficacy of the proposed methodology. This base model is pretrained on 50B Vietnamese tokens and 20B English tokens and SFT on XXXX samples.
 
 <blockquote style="color:red"> <p><strong style="color: red">Terms of Use and License</strong>: By using our released weights, you agree to and comply with the terms and conditions specified in Meta's LLaMA-2 license.</blockquote>
 
- Disclaimer: Despite our efforts to limit misleading, inaccurate, and harmful generation, our released model comes with potential risks. We strongly advise using this model only in a highly supervised environment and/or performing extra testing, red teaming, and alignment. The use of this model must abide by and comply with local governance and regulations. The authors of this model shall not be held liable for any claim, damage, or other liability arising from the use of the released weight(s).
 
- In the following section, we document some of the benchmarks of the released weight(s).
 
- Evaluation
 
- We evaluated our model via peer comparison on multiple publicly available datasets using
- <a href="https://github.com/hieunguyen1053/lm-evaluation-harness"> @hieunguyen1053's fork of lm-evaluation-harness </a>. The models are benchmarked on different tasks and metrics. The results are below:
 
- | Organization | Model | Model size | ARC (ACC) | HellaSwag (ACC) | LAMBADA (perplexity) | MMLU (ACC) | IWSLT 2023 en-vi (BLEU) | TruthfulQA (ACC) | Grade 12 Exams (ACC) | hhh_alignment_vi (ACC) | xnli_vi (ACC) |
- | ------------ | ---------------------------- | ---------- | --------- | --------------- | -------------------- | ---------- | ----------------------- | ---------------- | -------------------- | ---------------------- | ------------- |
- | VietAI | gpt-j-6B-vietnamese-news | ~7B | 0,2419 | 0,3856 | 35,1863 | 0,2282 | 0,6698 | 0,4718 | | | 0,4365 |
- | VietAI | gpt-neo-1.3B-vietnamese-news | ~1.5B | 0,2274 | 0,3567 | 64,3972 | 0,229 | 0,5178 | 0,4423 | | | 0,4483 |
- | VietGPT | dama-2-7B-chat | ~7B | 0,3417 | 0,5106 | 38,0188 | 0,338 | 24,3101 | 0,4847 | | | 0,4653 |
- | VietGPT | dama-2-7B | ~7B | 0,3214 | 0,4892 | 17,6625 | 0,2339 | 25,8764 | 0,4416 | 0,293 | | 0,4469 |
- | ViLM | vietcuna-7b-v3 | ~7B | 0,335 | 0,4914 | 21,7747 | 0,336 | 21,0801 | 0,4771 | 0,2992 | | 0,4749 |
- | VLSP | hoa-1b4 | ~1.5B | 0,2718 | 0,4228 | 20,3997 | 0,2281 | 28,0573 | 0,4423 | 0,2684 | | 0,4605 |
- | VLSP | hoa-7b | ~7B | 0,2855 | 0,4329 | 22,6466 | 0,2536 | 25,5126 | 0,4542 | 0,2705 | | 0,4509 |
- | VBD | VBD-LLaMA2-7B-50b | ~7B | 0,3222 | 0,5195 | 13,033 | 0,2964 | | 0,4614 | 0,3197 | | 0,4764 |
- | VBD | VBD-LLaMA2-7B-50b-Chat | ~7B | 0,3585 | 0,5207 | 13,419 | 0,3444 | 24,1 | 0,5179 | 0,3299 | 0,5792 | 0,4772 |
- | AISingapore | Sealion7b | ~7B | 0,2692 | 0,483 | 16,4388 | 0,267 | | 0,4275 | 0,2725 | | 0,4277 |
- | BK Lab | LLaMa-2-BK | ~7B | 0,2966 | 0,4402 | 25,613 | 0,3402 | | 0,4528 | 0,2971 | | 0,4655 |
- | Meta | LLaMa-2 | ~7B | 0,3034 | 0,4287 | | 0,3067 | | | | | |
- | BigScience | Bloom | ~7B | 0,337 | 0,483 | | 0,281 | | | | | |
- | FPT | FPT GenAI | | 0,3581 | 0,5055 | | 0,3143 | | | | | |
- | VinAI | PhoGPT SFT | ~7B | 0,2684 | 0,4109 | 55,509 | 0,2499 | | 0,478 | 0,2643 | | 0,4198 |
 
 
 | Organization | Model | Model size | ARC (ACC) | HellaSwag (ACC) | LAMBADA (perplexity) | MMLU (ACC) |
 | ------------ | ------------------ | ---------- | --------- | --------------- | -------------------- | ---------- |
@@ -67,39 +98,133 @@ We evaluated our model via peer comparison on multiple publicly available datase
 | Meta | LLaMA-2-Chat | ~7B | 0,442 | 0,7547 | 3,968 | 0,4832 |
 | AISingapore | Sealion7b | ~7B | 0,3422 | 0,6705 | 6,715 | 0,268 |
 | VBD | VBD-LLaMA2-7B-50b-Chat | ~7B | 0,4556 | 0,7384 | 4,645 | 0,4558 |
 
- Based on these results, our model performs on par with or better than most models on Vietnamese tasks. TO_BE_FILLED
 
- Safety Enhancement in Local Context
- TO_BE_FILLED
 
- Training process
- TO_BE_FILLED
 
- The next section will describe our SSL process.
 
- The SSL dataset distribution is as follows:
 
- The training time for this 7B model is around 8,000 GPU hours (roughly 42 days on a DGX with 8 A100 40GB GPUs). The snapshot for the 50B checkpoint is taken at around 13,000 steps.
 
- <p align="left"> <img src="loss.png" width="500" /> </p>
 
- Pre-training Strategies
- TO_BE_FILLED
 
- Supervised fine-tuning (SFT) Data
- TO_BE_FILLED
 
- SFT Strategies
- TO_BE_FILLED
 
- Acknowledgement to Our Linguists
- We would like to express our special thanks to our professional and native linguists, who helped build, evaluate, and fact-check our sampled pretraining and SFT datasets, as well as evaluate our models across different aspects, especially safety.
 
- Citation
- If you find our project useful, we hope you would kindly star our repo and cite our work as follows: Corresponding Author: v.quangph3@vinbigdata.com, v.kietbs@vinbigdata.com, v.minhtt32@vinbigdata.com
 
 ---
 
 <p align="center"> <img src="vbd_logo.png" width="600" /> </p>
+ <h1>VBD-LLaMA2-Chat - a Conversationally-tuned LLaMA2 for Vietnamese</h1>
+ 
+ (*Disclaimer 1: The VBD-LLaMA family is an effort by VinBigData to support and promote research on LLMs in Vietnam. This model is not related to ViGPT/ViViChat or any other product operated by VinBigData.*)
+ 
+ We release VBD-LLaMA2-7B-Chat, a finetuned model based on Meta's LLaMA2-7B specifically for the Vietnamese πŸ‡»πŸ‡³ language. This is part of our effort to support the community in building Vietnamese Large Language Models (LLMs). The pretrained weights for this model were trained through continuous self-supervised learning (SSL) by extending LLaMA2's vocab on a corpus consisting of 100 billion Vietnamese πŸ‡»πŸ‡³ tokens and 40 billion English πŸ‡¬πŸ‡§ tokens. This approach attempts to leverage the full potential of existing language models and adapt them to lower-resource languages, thereby reducing the hardware, time, and data costs associated with building LLMs for these languages. Subsequent supervised finetuning (SFT) was conducted using our internal SFT dataset, which consists of 2 million Vietnamese samples.
+ 
+ For this release, we are only including the pretrained weights and the SFT weights of our model's checkpoint, which was trained on 40b Vietnamese and 16b English tokens (56b tokens total).
+ 
+ <h3>Model weights:</h3>
+ 
+ - VBD-LLaMA2-7B-50b: the snapshot of the pretrained model after 40b Vietnamese tokens and 16b English tokens (~56b tokens total)
+ - VBD-LLaMA2-7B-50b-Chat: a snapshot demonstrating the efficacy of the proposed methodology. This base model is pretrained on 40b Vietnamese tokens and 16b English tokens, then SFT-tuned on 2 million samples.
 
 <blockquote style="color:red"> <p><strong style="color: red">Terms of Use and License</strong>: By using our released weights, you agree to and comply with the terms and conditions specified in Meta's LLaMA-2 license.</blockquote>
 
+ **Disclaimer 2: While we have made considerable efforts to minimize misleading, inaccurate, and harmful content generation, it is important to acknowledge that our released model carries inherent risks. We strongly recommend utilizing this model exclusively within a closely supervised environment and/or conducting additional testing, red teaming, and alignment procedures. The utilization of this model must adhere to and comply with local governance and regulations. The authors of this model shall not be held liable for any claims, damages, or other liabilities arising from the use of the released weights.**
+ 
+ <h3>Pre-training Proposal</h3>
+ 
+ We propose continued pretraining of 3/7/13-billion-parameter large language models (LLaMA, Bloom, MPT, Falcon, etc.) for the Vietnamese and English languages.
+ 
+ Our proposal involves conducting experiments to enhance the conversational capabilities of this model in Vietnamese while retaining its abilities in English. This is achieved by transferring knowledge from the English latent space to the Vietnamese latent space.
+ 
+ Instead of training a Vietnamese LLM from scratch, we want to leverage the full potential of existing (English) language models and transform them into Vietnamese ones, aiming to reduce the hardware costs, time, and data needed to build language models for Vietnamese.
+ 
+ We intend to augment the original latent space of the LLaMA/Bloom LLM by incorporating a Vietnamese latent space. We then transfer knowledge between these two spaces and fine-tune with self-supervised learning (SSL) using both English and Vietnamese unsupervised corpora.
+ 
+ With this model, we expect to make a significant contribution to the development of large language models in Vietnam, making it easier for Vietnamese people to access larger language models in-house. It also creates a recipe for other low-resource languages to follow.
+ 
+ **Vietnamese language, methods, and research objectives**
+ 
+ We experiment with adding the Vietnamese language to large language models that do not originally support Vietnamese. Our hypothesis is that it is feasible to transfer knowledge between different languages, utilizing the cross-lingual capabilities of large models to quickly develop a Vietnamese LLM with less training time, data, and computational resources.
 
+ **Our proposed methods:**
+ 
+ 1. We will start with an English/multilingual large language model:
+    + https://huggingface.co/meta-llama/Llama-2-7b-hf
+ 2. We will rebuild the BPE-based tokenizers by preserving the original tokens and incorporating Vietnamese syllables (see the sketch following this list).
+ 3. We will transfer knowledge in the latent space by fine-tuning the added latent space while freezing the original latent space. This step is conducted using the En-Vi and Vi-En translation tasks.
+ 4. Using the new latent space (original latent space + added latent space), we will fine-tune with self-supervised learning (SSL) on 40B English tokens and 100B Vietnamese tokens of unsupervised corpora (for reference, recent well-performing LLaMA models are pretrained on around 1-1.5T tokens).
+    + In this step, we use a special strategy called hybrid training. This gives the model better zero-shot/few-shot capabilities even before it has been SFT-trained, and also enhances the model's ability to understand prompts with limited SFT.
+ 5. The training time for the 3B model is roughly 8k GPU hours (roughly 44 days on a DGX with 8 A100 40GB GPUs), and 16k GPU hours for the 7B model (roughly 84 days on a DGX with 8 A100 40GB GPUs).
+ 6. We will evaluate the model periodically to observe improvements and/or the possibility of completing the training process early.
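+ 
+ As a rough illustration of steps 2-4, below is a minimal sketch of the general technique (not our actual training code): the vocabulary is extended with new tokens, and a gradient hook lets only the added embedding rows learn while the original rows stay frozen. `NEW_VI_TOKENS` is a hypothetical stand-in for the full list of Vietnamese syllables.
+ 
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ base = "meta-llama/Llama-2-7b-hf"
+ tokenizer = AutoTokenizer.from_pretrained(base, use_fast=False)
+ model = AutoModelForCausalLM.from_pretrained(base)
+ 
+ # Hypothetical stand-in: the real extension grows LLaMA-2's 32,000-token
+ # vocab to the 49,381 entries seen in this repo's config.json.
+ NEW_VI_TOKENS = ["được", "người", "nhưng"]
+ 
+ orig_vocab = len(tokenizer)
+ tokenizer.add_tokens(NEW_VI_TOKENS)
+ model.resize_token_embeddings(len(tokenizer))  # added rows are freshly initialized
+ 
+ # Freeze the whole network, then re-enable only the input-embedding and
+ # LM-head matrices (untied in LLaMA-2); a gradient hook zeroes the updates
+ # for the original rows so only the added Vietnamese rows are trained.
+ for p in model.parameters():
+     p.requires_grad = False
+ for mat in (model.get_input_embeddings().weight, model.get_output_embeddings().weight):
+     mat.requires_grad = True
+     mat.register_hook(lambda g: torch.cat([torch.zeros_like(g[:orig_vocab]), g[orig_vocab:]]))
+ 
+ # ...then train on En-Vi / Vi-En translation pairs (step 3) and continue SSL (step 4).
+ ```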
+ 
+ <h3>Supervised Fine-Tuning (SFT)</h3>
+ 
+ We believe that conversational AI will be a significant interface for human-machine interaction in the coming years. Therefore, VBD-LLaMA2-7B-50b-Chat is finetuned on 2 million conversational samples, in the hope that there will be more applications of LLMs in conversational systems in the near future.
+ 
+ In the following section, we document some benchmarks of the released weight(s).
+ 
+ <h3>Evaluation</h3>
+ 
+ We evaluated our model via peer comparison on multiple publicly available datasets using <a href="https://github.com/hieunguyen1053/lm-evaluation-harness">@hieunguyen1053's fork of lm-evaluation-harness</a>, and combined the results with those provided by the authors of VinaLLaMA. The results are below:
+ 
+ | Model | Model size | arc_vi (acc) | hellaswag_vi (acc) | mmlu_vi (acc) | truthfulqa_vi (acc) | Average |
+ | ---------------------- | ---------- | ------------ | ------------------ | ------------- | ------------------- | ------- |
+ | URA-LLaMA-13B | | 0,3752 | 0,4830 | 0,3973 | 0,4574 | 0,4282 |
+ | BLOOMZ-7B | | 0,3205 | 0,4930 | 0,3975 | 0,4523 | 0,4158 |
+ | PhoGPT-7B5-Instruct | | 0,2470 | 0,2578 | 0,2413 | 0,4759 | 0,3055 |
+ | SeaLLM-7B-chat | | 0,3607 | 0,5112 | 0,3339 | 0,4948 | 0,4252 |
+ | Vietcuna-7b-v3 | | 0,3419 | 0,4939 | 0,3354 | 0,4807 | 0,4130 |
+ | VinaLLaMA-2.7B-chat | | 0,3273 | 0,4814 | 0,3051 | 0,4972 | 0,4028 |
+ | VinaLLaMA-7B-chat | | 0,4239 | 0,5407 | 0,3932 | 0,5251 | 0,4707 |
+ | VBD-LLaMA2-7B-50b | | 0,3222 | 0,5195 | 0,2964 | 0,4614 | 0,3999 |
+ | VBD-LLaMA2-7B-50b-Chat | | 0,3585 | 0,5207 | 0,3444 | 0,5179 | 0,4354 |
+ 
+ <p align="center"> Table 1. Benchmark on Vietnamese datasets </p>
 
 | Organization | Model | Model size | ARC (ACC) | HellaSwag (ACC) | LAMBADA (perplexity) | MMLU (ACC) |
 | ------------ | ------------------ | ---------- | --------- | --------------- | -------------------- | ---------- |
 | Meta | LLaMA-2-Chat | ~7B | 0,442 | 0,7547 | 3,968 | 0,4832 |
 | AISingapore | Sealion7b | ~7B | 0,3422 | 0,6705 | 6,715 | 0,268 |
 | VBD | VBD-LLaMA2-7B-50b-Chat | ~7B | 0,4556 | 0,7384 | 4,645 | 0,4558 |
+ <p align="center"> Table 2. Benchmark on English datasets </p>
+ 
+ Based on these results, our model performs on par with or better than most models on Vietnamese tasks, which suggests that this approach is highly promising. A sketch of how such an evaluation can be launched with the harness follows.
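+ 
+ As a rough pointer for reproducing the Vietnamese numbers above, the sketch below assumes the @hieunguyen1053 fork keeps upstream lm-evaluation-harness's `simple_evaluate` API and registers the Vietnamese tasks under the names used in Table 1; treat both assumptions as unverified.
+ 
+ ```python
+ from lm_eval import evaluator
+ 
+ # Task names inferred from Table 1's column headers; the exact model/task
+ # identifiers depend on the fork's registry.
+ results = evaluator.simple_evaluate(
+     model="hf-causal",
+     model_args="pretrained=LR-AI-Labs/vbd-llama2-7B-50b-chat",
+     tasks=["arc_vi", "hellaswag_vi", "mmlu_vi", "truthfulqa_vi"],
+ )
+ print(results["results"])
+ ```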
+ 
+ Pretraining loss:
+ 
+ <p align="left"> <img src="loss.png" width="500" /> </p>
+ 
+ <h3>Run the model</h3>
+ 
+ <h4>With Hugging Face's transformers</h4>
+ 
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ model_path = "LR-AI-Labs/vbd-llama2-7B-50b-chat"
+ 
+ tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path, torch_dtype=torch.bfloat16,
+     device_map='auto',
+     # load_in_8bit=True  # optionally quantize to fit smaller GPUs
+ )
+ model.eval()
+ 
+ SYS_PROMPT = "A chat between a curious user and an artificial intelligence assistant. "\
+     "The assistant gives helpful, detailed, and polite answers to the user's questions."
+ 
+ def response_generate(input_prompt):
+     # Tokenize the full prompt (system prompt + dialogue so far).
+     inputs = tokenizer(input_prompt, return_tensors="pt")
+     outputs = model.generate(
+         inputs=inputs["input_ids"].to("cuda"),
+         attention_mask=inputs["attention_mask"].to("cuda"),
+         do_sample=True,
+         temperature=0.7,
+         top_k=50,
+         top_p=0.9,
+         max_new_tokens=1024,
+         eos_token_id=tokenizer.eos_token_id,
+         pad_token_id=tokenizer.pad_token_id
+     )
+     # Keep only the text generated after the final "ASSISTANT:" marker.
+     response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+     response = response.split("ASSISTANT:")[-1].strip()
+     return response
+ 
+ print(response_generate(f"{SYS_PROMPT} USER: Xin chΓ o, bαΊ‘n lΓ  ai? ASSISTANT:"))  # "Hello, who are you?"
+ 
+ # Xin chΓ o, ViVi lΓ  mα»™t trợ lΓ½ trΓ­ tuệ nhΓ’n tαΊ‘o cΓ³ thể trαΊ£ lời cΓ’u hỏi của bαΊ‘n vΓ  trΓ² chuyện vα»›i bαΊ‘n.
+ # ("Hello, ViVi is an AI assistant that can answer your questions and chat with you.")
+ ```
+ 
+ <h5>For single-turn chat:</h5>
+ 
+ ```python
+ # "How do I cook a tasty bowl of pho?"
+ print(response_generate(f"{SYS_PROMPT} USER: CΓ‘ch để nαΊ₯u mΓ³n phở ngon ASSISTANT:"))
+ ```
+ 
+ ```python
+ # "Write me an email asking my boss for a raise."
+ print(response_generate(f"{SYS_PROMPT} USER: Viết cho tôi một email xin sếp tăng lưƑng ASSISTANT:"))
+ ```
+ 
+ ```python
+ # Document-grounded QA: the instruction tells the model to answer only from the
+ # passages below, to reply "ThαΊ­t tiαΊΏc, cΓ’u hỏi của bαΊ‘n Vivi khΓ΄ng biαΊΏt."
+ # ("Sorry, Vivi does not know the answer to your question.") when the answer is
+ # absent, and to cite the source passage as "[Δ‘oαΊ‘n k]". `question` and
+ # `chunk_0`..`chunk_4` are placeholders you must define, e.g. from a retrieval step.
+ print(response_generate(f'''{SYS_PROMPT} USER: TrαΊ£ lời cΓ’u hỏi dα»±a vΓ o thΓ΄ng tin lαΊ₯y tα»« cΓ‘c Δ‘oαΊ‘n vΔƒn sau. NαΊΏu khΓ΄ng cΓ³ thΓ΄ng tin trαΊ£ lời "ThαΊ­t tiαΊΏc, cΓ’u hỏi của bαΊ‘n Vivi khΓ΄ng biαΊΏt." NαΊΏu cΓ³ cΓ’u trαΊ£ lời, yΓͺu cαΊ§u trαΊ£ ra cαΊ£ nguα»“n chα»©a cΓ’u trαΊ£ lời dαΊ‘ng "[Δ‘oαΊ‘n k]."
+ CÒu hỏi: {question}
+ Đoẑn 0: {chunk_0}
+ Đoẑn 1: {chunk_1}
+ Đoẑn 2: {chunk_2}
+ Đoẑn 3: {chunk_3}
+ Đoẑn 4: {chunk_4} ASSISTANT:'''))
+ ```
+ 
+ <h5>For conversation chat:</h5>
+ 
+ ```python
+ turns = []
+ 
+ while True:
+     user_input = input("USER: ")
+     turns.append("USER: " + user_input)
+     # Feed the system prompt plus the whole dialogue history back in each turn.
+     bot_output = "ASSISTANT: " + response_generate("%s %s ASSISTANT:" % (SYS_PROMPT, " ".join(turns)))
+     print(bot_output)
+     turns.append(bot_output)
+ 
+ # Sample conversation (Vietnamese; the user reports a headache and the assistant
+ # suggests rest, hydration, OTC pain relief, and seeing a doctor if it persists):
+ # USER: Xin chΓ o
+ # ASSISTANT: Xin chΓ o. LΓ m thαΊΏ nΓ o ViVi cΓ³ thể giΓΊp bαΊ‘n?
+ # USER: Anh bα»‹ Δ‘au Δ‘αΊ§u quΓ‘
+ # ASSISTANT: BαΊ‘n cΓ³ thể nΓ³i vα»›i ViVi nhiều hΖ‘n về cΖ‘n Δ‘au Δ‘αΊ§u của bαΊ‘n?
+ # USER: anh bα»‹ Δ‘au nα»­a Δ‘αΊ§u về Δ‘Γͺm, sΓ‘ng dαΊ­y khΓ΄ng tỉnh tΓ‘o
+ # ASSISTANT: Điều quan trọng lΓ  bαΊ‘n nhαΊ­n thα»©c được rαΊ±ng cΖ‘n Δ‘au Δ‘αΊ§u của bαΊ‘n khΓ΄ng phαΊ£i lΓ  Δ‘iều hiαΊΏm gαΊ·p vΓ  nΓ³ cΓ³ thể cΓ³ nhiều nguyΓͺn nhΓ’n. Mα»™t sα»‘ nguyΓͺn nhΓ’n phα»• biαΊΏn của cΖ‘n Δ‘au Δ‘αΊ§u lΓ  cΔƒng thαΊ³ng, thiαΊΏu ngủ, cΔƒng thαΊ³ng, chαΊΏ Δ‘α»™ Δƒn uα»‘ng kΓ©m vΓ  mαΊ₯t nΖ°α»›c.
+ 
+ # NαΊΏu cΖ‘n Δ‘au Δ‘αΊ§u của bαΊ‘n khΓ΄ng giαΊ£m Δ‘i sau khi nghỉ ngΖ‘i vΓ  nghỉ ngΖ‘i, Δ‘iều quan trọng lΓ  bαΊ‘n nΓ³i chuyện vα»›i bΓ‘c sΔ© của bαΊ‘n. BΓ‘c sΔ© của bαΊ‘n cΓ³ thể giΓΊp bαΊ‘n xΓ‘c Δ‘α»‹nh nguyΓͺn nhΓ’n của cΖ‘n Δ‘au Δ‘αΊ§u của bαΊ‘n vΓ  đề xuαΊ₯t cΓ‘c phΖ°Ζ‘ng phΓ‘p Δ‘iều trα»‹ thΓ­ch hợp.
+ # USER: cΓ³ cΓ‘ch nΓ o giαΊ£m Δ‘au tαΊ‘i nhΓ  khΓ΄ng?
+ # ASSISTANT: CΓ³, cΓ³ mα»™t sα»‘ cΓ‘ch bαΊ‘n cΓ³ thể thα»­ giαΊ£m Δ‘au Δ‘αΊ§u tαΊ‘i nhΓ :
+ 
+ # - Uα»‘ng nhiều nΖ°α»›c để giαΊ£m Δ‘au Δ‘αΊ§u do mαΊ₯t nΖ°α»›c.
+ # - Nghỉ ngΖ‘i vΓ  ngủ nhiều hΖ‘n.
+ # - Áp dα»₯ng bΔƒng lαΊ‘nh hoαΊ·c nΓ³ng vΓ o khu vα»±c Δ‘au.
+ # - Áp dα»₯ng thuα»‘c giαΊ£m Δ‘au khΓ΄ng kΓͺ Δ‘Ζ‘n nhΖ° ibuprofen hoαΊ·c paracetamol.
+ # - Thα»±c hiện cΓ‘c kα»Ή thuαΊ­t thΖ° giΓ£n nhΖ° thiền, thở sΓ’u hoαΊ·c yoga.
+ # - Massage khu vα»±c bα»‹ αΊ£nh hưởng.
+ 
+ # Điều quan trọng cαΊ§n nhα»› lΓ  trong khi cΓ‘c biện phΓ‘p khαΊ―c phα»₯c tαΊ‘i nhΓ  cΓ³ thể giΓΊp giαΊ£m Δ‘au Δ‘αΊ§u, chΓΊng khΓ΄ng thay thαΊΏ cho lời khuyΓͺn y tαΊΏ chuyΓͺn nghiệp. NαΊΏu cΖ‘n Δ‘au Δ‘αΊ§u của bαΊ‘n vαΊ«n tα»“n tαΊ‘i hoαΊ·c trở nΓͺn tα»“i tệ hΖ‘n, Δ‘iều quan trọng lΓ  bαΊ‘n nΓ³i chuyện vα»›i bΓ‘c sΔ© của bαΊ‘n.
+ ```
+ 
+ ***Modify the "temperature", "top_k", and "top_p" parameters to suit your use case; a small variation is sketched below.***
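+ 
+ For instance, a minimal sketch (reusing the `model` and `tokenizer` loaded above) that trades diversity for reproducibility by disabling sampling:
+ 
+ ```python
+ inputs = tokenizer(f"{SYS_PROMPT} USER: Xin chΓ o, bαΊ‘n lΓ  ai? ASSISTANT:", return_tensors="pt")
+ outputs = model.generate(
+     inputs=inputs["input_ids"].to("cuda"),
+     attention_mask=inputs["attention_mask"].to("cuda"),
+     do_sample=False,  # greedy decoding: deterministic, but less diverse
+     max_new_tokens=256,
+     eos_token_id=tokenizer.eos_token_id,
+     pad_token_id=tokenizer.pad_token_id,
+ )
+ print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0].split("ASSISTANT:")[-1].strip())
+ ```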
 
+ 
+ <h3>Limitations and Future Research</h3>
+ 
+ The published model has certain limitations. For example, it performs poorly on tasks involving reasoning, coding, or mathematics. In addition, the model will occasionally produce harmful or biased responses, or answer unsafe questions. Users should be cautious when interacting with VBD-LLaMA2-7B-50b-Chat and should verify important information taken from the model's outputs, because such information can be factually incorrect.
+ 
+ This model has been trained on and exhibits decent capability on Vietnamese tasks, especially those associated with conversations. However, it still struggles with questions related to Vietnamese history, culture, and society. We recommend some approaches to further improve this model:
+ 
+ + Data Distillation: Construct a small dataset of local/in-domain knowledge to continuously train the model. You might also find great ideas by searching through the topic of domain adaptation.
+ + Merging/Combining/Ensembling Models: Numerous models have been developed based on Meta's LLaMA, so another approach might be to use a training process similar to knowledge distillation, where the teacher consists of combinations of previously trained models.
+ + RLHF/Alignment: The model has not been trained with RLHF or alignment techniques such as DPO.
+ + Retrieval Augmented Generation (RAG): Combine the model with external knowledge sources (a minimal sketch follows this list).
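+ 
+ To make the RAG suggestion concrete, here is a minimal sketch that reuses the document-grounded QA prompt from the usage section above; `retrieve_chunks` is a hypothetical placeholder for your own retriever (BM25, embedding search, etc.), and `response_generate` is the helper defined under "Run the model".
+ 
+ ```python
+ def retrieve_chunks(question, k=5):
+     # Hypothetical retriever: replace with BM25 / vector search over your corpus.
+     return ["..."] * k
+ 
+ def rag_answer(question):
+     chunks = retrieve_chunks(question)
+     passages = "\n".join(f"Đoẑn {i}: {c}" for i, c in enumerate(chunks))
+     prompt = (
+         f"{SYS_PROMPT} USER: TrαΊ£ lời cΓ’u hỏi dα»±a vΓ o thΓ΄ng tin lαΊ₯y tα»« cΓ‘c Δ‘oαΊ‘n vΔƒn sau. "
+         'NαΊΏu khΓ΄ng cΓ³ thΓ΄ng tin trαΊ£ lời "ThαΊ­t tiαΊΏc, cΓ’u hỏi của bαΊ‘n Vivi khΓ΄ng biαΊΏt." '
+         'NαΊΏu cΓ³ cΓ’u trαΊ£ lời, yΓͺu cαΊ§u trαΊ£ ra cαΊ£ nguα»“n chα»©a cΓ’u trαΊ£ lời dαΊ‘ng "[Δ‘oαΊ‘n k]."\n'
+         f"CΓ’u hỏi: {question}\n{passages} ASSISTANT:"
+     )
+     return response_generate(prompt)
+ ```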
+ 
+ <h3>Acknowledgements</h3>
+ 
+ We would like to express our gratitude to the Virtual Assistant Technology Center at VinBigData JSC, led by Dr. <a href="https://scholar.google.com.vn/citations?user=z3IDeu0AAAAJ&hl=vi">Kim Anh Nguyen</a>, for providing us with the necessary resources to deliver this project. We are also greatly indebted to our fellow colleagues at the Natural Language Processing Department at VinBigData, whose feedback and expertise have been of great help.
+ 
+ <h3>Citation</h3>
+ 
+ If you find our project useful, we hope you would kindly star our repo and cite our work as follows:
+ 
+ Corresponding Authors:
+ + v.quangph3@vinbigdata.com ([QuangPH](https://samsonph.github.io/))
+ + v.kietbs@vinbigdata.com ([KietBS](https://github.com/ntdas/))
+ + v.minhtt32@vinbigdata.com ([MinhTT](https://github.com/tanminhtran168/))
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<pad>": 49380
+ }
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 11008,
+   "max_position_embeddings": 4096,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 32,
+   "pad_token_id": 0,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.28.1",
+   "use_cache": false,
+   "vocab_size": 49381
+ }
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 49380,
+   "temperature": 0.9,
+   "top_p": 0.6,
+   "transformers_version": "4.28.1"
+ }
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26e3e9c053a15d8edca50534cd4937c8bede6f6aa5702ea6a3404beeb97f07b2
+ size 4991192856
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4c83c378688e7223cad5f86c43480fa5ada5ce8cfab68fbd8f5a74456ffdc763
+ size 4947390888
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df6bff780b430de99d1bdbdc76e7f2b5d0d0fa0d0f8efc63cead51af51dcc148
+ size 3823051624
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
+ {
+   "metadata": {
+     "total_size": 13761601536
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00003-of-00003.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.norm.weight": "model-00003-of-00003.safetensors"
+   }
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<unk>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:450f5e8ed9b73ec6e9c31822ac667f93451914e8f80a5ef5a3be71916e44c506
+ size 794744
tokenizer_config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "clean_up_tokenization_spaces": false,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "legacy": false,
+   "model_max_length": 4096,
+   "pad_token": null,
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "use_fast": true
+ }