|
# Vietnamese Llama2-13B 8k Context Length with LoRA Adapters |
|
|
|
|
|
This repository contains a Llama2-13B model fine-tuned with QLoRA (Quantized Low-Rank Adaptation) adapters. The adapter is a plug-and-play component that enables the Llama2 model to perform well on many Vietnamese NLP tasks.
|
Project Github page: [Github](https://github.com/VietnamAIHub/Vietnamese_LLMs) |
|
## Model Overview |
|
|
|
The Vietnamese Llama2-13B model is a large language model capable of generating meaningful text and can be used in a wide variety of natural language processing tasks, including text generation, sentiment analysis, and more. By using LoRA adapters, the model achieves better performance on low-resource tasks and demonstrates improved generalization.
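Because LoRA adapter weights are small relative to the base model, they can also be attached to a stock Llama2 checkpoint at load time. Below is a minimal sketch of that pattern using the `peft` library; the adapter repository id is a placeholder for illustration, not a published artifact:

```python
# A minimal sketch of attaching a LoRA adapter to a base Llama2 model with PEFT.
# The adapter repository id below is a placeholder, not the published artifact.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",  # base Llama2-13B checkpoint
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "your-org/vietnamese-llama2-13b-lora")  # placeholder id
model = model.merge_and_unload()  # fold the LoRA weights into the base for plain inference
```

Merging the adapter into the base weights removes the PEFT wrapper, so downstream inference code can treat the result as an ordinary `transformers` model.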
|
|
|
## Dataset and Fine-Tuning |
|
|
|
The Llama2 model was fine-tuned on over 200K Vietnamese instructions from various sources to improve its ability to understand and generate text across different tasks. The instruction dataset was compiled from multiple sources.

Dataset link: Coming soon
|
|
|
## Testing the Model Yourself
|
|
|
To load the fine-tuned Llama2-13B model with merged LoRA adapters and generate text with it, use the code snippet below:
|
|
|
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "VietnamAIHub/Vietnamese_LLama2_13B_8K_SFT_General_Domain_Knowledge"
cache_dir = "./cache"  # local directory for downloaded weights

# Load the base Llama2 weights with the adapter weights already merged in.
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.bfloat16,
    pretraining_tp=1,
    # use_auth_token=True,
    # trust_remote_code=True,
    cache_dir=cache_dir,
)

tok = AutoTokenizer.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    padding_side="right",
    use_fast=False,  # The fast tokenizer gives issues with Llama checkpoints.
    tokenizer_type="llama",  # Needed for the HF name change.
    use_auth_token=True,
)
tok.bos_token_id = 1

# Stop generation as soon as any of these token ids is produced.
stop_token_ids = [0]

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

stop = StopOnTokens()

generation_config = dict(
    temperature=0.1,
    top_k=30,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2,
    max_new_tokens=2048,  # The model supports an 8K context window.
    early_stopping=True,
    stopping_criteria=StoppingCriteriaList([stop]),
)

prompts_input = "Cách để học tập về một môn học thật tốt"  # "How to study a subject well"
system_prompt = (
    "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe. Your answers "
    "should not include any harmful, unethical, racist, sexist, toxic, "
    "dangerous, or illegal content. Please ensure that your responses are "
    "socially unbiased and positive in nature. If a question does not make "
    "any sense, or is not factually coherent, explain why instead of "
    "answering something incorrect. If you don't know the answer to a "
    "question, say that as a language model you are unable to answer it in "
    f"detail.\n<</SYS>>\n\n{prompts_input} [/INST] "
)

inputs = tok(system_prompt, return_tensors="pt")
generation_output = m.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs["attention_mask"].to(device),
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    **generation_config,
)

output = tok.decode(generation_output[0], skip_special_tokens=True)
print(output)
```
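To stream tokens as they are generated rather than wait for the full completion, a `TextIteratorStreamer` can drive generation from a background thread. Here is a minimal sketch, reusing `m`, `tok`, `system_prompt`, and `generation_config` from the snippet above:

```python
# A minimal streaming sketch, reusing m, tok, system_prompt, and
# generation_config from the loading snippet above.
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tok, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
inputs = tok(system_prompt, return_tensors="pt").to(m.device)

# generate() blocks until completion, so run it on a worker thread
# and consume the partial decodes on the main thread as they arrive.
thread = Thread(target=m.generate, kwargs=dict(**inputs, streamer=streamer, **generation_config))
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```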
|
|
|
## Conclusion |
|
The Vietnamese Llama2-13B model with LoRA adapters is a versatile language model that can be utilized for a wide range of NLP tasks in Vietnamese. We hope that researchers and developers find this model useful and are encouraged to experiment with it in their projects.
|
For any questions, feedback, or contributions, please feel free to contact the maintainer of this repository, TranNhiem 🙌: [Linkedin](https://www.linkedin.com/in/tran-nhiem-ab1851125/) [Twitter](https://twitter.com/TranRick2) [Facebook](https://www.facebook.com/jean.tran.336), or the project [Discord](https://discord.gg/MC3yDZNz). Happy fine-tuning and experimenting with the Llama2-13B model!