# Vietnamese Llama2-7B 8k Context Length with LoRA Adapters

This repository contains a Vietnamese Llama2-7B model fine-tuned with QLoRA (Quantized Low-Rank Adaptation) adapters. The adapter is a plug-and-play component that enables the LLaMa model to perform well on many Vietnamese NLP tasks.

Project GitHub page: [GitHub](https://github.com/VietnamAIHub/Vietnamese_LLMs)

## Model Overview

The Vietnamese Llama2-7B model is a large language model capable of generating meaningful text. It can be used in a wide variety of natural language processing tasks, including text generation, sentiment analysis, and more. By using LoRA adapters, the model achieves better performance on low-resource tasks and demonstrates improved generalization.

## Dataset and Fine-Tuning

The LLaMa2 model was fine-tuned on over 200K Vietnamese instructions from various sources to improve its ability to understand and generate text for different tasks. The instruction dataset comprises data from the following sources:

Dataset link: Coming soon

## Testing the Model Yourself

To load the fine-tuned Llama2-7B model with LoRA adapters, follow the code snippet below:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

model_name = "VietnamAIHub/Vietnamese_llama2_7B_8K_SFT_General_domain"
# Optional download location; None uses the default Hugging Face cache directory.
cache_dir = None

# Load the base LLaMa weights already merged with the adapter weights.
# load_in_8bit requires the bitsandbytes package and a CUDA GPU.
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    torch_dtype=torch.bfloat16,
    pretraining_tp=1,
    # use_auth_token=True,
    # trust_remote_code=True,
    cache_dir=cache_dir,
)

tok = AutoTokenizer.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    padding_side="right",
    use_fast=False,  # The fast tokenizer was giving issues.
    tokenizer_type="llama",  # Needed for the HF name change.
    # use_auth_token=True,  # Uncomment if the repository requires authentication.
)
tok.bos_token_id = 1
stop_token_ids = [0]


class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is a stop token."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_id in stop_token_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False


# Example question: "How to study a subject really well".
prompts_input = "Cách để học tập về một môn học thật tốt"

# Full prompt in the Llama-2 chat layout: system prompt inside <<SYS>> markers,
# followed by the user question, all wrapped in [INST] ... [/INST].
message = (
    "[INST] <<SYS>>\n You are a helpful assistant, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe. Your "
    "answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. "
    "Please ensure that your responses are socially unbiased and positive in nature. "
    "If a question does not make any sense, or is not factually coherent, explain why instead of answering something not "
    "correct. If you don't know the answer to a question, please response as language model "
    "you are not able to respone detailed to these kind of question.\n<</SYS>>\n\n "
    f"{prompts_input} [/INST] "
)

stop = StopOnTokens()

generation_config = dict(
    temperature=0.1,
    top_k=30,
    top_p=0.95,
    do_sample=True,
    # num_beams=1,
    repetition_penalty=1.2,
    max_new_tokens=2048,  # The model supports a context of up to 8K tokens.
    early_stopping=True,
    stopping_criteria=StoppingCriteriaList([stop]),
)

# Tokenize the prompt and move the tensors to the device that holds the model.
inputs = tok(message, return_tensors="pt")
generation_output = m.generate(
    input_ids=inputs["input_ids"].to(m.device),
    attention_mask=inputs["attention_mask"].to(m.device),
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    **generation_config,
)

# Decode the first (and only) generated sequence; this includes the prompt text.
output = tok.decode(generation_output[0], skip_special_tokens=True)
print(output)
```

## Conclusion

The Vietnamese Llama2-7B with LoRA adapters is a versatile language model that can be used for a wide range of NLP tasks in Vietnamese. We hope researchers and developers find this model useful, and we encourage them to experiment with it in their projects.

For any questions, feedback, or contributions, please feel free to contact the maintainer of this repository, TranNhiem 🙌: [Linkedin](https://www.linkedin.com/in/tran-nhiem-ab1851125/), [Twitter](https://twitter.com/TranRick2), [Facebook](https://www.facebook.com/jean.tran.336), or the project [Discord](https://discord.gg/MC3yDZNz).

Happy fine-tuning and experimenting with the Llama2-7B model!
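
As a supplement to the testing snippet above, the prompt string it builds by hand can be produced by a small helper. The following is a minimal sketch, assuming the model expects the standard Llama-2 chat layout (`[INST] <<SYS>> ... <</SYS>> ... [/INST]`). The name `build_llama2_prompt` is hypothetical: it is not part of this repository or of `transformers`, and the exact whitespace used during fine-tuning is not documented here.

```python
# Standard Llama-2 chat markers (assumed to match this model's fine-tuning format).
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def build_llama2_prompt(user_message: str, system_prompt: str = "") -> str:
    """Wrap one user turn, and an optional system prompt, in Llama-2 chat markers."""
    user_message = user_message.strip()
    if system_prompt:
        body = f"{B_SYS}{system_prompt.strip()}{E_SYS}{user_message}"
    else:
        body = user_message
    return f"{B_INST} {body} {E_INST}"


# Hypothetical usage with the example question from the snippet above.
prompt = build_llama2_prompt(
    "Cách để học tập về một môn học thật tốt",
    "You are a helpful assistant.",
)
print(prompt)
```

The returned string can be passed to the tokenizer in place of a hand-written prompt, which keeps the markers consistent when the question or the system prompt changes.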