VietnamAIHub committed
Commit b48c636
1 Parent(s): 9544ad6

update_model

Files changed (1): README.md +112 -3

README.md CHANGED
---
license: apache-2.0
---
# Vietnamese Llama2-7B 8K Context Length with LoRA Adapters

This repository contains a Llama2-7B model fine-tuned with QLoRA (Quantized Low-Rank Adaptation) adapters. The adapters are plug-and-play components that enable the Llama model to perform well on many Vietnamese NLP tasks.

Project GitHub page: [Github](https://github.com/VietnamAIHub/Vietnamese_LLMs)

## Model Overview

The Vietnamese Llama2-7B model is a large language model that generates fluent Vietnamese text and can be used for a wide variety of natural language processing tasks, including text generation, sentiment analysis, and more. By using LoRA adapters, the model achieves better performance on low-resource tasks and demonstrates improved generalization.
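
If you want to attach adapter weights to a base Llama-2 checkpoint yourself rather than downloading merged weights, a minimal sketch with the `peft` library looks like the following; the adapter repository id here is hypothetical and only illustrates the pattern:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model, then attach the trained LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "VietnamAIHub/vietnamese-lora-adapter")  # hypothetical repo id

# Optionally fold the adapter back into the base weights for faster inference
model = model.merge_and_unload()
```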

## Dataset and Fine-Tuning

The Llama2 model was fine-tuned on over 200K Vietnamese instructions collected from various sources to improve its ability to understand and generate text for different tasks.

Dataset link: Coming soon
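
For readers unfamiliar with QLoRA, the technique quantizes the frozen base model to 4-bit and trains low-rank adapters on top of it. A minimal configuration sketch with `transformers` and `peft` is shown below; the hyperparameters are illustrative assumptions, not the values used to train this model:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model, as described in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters trained on the attention projections
# (illustrative hyperparameters, not the authors' training recipe)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```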

## Testing the Model Yourself

To load the fine-tuned Llama2-7B model and generate text with it, follow the code snippet below:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "VietnamAIHub/Vietnamese_llama2_7B_8K_SFT_General_domain"

## Load the model weights (the adapter is already merged into this checkpoint)
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,           # 8-bit quantization to reduce GPU memory usage
    torch_dtype=torch.bfloat16,
    pretraining_tp=1,
    # cache_dir="path/to/cache", # optionally set a local cache directory
)

tok = AutoTokenizer.from_pretrained(
    model_name,
    padding_side="right",
    use_fast=False,              # the fast tokenizer gives issues with this checkpoint
    tokenizer_type="llama",      # needed for the HF name change
)
tok.bos_token_id = 1
if tok.pad_token_id is None:
    tok.pad_token_id = tok.eos_token_id  # Llama has no pad token by default

# Stop generation as soon as the model emits one of these token ids
stop_token_ids = [0]

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0][-1].item() in stop_token_ids

# Build a Llama-2 chat prompt: system instructions in <<SYS>> tags, then the user request
prompt_input = "Cách để học tập về một môn học thật tốt"  # "How to study a subject really well"
message = (
    "<s>[INST] <<SYS>>\n"
    "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, "
    "while being safe. Your answers should not include any harmful, unethical, racist, sexist, "
    "toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased "
    "and positive in nature.\n"
    "If a question does not make any sense, or is not factually coherent, explain why instead of "
    "answering something incorrect. If you don't know the answer to a question, respond that as a "
    "language model you are not able to answer this kind of question in detail.\n"
    f"<</SYS>>\n\n{prompt_input} [/INST] "
)

generation_config = dict(
    temperature=0.1,
    top_k=30,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2,
    max_new_tokens=2048,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)

inputs = tok(message, return_tensors="pt")
generation_output = m.generate(
    input_ids=inputs["input_ids"].to(device),
    attention_mask=inputs["attention_mask"].to(device),
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    **generation_config,
)

output = tok.decode(generation_output[0], skip_special_tokens=True)
# Keep only the model's answer by splitting on the closing instruction tag
response = output.split("[/INST]")[-1].strip()
print(response)
```
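
The original setup also creates a token streamer; if you want tokens printed as they are generated instead of waiting for the full output, a minimal sketch reusing `m`, `tok`, `inputs`, and `generation_config` from the snippet above looks like this:

```python
from threading import Thread
from transformers import TextIteratorStreamer

# The streamer yields decoded text chunks as generate() produces them
streamer = TextIteratorStreamer(tok, timeout=10.0, skip_prompt=True, skip_special_tokens=True)

thread = Thread(
    target=m.generate,
    kwargs=dict(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        streamer=streamer,
        **generation_config,
    ),
)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```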

## Conclusion

The Vietnamese Llama2-7B model with LoRA adapters is a versatile language model that can be used for a wide range of Vietnamese NLP tasks. We hope that researchers and developers find this model useful and are encouraged to experiment with it in their projects.

For any questions, feedback, or contributions, please feel free to contact the maintainer of this repository, TranNhiem 🙌: [Linkedin](https://www.linkedin.com/in/tran-nhiem-ab1851125/), [Twitter](https://twitter.com/TranRick2), [Facebook](https://www.facebook.com/jean.tran.336), or the project [Discord](https://discord.gg/MC3yDZNz). Happy fine-tuning and experimenting with the Llama2-7B model!