VietnamAIHub
/

Vietnamese_llama1_30B_SFT

+# Llama-30b with LoRA Adapters
+This repository contains a Llama-30b model fine-tuned with QLoRA (Quantized Low-Rank Adaptation) adapters. The adapter is a plug-and-play component that lets the LLaMA model perform well on many Vietnamese NLP tasks.
+## Model Overview
+The Llama-30b model is a large language model that generates text and can be applied to a wide range of natural language processing tasks, including text generation and sentiment analysis. LoRA adapters train only a small set of additional low-rank weights on top of the frozen base model, which is intended to improve performance on low-resource tasks such as Vietnamese and to help generalization.
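To give a feel for how small a LoRA adapter is relative to the weights it adapts, the following sketch counts the parameters for a single square projection matrix. The hidden size of 6656 is the LLaMA-30B model dimension; the rank of 64 is a hypothetical choice for illustration and is not taken from this repository's training setup.

```python
# Rough parameter count for a LoRA adapter on one square projection matrix.
# rank is a hypothetical value, not this repository's actual configuration.

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

hidden_size = 6656  # LLaMA-30B model dimension
rank = 64           # hypothetical LoRA rank

full = hidden_size * hidden_size
adapter = lora_param_count(hidden_size, hidden_size, rank)

print(f"full projection: {full:,} parameters")
print(f"LoRA adapter   : {adapter:,} parameters")
print(f"adapter / full : {adapter / full:.2%}")
```

The ratio depends only on `2 * rank / hidden_size` for a square matrix, so halving the rank halves the adapter size.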
+## Dataset and Fine-Tuning
+The LLaMa model was fine-tuned on over 200K instructions from various sources to improve its ability to understand and generate text for different tasks. The instruction dataset comprises data from the following sources:
+- Alpaca 52K
+- LIMA 1K
+- Dolly 15K
+- VietHealth
+- WikiHow
+- GPT4ALL
+- VietQuAD
+## Loading the Model
+To load the fine-tuned Llama-30b model with LoRA adapters, follow the code snippet below:
+```python
+import torch
+from transformers import AutoModelForCausalLM, LlamaTokenizer
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model_name = "VietnamAIHub/Vietnamese_SFT_llama_30B_v1"
+cache_dir="/save_weight_path"
+# Load the model weights (base LLaMA weights merged with the adapter weights)
+m = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map={"": 0},  # place the whole model on GPU 0
+    cache_dir=cache_dir
+)
+# Load the tokenizer
+tok = LlamaTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
+# LLaMA uses token id 1 as the beginning-of-sequence token (<s>)
+tok.bos_token_id = 1
+generation_config = dict(
+        temperature=0.2,
+        top_k=20,
+        top_p=0.9,
+        do_sample=True,
+        num_beams=1,
+        repetition_penalty=1.2,
+        max_new_tokens=400,
+        early_stopping=True,
+    )
+prompt="Cách để học tập về một môn học thật tốt"
+_DEFAULT_TEMPLATE=f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### prompt:\n{prompt}\n\n### response:\n"
+inputs = tok(_DEFAULT_TEMPLATE, return_tensors="pt")  # tokenize the full templated prompt
+generation_output = m.generate(
+    input_ids = inputs["input_ids"].to(device),
+    attention_mask = inputs['attention_mask'].to(device),
+    eos_token_id=tok.eos_token_id,
+    pad_token_id=tok.pad_token_id,
+    **generation_config
+)
+s = generation_output[0]
+output = tok.decode(s, skip_special_tokens=True)
+response = output.split("### response:")[1].strip()
+print(response)
+```
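The prompt construction and response extraction in the snippet above can be factored into two small helpers, sketched here. The template text and the `### response:` marker come from the snippet; the names `build_prompt` and `extract_response`, and the simulated completion string, are hypothetical and only serve the illustration. No model is loaded.

```python
# Helpers for the instruction template used in the loading snippet.
# Function names and the simulated completion are illustrative assumptions.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### prompt:\n{prompt}\n\n### response:\n"
)
RESPONSE_MARKER = "### response:"

def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the instruction template."""
    return PROMPT_TEMPLATE.format(prompt=instruction)

def extract_response(decoded: str) -> str:
    """Return the text after the response marker, or the whole text if it is missing."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else decoded.strip()

prompt_text = build_prompt("Cách để học tập về một môn học thật tốt")
# Stand-in for decoded model output: the prompt followed by a made-up completion.
simulated_output = prompt_text + "Hãy lập kế hoạch học tập rõ ràng."
print(extract_response(simulated_output))
```

Unlike `output.split("### response:")[1]`, this version does not raise an `IndexError` when the marker is absent from the decoded text.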
+## Conclusion
+The Llama-30b model with LoRA adapters is a versatile language model that can be used for a wide range of Vietnamese NLP tasks. We hope researchers and developers find it useful, and we encourage them to experiment with it in their projects.
+For questions, feedback, or contributions, please contact the repository maintainer, TranNhiem 🙌: [LinkedIn](https://www.linkedin.com/in/tran-nhiem-ab1851125/) [Twitter](https://twitter.com/TranRick2) [Facebook](https://www.facebook.com/jean.tran.336). Happy fine-tuning and experimenting with the Llama-30b model!

special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
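The map above defines `bos_token`, `eos_token`, and `unk_token` but no `pad_token`. As a small sketch, the code below parses an abbreviated inline copy of the file (only the `content` fields are kept) and checks which special tokens are present. The helper name `token_contents` is hypothetical.

```python
import json

# Abbreviated inline copy of special_tokens_map.json; only "content" is kept.
SPECIAL_TOKENS_MAP = """
{
  "bos_token": {"content": "<s>"},
  "eos_token": {"content": "</s>"},
  "unk_token": {"content": "<unk>"}
}
"""

def token_contents(raw: str) -> dict:
    """Map each special-token name to its string content."""
    return {name: spec["content"] for name, spec in json.loads(raw).items()}

tokens = token_contents(SPECIAL_TOKENS_MAP)
has_pad = "pad_token" in tokens

print(tokens)
print("pad_token defined:", has_pad)
```

If no other tokenizer file defines a padding token, `tok.pad_token_id` in the loading snippet would be `None`; a common workaround is to pass `tok.eos_token_id` as `pad_token_id` when calling `generate`.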