---
license: apache-2.0
language:
- vi
- en
---


## Model Details

- **Developed by:** Tuan Pham (FPTU HCM Student)
- **Model type:** Llama2-7B decoder-only
- **Fine-tuned from models:**
  * meta-llama/Llama-2-7b
  * bkai-foundation-models/vietnamese-llama2-7b-120GB
  * yeen214/llama2_7b_merge_orcafamily
- **Bilingual support:** English and Vietnamese

### Model Description

This model is proof that a single person can fine-tune their own model to reach state-of-the-art performance.

### Model Sources

- **Repository:**
  * Training: https://github.com/vTuanpham/Vietnamese_QA_System
  * Data: https://github.com/vTuanpham/Large_dataset_translator
- **Paper:** ...
- **Demo:** ...

## Uses

### Prompt template

```
[SYSTEM_PROMPT]

####### Instruction:
[INPUT]

%%%%%%% Response:
[RESPONSE]
```

It is recommended to keep the system prompt in English.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from torch.cuda.amp import autocast
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline

model_name = "1TuanPham/T-Llama"

# Load the model in bfloat16 with the KV cache enabled
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.bfloat16,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, streamer=streamer)

with autocast():
    output_default = pipe("Phạm Nhật Vượng là ", pad_token_id=50256, max_new_tokens=128)
```

## Training Details

**Hardware Type:**
* GPU: NVIDIA Tesla P100 16GB
* System RAM: 29GB

**Hours used:** ~47.5 days (approximate)

### Training Data

* BactrianX
* OpenOrca_translated
* WizardLM_70k_translated
* TigerLabMathInstruct_translated_vi
* GradeSchoolMathInstruct_translated
* vilm_lima-vi
* MTEngVietnamese
* databricks_dolly15k_translated
* AlpacaCleaned_translated
* databricks_dolly15k
* OpenOrca
* GradeSchoolMathInstruct
* AlpacaCleaned
* WebglmQA

### Training Procedure

* Learning rate: 2e-5, cosine schedule
* Optimizer: PagedLion8bit
* QLoRA: rank 64, 4-bit quantization

- 250k examples (70% Vietnamese / 30% English) for 3.37 epochs
- 350k examples (60% Vietnamese / 40% English) for 1.4 epochs

### Training loss

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63905e87df447b438817b2cd/rV8Go_YFZv7QcR_FhFxp-.png)

## Evaluation

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63905e87df447b438817b2cd/z1ZTm7Tab4tQbVPgQW1hU.png)

Our model currently sits in the top 5 on the VMLU benchmark.

## Citation

## Model Card Authors

## Model Card Contact

[More Information Needed]
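
## Appendix: Prompt formatting sketch

For reference, below is a minimal sketch of assembling a prompt that follows the template documented above and passing it to the pipeline from the quick-start snippet. The system prompt wording, the example instruction, and the exact line breaks are illustrative assumptions, not values shipped with the model.

```python
# Sketch only: the system prompt text and whitespace below are assumptions,
# chosen to mirror the documented template rather than the training data.
system_prompt = "You are a helpful bilingual assistant. Answer accurately and concisely."
instruction = "Hãy giới thiệu ngắn gọn về thành phố Đà Lạt."  # "Briefly introduce the city of Da Lat."

prompt = (
    f"{system_prompt}\n"
    "####### Instruction:\n"
    f"{instruction}\n"
    "%%%%%%% Response:\n"
)

# Reuses `pipe` and `autocast` from the quick-start snippet above.
with autocast():
    output = pipe(prompt, max_new_tokens=256)

print(output[0]["generated_text"])
```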