--- language: - ko - en library_name: transformers tags: - trl - sft widget: - text: 안녕 --- # Yokhal (욕쟁이 할머니) Korean Chatbot based on Google Gemma ## Model Details ### Model Description - **Fine-tuned by:** Alan Jo - **Model type:** Gemma - **Language(s) (NLP):** Korean, English - **Finetuned from model:** [Gemma-2b-it](https://huggingface.co/google/gemma-2b-it) ### Model Sources - **Repository:** https://github.com/seonglae/yokhal - **Demo:** https://huggingface.co/spaces/seonglae/yokhal ## Uses ### Direct Use Korean Chatbot with Internet culture ### Recommendations Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. ## How to Get Started with the Model Use the code below to get started with the model. ```py tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto" if device is None else device, attn_implementation="flash_attention_2") # if flash enabled sys_prompt = '한국어로 대답해' texts = ['안녕', '서울은 오늘 어때'] chats = list(map(lambda t: [{'role': 'user', 'content': f'{sys_prompt}\n{t}'}], texts)) # ChatML format prompts = list(map(lambda p: tokenizer.apply_chat_template(p, tokenize=False, add_generation_prompt=True), chats)) input_ids = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda" if device is None else device) outputs = model.generate(**input_ids, max_new_tokens=100, repetition_penalty=1.05) for output in outputs: print(tokenizer.decode(output, skip_special_tokens=True), end='\n\n') ``` ## Training Details Trained on 2 x RTX3090 [More Information on Github source code](https://github.com/seonglae/yokhal/blob/master/train.py) ### Training Data [More Information Needed] ### Training Procedure 1. Weight Initialized from Internet comments dataset 2. Trained on Korean Namuwiki dataset until step 80000 (30000 step is on main branch because of repetition issue above there) - `seq_length` 1024 with dataset packing - `batch` 3 per device - `lr` 1e-5 - `optim` adafactor 3. Instruction tuning on Korean Instruction Dataset using QLoRa (not on main) - `seq_length` 2048 - `lr` 2e-4 #### Preprocessing [optional] Gemma do not support explicit system prompt in ChatML, so I trained putting system prompt before user message like below ```py if (chat[0]['role'] == 'system'): chat[1]['content'] = f"{chat[0]['content']}\n{chat[1]['content']}" chat = chat[1:] try: prompt = tokenizer.apply_chat_template(chat, tokenize=False) ``` [Source Code](https://github.com/seonglae/yokhal/blob/master/yokhal/adapt.py) #### Training Hyperparameters - **Training regime:** [More Information Needed] #### Speeds, Sizes, Times [optional] [More Information Needed] ## Evaluation ### Testing Data, Factors & Metrics #### Testing Data [More Information Needed] #### Factors [More Information Needed] #### Metrics [More Information Needed] ### Results [More Information Needed] #### Summary