---
library_name: peft
base_model: LSX-UniWue/LLaMmlein_1B
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: LLaMmlein_1b_chat_all
  results: []
datasets:
- LSX-UniWue/Guanako
- FreedomIntelligence/sharegpt-deutsch
- FreedomIntelligence/alpaca-gpt4-deutsch
language:
- de
license: other
---

# LLäMmlein 1B Chat

This is a chat adapter for the German TinyLlama 1B language model. Find more details on our [page](https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/) and in our [preprint](https://arxiv.org/abs/2411.11171)!

## Run it

```py
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(42)

# script config
base_model_name = "LSX-UniWue/LLaMmlein_1B"
chat_adapter_name = "LSX-UniWue/LLaMmlein_1B_chat_selected"
device = "mps"  # or "cuda"

# chat history
messages = [
    {
        "role": "user",
        "content": """Na wie geht's?""",
    },
]

# load base model and chat adapter
config = PeftConfig.from_pretrained(chat_adapter_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    attn_implementation="flash_attention_2" if device == "cuda" else None,
    torch_dtype=torch.bfloat16,
    device_map=device,
)
# resize embeddings to match the adapter's extended tokenizer vocabulary
base_model.resize_token_embeddings(32064)
model = PeftModel.from_pretrained(base_model, chat_adapter_name)
tokenizer = AutoTokenizer.from_pretrained(chat_adapter_name)

# encode message in "ChatML" format
chat = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(device)

# generate response
print(
    tokenizer.decode(
        model.generate(
            chat,
            max_new_tokens=300,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )[0],
        skip_special_tokens=False,
    )
)
```
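## Merge the adapter (optional)

If you prefer a single standalone checkpoint instead of loading the base model and adapter separately, you can fold the adapter weights into the base model with PEFT's `merge_and_unload`. This is a minimal sketch continuing from the script above; the output directory name is only an example:

```py
# fold the adapter weights into the base model for standalone inference
merged_model = model.merge_and_unload()

# save the merged model and tokenizer (path is illustrative)
merged_model.save_pretrained("LLaMmlein_1B_chat_merged")
tokenizer.save_pretrained("LLaMmlein_1B_chat_merged")
```

The merged checkpoint can then be loaded with `AutoModelForCausalLM.from_pretrained` without PEFT, at the cost of storing a full copy of the weights.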