---
library_name: peft
base_model: LSX-UniWue/LLaMmlein_1B
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: LLaMmlein_1b_chat_all
  results: []
datasets:
- LSX-UniWue/Guanako
- FreedomIntelligence/sharegpt-deutsch
- FreedomIntelligence/alpaca-gpt4-deutsch
language:
- de
license: other
---
|
|
|
# LLäMmlein 1B Chat |
|
|
|
This is a chat adapter for [LLäMmlein 1B](https://huggingface.co/LSX-UniWue/LLaMmlein_1B), the German Tinyllama 1B language model.

Find more details on our [project page](https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/) and in our [preprint](https://arxiv.org/abs/2411.11171)!
|
|
|
## Run it |
|
```py
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(42)

# script config
base_model_name = "LSX-UniWue/LLaMmlein_1B"
chat_adapter_name = "LSX-UniWue/LLaMmlein_1B_chat_all"
device = "mps"  # or "cuda"

# chat history
messages = [
    {
        "role": "user",
        "content": "Na wie geht's?",
    },
]

# load the base model and attach the chat adapter
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    attn_implementation="flash_attention_2" if device == "cuda" else None,
    torch_dtype=torch.bfloat16,
    device_map=device,
)
base_model.resize_token_embeddings(32064)  # match the chat tokenizer's extended vocabulary
model = PeftModel.from_pretrained(base_model, chat_adapter_name)
tokenizer = AutoTokenizer.from_pretrained(chat_adapter_name)

# encode the chat history in ChatML format
chat = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(device)

# generate and print the response
print(
    tokenizer.decode(
        model.generate(
            chat,
            max_new_tokens=300,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )[0],
        skip_special_tokens=False,
    )
)
```
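Note that `model.generate` decodes greedily with these arguments; passing `do_sample=True` together with `temperature` or `top_p` produces more varied replies.

## Merge the adapter (optional)

If you want to serve the model without a `peft` dependency at inference time, you can fold the adapter weights into the base model. The sketch below is a minimal example, assuming the LoRA-style adapter loaded in the snippet above and reusing its `model` and `tokenizer` objects; the output directory name is only a placeholder.

```py
# fold the adapter weights into the base model (returns a plain transformers model)
merged_model = model.merge_and_unload()

# save the merged checkpoint next to its tokenizer; the path is an example
merged_model.save_pretrained("LLaMmlein_1B_chat_merged")
tokenizer.save_pretrained("LLaMmlein_1B_chat_merged")
```

The merged checkpoint can then be loaded directly with `AutoModelForCausalLM.from_pretrained`, no `peft` required.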