RuterNorway committed · Commit e7322dc · Parent(s): 1101a0c
Update README.md

README.md CHANGED
@@ -16,6 +16,8 @@ datasets:
 # Llama 2 13b Chat Norwegian LoRA adaptor
 **This is the LoRA adaptor for the Llama 2 13b Chat Norwegian model, and requires the original base model to run**
 
+For a demo inference script, see the end of this file.
+
 Llama-2-13b-chat-norwegian is a variant of [Meta](https://huggingface.co/meta-llama)'s [Llama 2 13b Chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) model, finetuned on a mix of Norwegian datasets created in [Ruter AI Lab](https://ruter.no) in the summer of 2023.
 
 The model is tuned to understand and generate text in Norwegian. It is trained for one epoch on norwegian-alpaca plus 15,000 samples of machine-translated data from OpenOrca (that dataset is to be released). A small subset of custom-made instructional data is also included.
@@ -144,4 +146,49 @@ Llama 2 is a new technology that carries risks with use. Testing conducted to da
 For these reasons, as with all LLMs, Llama 2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.
 Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.
 Please see the Responsible Use Guide available at https://ai.meta.com/llama/responsible-use-guide/
+```
+
+# Demo script
+This is a minimal example of how to use the adapter for inference.
+
+```python
+import torch
+from peft import PeftModel
+from transformers import LlamaTokenizer, LlamaForCausalLM
+
+MODEL = 'meta-llama/Llama-2-13b-chat-hf'
+ADAPTER = 'RuterNorway/Llama-2-13b-chat-norwegian-LoRa'
+HF_TOKEN = '...'
+
+# The template the adapter was tuned on; "Hva heter du?" is Norwegian for "What is your name?"
+prompt = """
+### Instruction
+Hva heter du?
+### Answer
+"""
+
+tokenizer = LlamaTokenizer.from_pretrained(MODEL, legacy=False, use_auth_token=HF_TOKEN)
+
+# Load the base model in 8-bit and apply the LoRA adapter on top of it.
+base_model = LlamaForCausalLM.from_pretrained(
+    MODEL,
+    device_map='auto',
+    load_in_8bit=True,
+    torch_dtype=torch.float16,
+    use_auth_token=HF_TOKEN,
+)
+model = PeftModel.from_pretrained(
+    base_model, ADAPTER, torch_dtype=torch.float16, is_trainable=False
+)
+
+with torch.no_grad():
+    output_tensors = model.generate(
+        input_ids=tokenizer(prompt, return_tensors="pt").input_ids.cuda(),
+        max_new_tokens=128,
+    )[0]
+
+# Print only the text generated after the answer marker.
+print(tokenizer.decode(output_tensors, skip_special_tokens=True).split('### Answer')[-1])
 ```
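
The `### Instruction` / `### Answer` template above is what the adapter was tuned on. As a sketch (these helpers are illustrative, not part of the repository), wrapping arbitrary instructions in the same template and extracting the answer could look like:

```python
def build_prompt(instruction: str) -> str:
    # Wrap a user instruction in the template the adapter was tuned on.
    return f"\n### Instruction\n{instruction}\n### Answer\n"

def extract_answer(generated: str) -> str:
    # Keep only the text the model produced after the answer marker.
    return generated.split('### Answer')[-1].strip()

# build_prompt("Hva heter du?") reproduces the demo prompt above.
```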
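
For deployment it can be convenient to fold the LoRA weights into the base model so no adapter loading is needed at serve time. A minimal sketch using peft's `merge_and_unload`, assuming enough memory to hold the model in fp16 (merging is not supported for 8-bit weights, and the output directory name here is just an example):

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

MODEL = 'meta-llama/Llama-2-13b-chat-hf'
ADAPTER = 'RuterNorway/Llama-2-13b-chat-norwegian-LoRa'

# Load the base model in fp16, apply the adapter, then merge the LoRA
# weights into the base weights and drop the adapter wrapper.
base_model = LlamaForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map='auto'
)
model = PeftModel.from_pretrained(base_model, ADAPTER, torch_dtype=torch.float16)
merged = model.merge_and_unload()

# The merged checkpoint loads like any plain transformers model, without peft.
merged.save_pretrained('llama-2-13b-chat-norwegian-merged')
LlamaTokenizer.from_pretrained(MODEL).save_pretrained('llama-2-13b-chat-norwegian-merged')
```

Merging trades the per-request adapter overhead for a full-size copy of the weights on disk.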
|