OpenAssistant/llama2-13b-orca-8k-3319


andreaskoepf committed on Jul 24, 2023

Commit ced264d · 1 parent: 3d4a919

add usage example written by Jordi

Files changed (1)

README.md +21 -0


@@ -17,11 +17,32 @@ widget:
 ---
 # llama2-13b-orca-8k-3319
+## Model Description
 This model is a fine-tuned version of Meta's Llama 2 13B model with an 8K context size, trained on a long-conversation variant of the Dolphin dataset ([orca-chat](https://huggingface.co/datasets/shahules786/orca-chat)).
 Note: **Hugging Face Transformers [4.31.0](https://pypi.org/project/transformers/4.31.0/) or later is required to load this model!**
+## Usage
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("OpenAssistant/llama2-13b-orca-8k-3319", use_fast=False)
+model = AutoModelForCausalLM.from_pretrained("OpenAssistant/llama2-13b-orca-8k-3319", torch_dtype=torch.float16, low_cpu_mem_usage=True, device_map="auto")
+system_message = "You are an AI assistant. Provide a detailed answer so user don’t need to search outside to understand the answer."
+user_prompt = "Write me a poem please"
+prompt = f"""<|system|>{system_message}</s><|prompter|>{user_prompt}</s><|assistant|>"""
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+output = model.generate(**inputs, do_sample=True, top_p=0.95, top_k=0, max_new_tokens=256)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```
+## Model Details
 - base model: [meta-llama/Llama-2-13b](https://huggingface.co/meta-llama/Llama-2-13b)
 - License: [Llama 2 Community License Agreement](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)
 - sampling report: TBD
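
The usage example in the diff builds its prompt by hand from the `<|system|>`, `<|prompter|>`, and `<|assistant|>` markers, each turn terminated by `</s>`. A minimal sketch of a helper that assembles the same string is shown below. The function name `build_prompt`, the `history` parameter, and the multi-turn layout (earlier assistant replies also closed with `</s>`) are assumptions for illustration; the README itself only shows the single-turn case.

```python
from typing import Optional, Sequence, Tuple

# End-of-turn marker used in the README's prompt string.
EOS = "</s>"


def build_prompt(
    user_prompt: str,
    system_message: Optional[str] = None,
    history: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt in the layout shown in the README usage example.

    `history` is a hypothetical extension: a sequence of
    (user_text, assistant_text) pairs from earlier turns. Closing earlier
    assistant replies with `</s>` is an assumption, not confirmed by the README.
    """
    parts = []
    if system_message:
        parts.append(f"<|system|>{system_message}{EOS}")
    for past_user, past_assistant in history:
        parts.append(f"<|prompter|>{past_user}{EOS}")
        parts.append(f"<|assistant|>{past_assistant}{EOS}")
    # The final turn ends with an open assistant marker so the model continues from there.
    parts.append(f"<|prompter|>{user_prompt}{EOS}<|assistant|>")
    return "".join(parts)


prompt = build_prompt(
    "Write me a poem please",
    system_message="You are an AI assistant.",
)
print(prompt)
```

With no `history`, the result matches the f-string in the usage example, so it can be passed to the tokenizer in the same way.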