zguo0525 committed
Commit 185cefc
1 Parent(s): 2c07cbf

Update README.md

Files changed (1)
  1. README.md +41 -1
README.md CHANGED
@@ -19,4 +19,44 @@ Welcome to the official repository of JetMoE-8B-chat, a language model that comb
  | Llama-2-13b-chat | 6.650 |
  | Vicuna-13b-v1.3 | 6.413 |
  | Wizardlm-13b | 6.353 |
- | Llama-2-7b-chat | 6.269 |
+ | Llama-2-7b-chat | 6.269 |
+
+ ### Usage
+
+ Here's a quick example to get you started with JetMoE-8B-chat:
+
+ ```python
+ import torch
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
+
+ # Initialize the tokenizer and model
+ model_name = "jetmoe/jetmoe-8b-chat"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, attn_implementation="eager", trust_remote_code=True)
+
+ # Move the model to the GPU if one is available
+ if torch.cuda.is_available():
+     model = model.cuda()
+     print("Using GPU:", torch.cuda.get_device_name(torch.cuda.current_device()))
+ else:
+     print("GPU is not available, using CPU instead.")
+
+ # Build the chat and apply the model's chat template
+ messages = [
+     {"role": "system", "content": "You are a friendly chatbot"},
+     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
+ ]
+ tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
+ print(tokenized_chat)
+
+ # Move the input IDs to the GPU if one is available; otherwise keep them on the CPU
+ input_ids = tokenized_chat.cuda() if torch.cuda.is_available() else tokenized_chat
+
+ # Generate a response
+ output = model.generate(input_ids, max_length=500, num_return_sequences=1, no_repeat_ngram_size=2)
+
+ # If the output is on the GPU, move it back to the CPU for decoding
+ if torch.cuda.is_available():
+     output = output.cpu()
+
+ # Decode and print the generated text
+ generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
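
The snippet above imports `pipeline` but never uses it. As a minimal, untested sketch of how the same chat could be run through that helper instead (assuming a recent transformers release whose text-generation pipeline accepts chat-style message lists and applies the chat template itself), one could write:

```python
import torch
from transformers import pipeline

# Hypothetical alternative to the manual tokenize/generate/decode flow above;
# assumes this transformers version's text-generation pipeline accepts chat
# messages directly.
pipe = pipeline(
    "text-generation",
    model="jetmoe/jetmoe-8b-chat",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device=0 if torch.cuda.is_available() else -1,  # GPU 0 if available, else CPU
)

messages = [
    {"role": "system", "content": "You are a friendly chatbot"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]

# The pipeline returns one dict per generated sequence; for chat input,
# "generated_text" holds the conversation including the new assistant turn.
result = pipe(messages, max_new_tokens=256)
print(result[0]["generated_text"])
```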