PrunaAI/Mixtral-8x22B-v0.1-bnb-4bit-smashed

Text Generation · text-generation-inference · Inference Endpoints · 4-bit precision


johnrachwanpruna committed on Apr 10, 2024

Commit 47c3ea4 (verified) · 1 parent: 4ac84bb

Update README.md

Files changed (1)

README.md +9 -1
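The `+9 -1` summary and the hunk header `@@ -64,8 +64,16 @@` describe the same change from two angles. In unified-diff notation, `-64,8` means the hunk spans 8 lines of the old file starting at line 64, and `+64,16` means it spans 16 lines of the new file starting at line 64. Each count is the number of unchanged context lines plus the removed lines (old side) or added lines (new side). The sketch below uses a hypothetical `parse_hunk_header` helper, written only for illustration and not taken from any diff library, to check that the numbers agree. Because this file has a single hunk, 16 - 8 must equal 9 - 1. The check also implies that the hunk contains 7 unchanged context lines.

```python
import re
from typing import NamedTuple

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@"; any text after
# the closing "@@" (here, the nearest README line) is ignored.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int


def parse_hunk_header(line: str) -> Hunk:
    """Parse a unified-diff hunk header; an omitted count means 1."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return Hunk(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


hunk = parse_hunk_header(
    "@@ -64,8 +64,16 @@ You can run the smashed model with these steps:"
)
added, removed = 9, 1  # from the "README.md +9 -1" summary

# old_count = context + removed, new_count = context + added
context = hunk.old_count - removed
assert hunk.new_count - added == context
print(hunk, "context lines:", context)
```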

README.md CHANGED

@@ -64,8 +64,16 @@ You can run the smashed model with these steps:
       tokenizer = AutoTokenizer.from_pretrained("PrunaAI/Mixtral-8x22B-v0.1-bnb-4bit-smashed")
       model = AutoModelForCausalLM.from_pretrained(
-        "PrunaAI/Mixtral-8x22B-v0.1-bnb-4bit-smashed",
+      "PrunaAI/Mixtral-8x22B-v0.1-bnb-4bit-smashed",
+      device_map="sequential",
+      torch_dtype=torch.bfloat16,
       )
+      text = "Who is Einstein?"
+      inputs = tokenizer(text, return_tensors="pt")
+      outputs = model.generate(**inputs, max_new_tokens=20)
+      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
     ```
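The updated README snippet calls `torch.bfloat16`, but no `import torch` line appears in the part of the diff shown. That import may sit in hunk lines the view does not display. The sketch below collects the post-commit steps into one self-contained function. It is not the official README. It assumes that `torch`, `transformers`, `accelerate` (needed for `device_map`), and `bitsandbytes` (needed for the 4-bit weights) are installed, and that enough GPU memory is available. The function name `run_readme_example` and the `.to(model.device)` call are illustrative additions that do not appear in the diff.

```python
MODEL_ID = "PrunaAI/Mixtral-8x22B-v0.1-bnb-4bit-smashed"


def run_readme_example(text: str = "Who is Einstein?", max_new_tokens: int = 20) -> str:
    """Hypothetical wrapper around the updated README steps.

    Heavy imports live inside the function, so defining it does not require
    torch/transformers. Calling it downloads the full checkpoint.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="sequential",     # fill available devices in order (uses accelerate)
        torch_dtype=torch.bfloat16,  # dtype for weights not stored in 4-bit
    )
    # Assumption: moving the inputs to model.device is not part of the README;
    # it keeps the prompt tensors on the same device as the first model layers.
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage, on a machine that can hold the model: `print(run_readme_example())`.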