Quantization made by Richard Erkhov.
Yi-34Bx2-MOE-200K - GGUF
- Model creator: https://huggingface.co/cloudyu/
- Original model: https://huggingface.co/cloudyu/Yi-34Bx2-MOE-200K/
| Name | Quant method | Size |
| --- | --- | --- |
| Yi-34Bx2-MOE-200K.Q2_K.gguf | Q2_K | 20.86GB |
| Yi-34Bx2-MOE-200K.IQ3_XS.gguf | IQ3_XS | 23.26GB |
| Yi-34Bx2-MOE-200K.IQ3_S.gguf | IQ3_S | 24.56GB |
| Yi-34Bx2-MOE-200K.Q3_K_S.gguf | Q3_K_S | 24.51GB |
| Yi-34Bx2-MOE-200K.IQ3_M.gguf | IQ3_M | 25.2GB |
| Yi-34Bx2-MOE-200K.Q3_K.gguf | Q3_K | 27.23GB |
| Yi-34Bx2-MOE-200K.Q3_K_M.gguf | Q3_K_M | 27.23GB |
| Yi-34Bx2-MOE-200K.Q3_K_L.gguf | Q3_K_L | 29.59GB |
| Yi-34Bx2-MOE-200K.IQ4_XS.gguf | IQ4_XS | 30.58GB |
| Yi-34Bx2-MOE-200K.Q4_0.gguf | Q4_0 | 31.98GB |
| Yi-34Bx2-MOE-200K.IQ4_NL.gguf | IQ4_NL | 32.27GB |
| Yi-34Bx2-MOE-200K.Q4_K_S.gguf | Q4_K_S | 32.22GB |
| Yi-34Bx2-MOE-200K.Q4_K.gguf | Q4_K | 34.14GB |
| Yi-34Bx2-MOE-200K.Q4_K_M.gguf | Q4_K_M | 34.14GB |
| Yi-34Bx2-MOE-200K.Q4_1.gguf | Q4_1 | 35.49GB |
| Yi-34Bx2-MOE-200K.Q5_0.gguf | Q5_0 | 39.0GB |
| Yi-34Bx2-MOE-200K.Q5_K_S.gguf | Q5_K_S | 39.0GB |
| Yi-34Bx2-MOE-200K.Q5_K.gguf | Q5_K | 40.12GB |
| Yi-34Bx2-MOE-200K.Q5_K_M.gguf | Q5_K_M | 40.12GB |
| Yi-34Bx2-MOE-200K.Q5_1.gguf | Q5_1 | 42.51GB |
| Yi-34Bx2-MOE-200K.Q6_K.gguf | Q6_K | 46.47GB |
| Yi-34Bx2-MOE-200K.Q8_0.gguf | Q8_0 | 60.18GB |
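Lower quants (Q2_K, IQ3_*) trade quality for size; Q4_K_M is a common middle ground. Below is a minimal sketch of downloading one quant and running it locally with llama-cpp-python. The repo id shown is an assumption (check the actual quant repo on the Hub), and the `llama-cpp-python` and `huggingface_hub` packages are assumed to be installed:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Assumed repo id for the quantized files; verify on the Hub before use.
gguf_path = hf_hub_download(
    repo_id="RichardErkhov/cloudyu_-_Yi-34Bx2-MOE-200K-gguf",
    filename="Yi-34Bx2-MOE-200K.Q4_K_M.gguf",
)

llm = Llama(
    model_path=gguf_path,
    n_ctx=8192,       # raise toward 200K only if you have the RAM for the KV cache
    n_gpu_layers=-1,  # offload all layers to GPU if VRAM allows; use 0 for CPU-only
)

out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=256)
print(out["choices"][0]["text"])
```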
Original model description:
license: other
An attempt to build a 200K-context-length MoE chat model based on Yi.
[GGUF 4bit is here](https://huggingface.co/cloudyu/Yi-34Bx2-MOE-200K-gguf)
Metrics: (benchmark figure in the original model card)
Code example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "cloudyu/Yi-34Bx2-MOE-200K"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_default_system_prompt=False)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float32,
    device_map="auto",
    local_files_only=False,
    load_in_4bit=True,  # quantize on load so the 2x34B experts fit in GPU memory
)
print(model)

# Simple REPL: generate until the user enters an empty prompt.
prompt = input("please input prompt:")
while len(prompt) > 0:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    generation_output = model.generate(
        input_ids=input_ids,
        max_new_tokens=500,
        repetition_penalty=1.2,
    )
    print(tokenizer.decode(generation_output[0]))
    prompt = input("please input prompt:")
```
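Note that recent transformers releases deprecate passing `load_in_4bit` directly to `from_pretrained` in favor of an explicit quantization config. A minimal sketch of the equivalent load, assuming the bitsandbytes backend is installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config (supported path in newer transformers).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute is typical on Ampere+ GPUs
)

model = AutoModelForCausalLM.from_pretrained(
    "cloudyu/Yi-34Bx2-MOE-200K",
    device_map="auto",
    quantization_config=bnb_config,
)
```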