huihui-ai/r1-1776-GGUF

This model was converted from r1-1776 to GGUF. Even GPUs with as little as 8 GB of memory can try it.

GGUF: Q2_K, Q3_K_M, Q4_K_M, and Q8_0 are all supported.

BF16 to f16.gguf

  1. Download the perplexity-ai/r1-1776 model, which requires approximately 1.21 TB of space.
cd /home/admin/models
huggingface-cli download perplexity-ai/r1-1776 --local-dir ./perplexity-ai/r1-1776
  2. Use the llama.cpp conversion script to convert r1-1776 to GGUF format, which requires an additional approximately 1.22 TB of space.
python convert_hf_to_gguf.py /home/admin/models/perplexity-ai/r1-1776 --outfile /home/admin/models/perplexity-ai/r1-1776/ggml-model-f16.gguf --outtype f16
  3. Use the llama.cpp quantization tool to quantize the model (llama-quantize needs to be compiled first); other quant options are also available (see the sketch after this list). Convert to Q2_K first, which requires an additional approximately 227 GB of space.
llama-quantize /home/admin/models/perplexity-ai/r1-1776/ggml-model-f16.gguf  /home/admin/models/perplexity-ai/r1-1776/ggml-model-Q2_K.gguf Q2_K
  4. Use llama-cli to test.
llama-cli -m /home/admin/models/perplexity-ai/r1-1776/ggml-model-Q2_K.gguf -n 2048
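A hedged sketch of the other quant options mentioned in step 3, using the same llama-quantize tool (the output filenames follow the pattern above; Q3_K_M, Q4_K_M, and Q8_0 are standard llama.cpp quant type names):
llama-quantize /home/admin/models/perplexity-ai/r1-1776/ggml-model-f16.gguf /home/admin/models/perplexity-ai/r1-1776/ggml-model-Q3_K_M.gguf Q3_K_M
llama-quantize /home/admin/models/perplexity-ai/r1-1776/ggml-model-f16.gguf /home/admin/models/perplexity-ai/r1-1776/ggml-model-Q4_K_M.gguf Q4_K_M
llama-quantize /home/admin/models/perplexity-ai/r1-1776/ggml-model-f16.gguf /home/admin/models/perplexity-ai/r1-1776/ggml-model-Q8_0.gguf Q8_0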

Use with ollama

You can use huihui_ai/perplexity-ai-r1 directly

ollama run huihui_ai/perplexity-ai-r1:671b-q2_K

or huihui_ai/perplexity-ai-r1:671b-q3_K_M

ollama run huihui_ai/perplexity-ai-r1:671b-q3_K_M

Modelfile

The Modelfile is based on ggml-model-Q2_K.gguf.

A single GPU with 24 GB of memory can hold 4 layers of the model, so num_gpu is set to 4.

If there are 8 GPUs with 24 GB of memory each, num_gpu can be set to 32. This parameter can be passed to ollama.

However, the Modelfile below sets num_gpu to a minimum value of 1, which can be changed later by setting parameters.

The specific values can be adjusted according to your own tests.

The value of num_gpu can be adjusted based on the number of GPUs and the amount of GPU memory available.
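As a rough rule of thumb based on the numbers above (this Q2_K quant fits about 4 layers per 24 GB GPU), num_gpu can be estimated as:

num_gpu ≈ (number of GPUs) × (layers per GPU) = 8 × 4 = 32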

  1. Modify the Modelfile.
FROM perplexity-ai/r1-1776/ggml-model-Q2_K.gguf
TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<|User|>{{ .Content }}
{{- else if eq .Role "assistant" }}<|Assistant|>{{ .Content }}{{- if not $last }}<|end▁of▁sentence|>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|Assistant|>{{- end }}
{{- end }}"""
PARAMETER stop <|begin▁of▁sentence|>
PARAMETER stop <|end▁of▁sentence|>
PARAMETER stop <|User|>
PARAMETER stop <|Assistant|>
PARAMETER num_gpu 1
  2. Use ollama create to create the quantized model.
ollama create -f Modelfile huihui_ai/perplexity-ai-r1:671b-q2_K


  3. Run the model.
ollama run huihui_ai/perplexity-ai-r1:671b-q2_K


  4. Set parameters before asking a question.

Using /set parameter in ollama can cause the model to reload each time.

"num_thread" refers to the number of cores in your computer, and it's recommended to use half of that, Otherwise, the CPU will be at 100%.

"num_ctx" for ollama refers to the number of context slots or the number of contexts the model can maintain during inference.

/set parameter num_thread 32
/set parameter num_ctx 2048

If it's an 8-GPU configuration with 24 GB each, you also need to set the num_gpu parameter.

/set parameter num_gpu 32


Set the above three parameters one at a time; do not send them all to Ollama at once.
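As a hedged alternative (same parameter names, example values taken from this card), these parameters can also be written into the Modelfile so they do not need to be set again in every session:
PARAMETER num_thread 32
PARAMETER num_ctx 2048
PARAMETER num_gpu 32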

  5. The Q2_K GGUF is now available for download. If you want to merge the split weights into a single file, use this command:
llama-gguf-split --merge Q2_K-GGUF/r1-1776-Q2_K-00001-of-00005.gguf r1-1776-q2_K.gguf
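Merging is optional: recent llama.cpp builds can usually load a split GGUF directly when pointed at the first shard, for example:
llama-cli -m Q2_K-GGUF/r1-1776-Q2_K-00001-of-00005.gguf -n 2048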
  6. Q3_K_M and Q4_K_M are also supported and will likely need at least 12 GB of memory.

  7. Q8_0 is also supported and will likely need at least 24 GB of memory.
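For GPUs with limited memory, a minimal llama-cli sketch (the -ngl value is an assumption to tune, analogous to num_gpu above) offloads only part of the model to the GPU and keeps the rest in system RAM:
llama-cli -m /home/admin/models/perplexity-ai/r1-1776/ggml-model-Q2_K.gguf -ngl 4 -n 2048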

Donation

If you like it, please click 'like' and follow us for more updates.

Your donation helps us continue further development and improvement; even the cost of a cup of coffee helps.
  • bitcoin:
  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge