LM Studio prompt template for MLX quants.
For those struggling to launch the MLX-quantized version of this model in LM Studio, here is a template rewritten by Gemini that works fine:
{{- bos_token }}
{%- if not tools is defined %}
{%- set tools = none %}
{%- endif %}
{%- if not enable_thinking is defined %}
{%- set enable_thinking = false %}
{%- endif %}
{%- if messages[0]['role'] == 'system' %}
{%- set system_message = messages[0]['content']|trim %}
{%- set messages = messages[1:] %}
{%- else %}
{%- set system_message = "" %}
{%- endif %}
{% set has_system_content = (system_message != '') or (tools is not none) or enable_thinking %}
{% if has_system_content %}
{{- "<|start_header_id|>system<|end_header_id|> \n\n" }}
{% if enable_thinking %}
{{- "Enable deep thinking subroutine." }}
{% if system_message != '' or tools is not none %}
{{- " \n\n" }}
{% endif %}
{% endif %}
{% if system_message != '' %}
{{- system_message }}
{% if tools is not none %}
{{- " \n\n" }}
{% endif %}
{% endif %}
{% if tools is not none %}
{{- "Available Tools: \n" }}
{% for t in tools %}
{{- t | tojson(indent=4) }}
{{- " \n\n" }}
{% endfor %}
{% endif %}
{{- "<|eot_id|>" }}
{% endif %}
{%- for message in messages %}
{%- if not (message.role == "ipython" or message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|> \n\n' }}
{%- if message['content'] is string %}
{{- message['content'] | trim }}
{%- else %}
{%- for item in message['content'] %}
{%- if item.type == 'text' %}
{{- item.text | trim }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '<|eot_id|>' }}
{%- elif message.tool_calls is defined and message.tool_calls is not none %}
{{- "<|start_header_id|>assistant<|end_header_id|> \n\n" }}
{%- if message['content'] is string %}
{{- message['content'] | trim }}
{%- else %}
{%- for item in message['content'] %}
{%- if item.type == 'text' %}
{{- item.text | trim }}
{%- if item.text | trim != "" %}
{{- " \n\n" }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- "[" }}
{%- for tool_call in message.tool_calls %}
{%- set out = tool_call.function|tojson %}
{%- if not tool_call.id is defined %}
{{- out }}
{%- else %}
{{- out[:-1] }}
{{- ', "id": "' + tool_call.id + '"}' }}
{%- endif %}
{%- if not loop.last %}
{{- ", " }}
{%- else %}
{{- "]<|eot_id|>" }}
{%- endif %}
{%- endfor %}
{%- elif message.role == "ipython" or message["role"] == "tool_results" or message["role"] == "tool" %}
{{- "<|start_header_id|>ipython<|end_header_id|> \n\n" }}
{%- if message.tool_call_id is defined and message.tool_call_id != '' %}
{{- '{"content": ' + (message.content | tojson) + ', "call_id": "' + message.tool_call_id + '"}' }}
{%- else %}
{{- '{"content": ' + (message.content | tojson) + '}' }}
{%- endif %}
{{- "<|eot_id|>" }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>assistant<|end_header_id|> \n\n' }}
{%- endif %}
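To make the system-block logic of the template above easier to follow, here is a rough Python sketch of what it appears to assemble: the thinking switch first, then the system message, then the tool list. The function name `render_system_block` is hypothetical, and the sketch assumes plain "\n\n" separators and ignores Jinja whitespace-control details, so its output may differ from the real template by a few spaces or newlines.

```python
import json

THINKING_SWITCH = "Enable deep thinking subroutine."
HEADER = "<|start_header_id|>system<|end_header_id|>\n\n"


def render_system_block(system_message="", tools=None, enable_thinking=False):
    """Approximate the system block the template emits (whitespace simplified)."""
    system_message = system_message.strip()

    # Like the template, emit nothing when there is no system content at all.
    if not (system_message or tools is not None or enable_thinking):
        return ""

    parts = []
    if enable_thinking:
        parts.append(THINKING_SWITCH)  # the switch sentence always comes first
    if system_message:
        parts.append(system_message)
    if tools is not None:
        # Each tool is dumped as indented JSON followed by a blank line.
        tool_text = "Available Tools:\n" + "".join(
            json.dumps(tool, indent=4) + "\n\n" for tool in tools
        )
        parts.append(tool_text)

    return HEADER + "\n\n".join(parts) + "<|eot_id|>"


print(render_system_block("Be brief.", enable_thinking=True))
```

With no system message, no tools, and thinking disabled, the function returns an empty string, which matches the template skipping the system header entirely.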
I tried the smaller 14B and 32B models (GGUF versions from Bartowski), and they run fine with no template issues. Do you mean issues with enabling CoT? That's easy: you just have to put "Enable deep thinking subroutine." (without quotes) into the system prompt. I think it's pretty neat that you can use that sentence like a switch to turn the thinking process on; otherwise it works like a regular model.
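For anyone calling the model from code rather than a chat UI, the switch described above can be sketched roughly like this. The helper name `with_thinking` is made up for illustration, and joining the switch sentence to an existing system prompt with a blank line is an assumption, not something confirmed in this thread.

```python
THINKING_SWITCH = "Enable deep thinking subroutine."


def with_thinking(messages, enabled=True):
    """Return a copy of `messages` with the switch sentence in the system prompt."""
    messages = [dict(m) for m in messages]  # shallow copies, input stays untouched
    if not enabled:
        return messages

    if messages and messages[0]["role"] == "system":
        content = messages[0]["content"]
        # Prepend the switch only once (assumed separator: a blank line).
        if not content.startswith(THINKING_SWITCH):
            messages[0]["content"] = THINKING_SWITCH + "\n\n" + content
    else:
        # No system prompt yet: the switch sentence becomes the whole system prompt.
        messages.insert(0, {"role": "system", "content": THINKING_SWITCH})
    return messages


chat = with_thinking([{"role": "user", "content": "Who are you?"}])
print(chat[0]["content"])
```

Passing `enabled=False` returns the messages unchanged, which corresponds to the model behaving like a regular one.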
In my case, the issue was with the MLX quant. I believe that is the important context. Thanks.
That's easy: you just have to put "Enable deep thinking subroutine." (without quotes) into the system prompt. I think it's pretty neat that you can use that sentence like a switch to turn the thinking process on; otherwise it works like a regular model.
That's crazy easy. I'm using Bartowski's Llama 8B on my poor old GPU and have been messing with stuff I don't understand for hours. Thank you!!