---
language:
  - en
  - fr
  - de
  - es
  - it
  - pt
  - zh
  - ja
  - ru
  - ko
license: other
license_name: mrl
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
library_name: transformers
tags:
  - reasoning
  - hybrid
  - gemini-2.0
  - deepseek-r1
  - synthetic data
  - unsloth
  - trl
---


# DeepNeo: A hybrid model with precision and power

## Overview

DeepNeo is a hybrid model that can be used like any other LLM, but it also offers a mode, inspired by NousResearch/DeepHermes-3-Llama-3-8B-Preview, in which the model produces a CoT-like response. The mode is toggled via the system prompt. Unlike NousResearch/DeepHermes-3-Llama-3-8B-Preview, DeepNeo is slightly more flexible in its sizes: we have released an 8B and a 12B model, both based on Mistral AI's models.

## Model Details

### DeepNeo 12B key features

- Developed by: Spestly (Open-Neo) & Kazex (Open-Neo)
- Released under the Mistral Research License; reach out to Mistral AI for a commercial license
- Trained with a 128k context window
- Trained on a large proportion of multilingual and synthetic reasoning data
- Supports function calling (see the sketch after the table below)
| Feature | Value |
| --- | --- |
| Architecture | Dense Transformer |
| Parameters | ~12B |
| Layers | 40 |
| Heads | 32 |
| KV Heads (GQA) | 8 |
| Hidden Dim | 14336 |
| Head Dim | 128 |
| Vocab Size | 131,072 |
| Context Length | 128k |
| Attention Pattern | Ragged (128k,32k,32k,32k) |
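
Since the model supports function calling, here is a minimal sketch of tool use through the `transformers` chat template. The `get_current_weather` function is a hypothetical example tool, and the sketch assumes the tokenizer's chat template accepts the `tools` argument (available in recent `transformers` versions); how tool calls appear in the output depends on the model's template.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Hypothetical example tool; transformers uses the type hints and the
# Google-style docstring to build the JSON schema given to the template.
def get_current_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: The name of the city.
    """
    return "sunny, 22°C"

messages = [
    {"role": "user", "content": "What is the weather like in Paris right now?"}
]

# Pass the tool to the chat template; the model may answer with a tool call.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(input_ids, max_new_tokens=512, temperature=0.8, do_sample=True)
print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```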

## Usage

### Intuitive mode

This mode is active by default, so no extra setup is needed, and you can use any system prompt (or none at all). An example is given below.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]

# Build the prompt and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)

# Count only the newly generated tokens, not the prompt.
print(f"Generated Tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")
# Decode only the new tokens, skipping the prompt.
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(f"Response: {response}")
```

### Reasoning mode

Activating this mode takes a few extra steps. Almost any system prompt should work as long as it mentions the `<Thought></Thought>` and `<Output></Output>` tags. An example system prompt is given below; note that it may require tweaking for your specific use case.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("open-neo/DeepNeo-1-12B-Preview")
model = AutoModelForCausalLM.from_pretrained(
    "open-neo/DeepNeo-1-12B-Preview",
    torch_dtype=torch.float16,
    device_map="auto"
)

# The system prompt toggles reasoning mode by naming the <Thought>/<Output> tags.
messages = [
    {"role": "system", "content": "You are a deep-thinking AI model. You must put your thoughts in the <Thought> tags and your output in the <Output> tags."},
    {"role": "user", "content": "What are the most interesting things to do in Paris?"}
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, do_sample=True)

# Count only the newly generated tokens, not the prompt.
print(f"Generated Tokens: {generated_ids.shape[-1] - input_ids.shape[-1]}")
# Decode only the new tokens, skipping the prompt.
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(f"Response: {response}")
```

## Citations

```bibtex
@misc{deepneo-1,
      title={DeepNeo: A hybrid model with precision and power},
      author={Aayan Mishra and Krish Thumar},
      howpublished={https://huggingface.co/collections/open-neo/deepneo-1-67aea4c0f086ab0f70ed5720},
      year={2025}
}
```