This is our instruction-tuned NorMistral-11B language model for Norwegian, trained on open datasets and released under the Apache 2.0 license. The model has undergone extensive fluency-preserving reinforcement learning, as described in our paper Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages.
The repository contains GGUF files at several quantization levels. We also provide a working .modelfile, which contains the official chat template converted to Go template syntax (as used by llama.cpp and ollama). The F16 model is also available for direct use at https://ollama.com/LTG/normistral-11b-thinking.
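As a rough illustration of how the GGUF files might be queried once they are served locally, the sketch below builds an OpenAI-style chat request and posts it to a local server. It assumes a llama.cpp server was started with something like `llama-server -hf norallm/normistral-11b-thinking-gguf:Q4_K_M` and is listening on port 8080; the port, the `SERVER_URL` constant, and both helper names are assumptions made for this example, not part of the repository.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is running locally on port 8080.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for a single user turn."""
    return {
        "model": "norallm/normistral-11b-thinking-gguf",
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(payload: dict, url: str = SERVER_URL) -> str:
    """POST the payload to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(send_chat_request(build_chat_request("Hva er hovedstaden i Norge?")))` should print the model's answer. The request is kept separate from the payload builder so the payload can be inspected or logged without any network access.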
The model is freely available in our public chat interface: https://chat.llm.sigma2.no/
License
We release the model under the Apache 2.0 license to indicate that we do not impose any additional constraints on the model weights. However, we do not own the data in the training collection.
Training and data
Generally speaking, the training follows our fluency-preserving post-training setup from Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages.
The training data is published alongside the model at norallm/normistral-11b-thinking-training. Training code will be available at github.com/ltgoslo/normistral-post-training.
1. Supervised finetuning (SFT)
We start by "injecting" the instruction-following and reasoning capabilities through SFT on English responses and reasoning traces from Kimi-K2-Thinking. The full SFT collection is published in train_sft.jsonl.
2. Reinforcement learning (d-RLAIF)
The short SFT stage is followed by on-policy training on a large collection of Norwegian (Bokmål and Nynorsk) prompts (also available at norallm/normistral-11b-thinking-training). The specific setup of d-RLAIF (direct reinforcement learning from AI feedback) and its motivation are described in detail in our paper. The "AI" reward model used here is Mistral-Large-Instruct-2411.
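For readers who want to inspect the published SFT collection (train_sft.jsonl) before using it, a minimal sketch along these lines may help. It makes no assumption about the actual schema: it only parses one JSON object per line and counts which top-level keys occur. The helper names and the two-record sample are hypothetical, and the real field names may differ.

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def iter_jsonl(handle: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize_keys(handle: TextIO) -> Counter:
    """Count how often each top-level key occurs across all records."""
    counts = Counter()
    for record in iter_jsonl(handle):
        counts.update(record.keys())
    return counts


# Hypothetical two-record sample; the real file's field names may differ.
sample = io.StringIO(
    '{"prompt": "Hei", "response": "Hallo"}\n'
    "\n"
    '{"prompt": "Takk"}\n'
)
print(summarize_keys(sample))
```

On a downloaded copy of the file, the same function could be called as `summarize_keys(open("train_sft.jsonl", encoding="utf-8"))` to see which fields each record carries.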
Evaluation
We compared NorMistral against state-of-the-art instruction-tuned models of similar size. What follows is a preliminary evaluation on a generative version of NorEval, which is still a work in progress. The responses from all evaluated models are available in full for closer inspection at norallm/normistral-11b-thinking-evaluation.
Classification tasks
All classification scores are reported as accuracy. NoReC sentiment analysis is performed at the sentence level. The generative scores (NorRewrite and NorSummarize) are reported as average win rates against Llama-3.1-8B, judged by Llama-3.3-70B in an LLM-as-a-judge setup (see NorEval for more information). An asterisk (*) denotes "thinking" models.
| Model | NoReC_binary | NoReC_ternary | NorIdiom_NB | NorIdiom_NN | NorCSQA_NB | NorCSQA_NN |
|---|---|---|---|---|---|---|
| NorMistral-11B* | 86.3 | 65.2 | 55.7 | 27.7 | 70.7 | 64.2 |
| Llama-3.1-8B | 79.8 | 52.9 | 12.7 | 6.7 | 64.0 | 57.9 |
| Mistral-Nemo-12B | 67.9 | 49.1 | 12.9 | 8.5 | 61.6 | 49.5 |
| Qwen3-15B* | 83.5 | 69.6 | 22.1 | 13.2 | 83.8 | 71.6 |
| Gemma3-12B | 85.2 | 67.1 | 43.7 | 23.7 | 81.9 | 80.0 |
| OLMo3-7B* | 72.0 | 63.3 | 5.0 | 2.2 | 50.8 | 17.9 |
| OLMo2-13B | 32.8 | 13.2 | 3.5 | 2.2 | 48.0 | 45.3 |
| Apertus-8B | 78.4 | 58.8 | 34.3 | 15.7 | 69.2 | 63.2 |

| Model | NorOBQA_NB | NorOBQA_NN | NRK_NB | NRK_NN | NorRewrite | NorSummarize |
|---|---|---|---|---|---|---|
| NorMistral-11B* | 83.0 | 84.4 | 58.8 | 62.3 | 51.9 | 54.3 |
| Llama-3.1-8B | 78.5 | 71.1 | 49.8 | 46.2 | 50.0 | 50.0 |
| Mistral-Nemo-12B | 75.3 | 67.8 | 47.3 | 45.0 | 42.5 | 39.2 |
| Qwen3-15B* | 94.4 | 88.9 | 63.3 | 55.9 | 77.6 | 83.1 |
| Gemma3-12B | 91.5 | 88.9 | 59.8 | 58.4 | 86.8 | 77.8 |
| OLMo3-7B* | 70.5 | 54.4 | 43.3 | 35.9 | 7.8 | 14.2 |
| OLMo2-13B | 55.3 | 56.7 | 45.3 | 39.4 | 48.3 | 53.7 |
| Apertus-8B | 76.1 | 74.4 | 50.2 | 48.3 | 39.6 | 42.1 |
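The two kinds of scores in the tables above can be sketched as follows. This is only an illustration of the metrics as described in the text, not the NorEval implementation: the function names and verdict labels are hypothetical, and counting a tie as half a win is an assumption (the actual judging protocol may handle ties, answer ordering, or averaging differently). Under this convention a model compared against itself scores 50, which matches the 50.0 entries for the Llama-3.1-8B reference.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def win_rate(verdicts: Sequence[str]) -> float:
    """Percentage win rate from judge verdicts ('win', 'loss', or 'tie').

    Assumption: a tie counts as half a win.
    """
    if not verdicts:
        raise ValueError("verdicts must be non-empty")
    score = 0.0
    for verdict in verdicts:
        if verdict == "win":
            score += 1.0
        elif verdict == "tie":
            score += 0.5
        elif verdict != "loss":
            raise ValueError(f"unknown verdict: {verdict!r}")
    return 100.0 * score / len(verdicts)


# Toy inputs, not taken from the evaluation data.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))
print(win_rate(["win", "loss", "tie", "win"]))
```

Both functions return percentages so that their outputs are on the same scale as the table entries.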
Citation
@misc{samuel2025fluentalignmentdisfluentjudges,
title={Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages},
author={David Samuel and Lilja Øvrelid and Erik Velldal and Andrey Kutuzov},
year={2025},
eprint={2512.08777},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.08777},
}
@inproceedings{samuel-etal-2025-small,
title = "Small Languages, Big Models: {A} Study of Continual Training on Languages of {Norway}",
author = "Samuel, David and
Mikhailov, Vladislav and
Velldal, Erik and
{\O}vrelid, Lilja and
Charpentier, Lucas Georges Gabriel and
Kutuzov, Andrey and
Oepen, Stephan",
editor = "Johansson, Richard and
Stymne, Sara",
booktitle = "Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)",
month = mar,
year = "2025",
address = "Tallinn, Estonia",
publisher = "University of Tartu Library",
url = "https://aclanthology.org/2025.nodalida-1.61/",
pages = "573--608",
ISBN = "978-9908-53-109-0",
}
Contact
Please write a community message or contact David Samuel (davisamu@ifi.uio.no) if you have any questions about this model.
Available quantizations: 4-bit, 5-bit, 6-bit, 8-bit, and 16-bit.
Model tree for norallm/normistral-11b-thinking-gguf
Base model
mistralai/Mistral-Nemo-Base-2407