Instructions for using sallani/ISO42001-Qwen2.5-0.5B-Edge with Python libraries, local inference servers, and local apps.

Libraries

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sallani/ISO42001-Qwen2.5-0.5B-Edge")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sallani/ISO42001-Qwen2.5-0.5B-Edge")
model = AutoModelForCausalLM.from_pretrained("sallani/ISO42001-Qwen2.5-0.5B-Edge")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

MLX

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("sallani/ISO42001-Qwen2.5-0.5B-Edge")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

llama-cpp-python

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="sallani/ISO42001-Qwen2.5-0.5B-Edge",
	filename="iso42001-qwen2.5-0.5b-f16.gguf",
)

response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])
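`create_chat_completion` returns an OpenAI-style dictionary, and with `stream=True` it returns an iterator of chunks whose `delta` may carry a `content` fragment. The minimal sketch below extracts the reply in both cases; `reply_text`, `iter_deltas`, and `collect_stream` are hypothetical helper names, and the usage lines are left as comments because they need the downloaded model.

```python
from typing import Iterable, Iterator


def reply_text(response: dict) -> str:
    """Extract the assistant message from a non-streaming chat completion."""
    return response["choices"][0]["message"]["content"]


def iter_deltas(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the text fragment carried by each streamed chunk, skipping empty ones."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # The first chunk typically carries only {"role": "assistant"};
        # the final one carries an empty delta plus a finish_reason.
        text = delta.get("content")
        if text:
            yield text


def collect_stream(chunks: Iterable[dict]) -> str:
    """Join all streamed fragments into the full reply."""
    return "".join(iter_deltas(chunks))


# Usage with the `llm` object created above (needs the model, so not run here):
#   print(reply_text(llm.create_chat_completion(messages=[...])))
#   for piece in iter_deltas(llm.create_chat_completion(messages=[...], stream=True)):
#       print(piece, end="", flush=True)
```

Streaming is useful on slow CPUs, where printing fragments as they arrive makes a long clause-referenced answer feel responsive.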


llama.cpp

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16
# Run inference directly in the terminal:
llama-cli -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16
# Run inference directly in the terminal:
llama-cli -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16
# Run inference directly in the terminal:
./llama-cli -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf sallani/ISO42001-Qwen2.5-0.5B-Edge:F16

Use Docker

docker model run hf.co/sallani/ISO42001-Qwen2.5-0.5B-Edge:F16


vLLM

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sallani/ISO42001-Qwen2.5-0.5B-Edge"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sallani/ISO42001-Qwen2.5-0.5B-Edge",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
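The same `/v1/chat/completions` call can be made from Python with only the standard library. This is a minimal sketch: `build_chat_request` and `chat` are hypothetical helper names, and the base URLs assume the ports used in the commands on this page (vLLM on 8000, SGLang on 30000 as passed via `--port`, `llama-server` and `mlx_lm.server` on their default 8080). The body follows the OpenAI chat-completions shape used by the curl examples; `max_tokens` and `temperature` are optional fields added here.

```python
import json
import urllib.request

MODEL_ID = "sallani/ISO42001-Qwen2.5-0.5B-Edge"

# Base URLs matching the ports used in the server commands on this page.
BASE_URLS = {
    "vllm": "http://localhost:8000/v1",
    "sglang": "http://localhost:30000/v1",
    "llama-server": "http://localhost:8080/v1",
    "mlx_lm.server": "http://localhost:8080/v1",
}


def build_chat_request(base_url: str, messages: list, model: str = MODEL_ID,
                       max_tokens: int = 512, temperature: float = 0.1) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, messages: list, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, messages)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
#   print(chat(BASE_URLS["vllm"], [{"role": "user", "content": "What is the capital of France?"}]))
```

Keeping request construction separate from sending makes it easy to inspect the payload, or to point the same code at another local server by changing only the base URL.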


SGLang

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sallani/ISO42001-Qwen2.5-0.5B-Edge" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sallani/ISO42001-Qwen2.5-0.5B-Edge",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sallani/ISO42001-Qwen2.5-0.5B-Edge" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sallani/ISO42001-Qwen2.5-0.5B-Edge",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Ollama:
```
ollama run hf.co/sallani/ISO42001-Qwen2.5-0.5B-Edge:F16
```

Unsloth Studio

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sallani/ISO42001-Qwen2.5-0.5B-Edge to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sallani/ISO42001-Qwen2.5-0.5B-Edge to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sallani/ISO42001-Qwen2.5-0.5B-Edge to start chatting

Pi

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sallani/ISO42001-Qwen2.5-0.5B-Edge"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sallani/ISO42001-Qwen2.5-0.5B-Edge"
        }
      ]
    }
  }
}
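Overwriting `~/.pi/agent/models.json` with the snippet above would drop any providers already configured. A minimal Python sketch, assuming the file is plain JSON with the top-level `providers` object shown above, merges the `mlx-lm` entry instead; `merge_provider` and `update_models_file` are hypothetical helper names, not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "sallani/ISO42001-Qwen2.5-0.5B-Edge"

# Provider entry mirroring the JSON snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def _copy(obj):
    """Deep-copy plain JSON data via a JSON round-trip."""
    return json.loads(json.dumps(obj))


def merge_provider(config: dict, name: str, provider: dict) -> dict:
    """Return a copy of `config` with `provider` registered under `name`,
    keeping other providers and models already listed for `name`."""
    merged = _copy(config)
    providers = merged.setdefault("providers", {})
    if name not in providers:
        providers[name] = _copy(provider)
        return merged
    existing = providers[name]
    # Update connection settings, then append only models not yet listed.
    for key, value in provider.items():
        if key != "models":
            existing[key] = _copy(value)
    models = existing.setdefault("models", [])
    known = {m["id"] for m in models}
    for model in provider.get("models", []):
        if model["id"] not in known:
            models.append(_copy(model))
            known.add(model["id"])
    return merged


def update_models_file(path: Path, name: str = "mlx-lm", provider: dict = MLX_PROVIDER) -> dict:
    """Read models.json if present, merge the provider, and write the result back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = merge_provider(config, name, provider)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged


# Usage: update_models_file(Path.home() / ".pi" / "agent" / "models.json")
```

Running the update twice is harmless: the model ID is only added once.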

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "sallani/ISO42001-Qwen2.5-0.5B-Edge"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sallani/ISO42001-Qwen2.5-0.5B-Edge

Run Hermes

hermes

MLX LM

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "sallani/ISO42001-Qwen2.5-0.5B-Edge"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "sallani/ISO42001-Qwen2.5-0.5B-Edge"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "sallani/ISO42001-Qwen2.5-0.5B-Edge",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Docker Model Runner
How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Docker Model Runner:
```
docker model run hf.co/sallani/ISO42001-Qwen2.5-0.5B-Edge:F16
```

Lemonade

How to use sallani/ISO42001-Qwen2.5-0.5B-Edge with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull sallani/ISO42001-Qwen2.5-0.5B-Edge:F16

Run and chat with the model

lemonade run user.ISO42001-Qwen2.5-0.5B-Edge-F16

List all available models

lemonade list

ISO42001-Qwen2.5-0.5B-Edge

Specialized small language model (SLM) for ISO/IEC 42001:2023, the AI Management System standard.
Fine-tuned from Qwen2.5-0.5B-Instruct. Runs fully on-premise and offline, with no calls to external services at inference time.

Overview

| Property | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-0.5B-Instruct` (Apache 2.0) |
| Architecture | Qwen2: 24 layers · 896 hidden dim · 14 attention heads · 0.5B parameters |
| Fine-tuning | MLX LoRA (Apple Silicon) · QLoRA 4-bit NF4 (GPU) |
| Domain | ISO/IEC 42001:2023 · EU AI Act · GDPR × AI · AI governance |
| Languages | French · English |
| Deployment | On-premise · Offline · Ollama · llama.cpp · LM Studio |

What is this model for?

ISO/IEC 42001:2023 is the first international standard for AI Management Systems (AIMS). It gives organizations that develop, deploy, or use AI a governance framework for demonstrating responsible and ethical AI use, which is increasingly relevant as EU AI Act obligations take effect.

This model gives CISOs, DPOs, CAIOs, and GRC consultants precise, clause-referenced answers on:

Clauses 4–10: context, leadership, planning, support, operation, performance evaluation, improvement
Annex A: control objectives A.2 to A.10, including A.2 AI policies · A.5 impact assessment · A.6 AI system life cycle · A.7 data for AI systems · A.8 information for interested parties (transparency) · A.10 third-party and customer relationships (supply chain)
EU AI Act × ISO 42001 mapping: the four risk levels and the obligations for each category
ISO 27001 × ISO 42001 × GDPR integration: a unified governance approach
Practical topics: impact assessment, model cards, Statement of Applicability (SoA), AI system register, privacy risk

Example queries

What is the scope of ISO/IEC 42001:2023?
How is Annex A of ISO 42001 structured?
How to conduct an AI system impact assessment per ISO 42001 (A.5)?
What are the human oversight requirements under ISO 42001 (A.6.2)?
How does ISO 42001 map to EU AI Act Article 9?
What data governance controls does ISO 42001 require for AI systems (A.7)?
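Qu'est-ce qu'un Statement of Applicability dans ISO 42001 ? (What is a Statement of Applicability in ISO 42001?)
Comment certifier un AIMS ISO 42001 ? Quelles sont les étapes ? (How do you certify an ISO 42001 AIMS? What are the steps?)
Quelle est la différence entre ISO 27001 et ISO 42001 ? (What is the difference between ISO 27001 and ISO 42001?)
Comment créer un registre des systèmes d'IA conforme à ISO 42001 ? (How do you build an AI system register that complies with ISO 42001?)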

Inference

Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sallani/ISO42001-Qwen2.5-0.5B-Edge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert assistant in AI governance and management systems, "
            "specializing in ISO/IEC 42001:2023 (AI Management System), the EU AI Act, "
            "and GDPR applied to AI. Your answers are precise, clause-referenced, "
            "and tailored to compliance professionals (CISOs, DPOs, CAIOs, GRC consultants)."
        )
    },
    {
        "role": "user",
        "content": "What are the key controls in ISO 42001 Annex A for AI system operations?"
    }
]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
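To see what `temperature=0.1` and `top_p=0.9` do, here is a simplified pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy logit vector. It is not the Transformers implementation (which works on tensors and also applies `repetition_penalty`), but it follows the same idea: divide the logits by the temperature, apply softmax, keep the smallest set of most likely tokens whose cumulative probability reaches `top_p`, renormalize, and sample.

```python
import math
import random


def softmax(logits: list) -> list:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_distribution(logits: list, temperature: float, top_p: float) -> dict:
    """Map token index -> probability after temperature scaling and top-p filtering."""
    probs = softmax([x / temperature for x in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits: list, temperature: float = 0.1, top_p: float = 0.9, rng=None) -> int:
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    dist = nucleus_distribution(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


toy_logits = [2.0, 1.5, 0.5, -1.0]
print(sorted(nucleus_distribution(toy_logits, 1.0, 0.9)))  # three candidates survive
print(sorted(nucleus_distribution(toy_logits, 0.1, 0.9)))  # only the top token survives
```

With these toy logits, temperature 1.0 leaves three candidates, while temperature 0.1 puts over 99% of the mass on the top token, so top-p 0.9 keeps only that one. Sampling at the card's settings therefore behaves close to greedy decoding, which suits factual, clause-referenced answers.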

Ollama (GGUF Q4_K_M)

ollama create iso42001-edge -f Modelfile
ollama run iso42001-edge "How to conduct an AI system impact assessment per ISO 42001 (A.5)?"
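`ollama create` expects a `Modelfile`, which is not shown here. The Python sketch below generates one possible version. Assumptions: the Q4_K_M GGUF sits next to the Modelfile under the file name used in the llama.cpp example; the sampling values are copied from the Transformers example (`num_predict` is Ollama's limit on generated tokens, analogous to `max_new_tokens`); and the ChatML template reflects Qwen2.5's chat format, although recent Ollama versions may infer a template from the GGUF metadata on their own. `render_modelfile` and `write_modelfile` are hypothetical helper names.

```python
from pathlib import Path

GGUF_FILE = "iso42001-qwen2.5-0.5b-q4_k_m.gguf"  # name used in the llama.cpp example
SYSTEM_PROMPT = "You are an ISO/IEC 42001:2023 AI governance expert."

# Assumed ChatML template (Qwen2.5 chat format) in Ollama's Go-template syntax.
CHATML_TEMPLATE = (
    "{{ if .System }}<|im_start|>system\n{{ .System }}<|im_end|>\n{{ end }}"
    "{{ if .Prompt }}<|im_start|>user\n{{ .Prompt }}<|im_end|>\n{{ end }}"
    "<|im_start|>assistant\n{{ .Response }}<|im_end|>\n"
)


def render_modelfile(gguf_path: str = GGUF_FILE, system: str = SYSTEM_PROMPT,
                     temperature: float = 0.1, top_p: float = 0.9,
                     repeat_penalty: float = 1.1, num_predict: int = 512) -> str:
    """Return Modelfile text: base weights, chat template, system prompt, sampling defaults."""
    lines = [
        f"FROM ./{gguf_path}",
        f'TEMPLATE """{CHATML_TEMPLATE}"""',
        f'SYSTEM """{system}"""',
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"PARAMETER repeat_penalty {repeat_penalty}",
        f"PARAMETER num_predict {num_predict}",
        # Stop at ChatML turn markers so the model does not continue the dialogue.
        'PARAMETER stop "<|im_start|>"',
        'PARAMETER stop "<|im_end|>"',
    ]
    return "\n".join(lines) + "\n"


def write_modelfile(directory: Path, **kwargs) -> Path:
    """Write the rendered Modelfile into `directory` and return its path."""
    path = directory / "Modelfile"
    path.write_text(render_modelfile(**kwargs), encoding="utf-8")
    return path


# Usage: write_modelfile(Path(".")), then run the `ollama create` command above.
```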

llama.cpp

./llama-cli \
  -m iso42001-qwen2.5-0.5b-q4_k_m.gguf \
  --system-prompt "You are an ISO/IEC 42001:2023 AI governance expert." \
  -p "What is the scope of ISO 42001?" \
  -n 512 --temp 0.1

Training details

Dataset

47 instruction-following Q&A pairs (FR/EN) covering the full standard:

| File | Examples | Split |
|---|---|---|
| `iso42001_train.jsonl` | 37 | Training |
| `iso42001_test.jsonl` | 10 | Evaluation (out-of-distribution) |
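Before re-running the fine-tuning, the split sizes in the table can be checked locally. The record schema is not documented here, so this minimal sketch only assumes one JSON object per non-empty line and that both files sit in the same directory; `count_jsonl_records` and `check_splits` are hypothetical helper names.

```python
import json
from pathlib import Path

# Split sizes documented in the table above.
EXPECTED = {"iso42001_train.jsonl": 37, "iso42001_test.jsonl": 10}


def count_jsonl_records(path: Path) -> int:
    """Count non-empty lines, checking that each one parses as a JSON object."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            count += 1
    return count


def check_splits(data_dir: Path, expected: dict = EXPECTED) -> dict:
    """Return actual record counts; raise if they differ from the documented split."""
    counts = {name: count_jsonl_records(data_dir / name) for name in expected}
    mismatches = {n: (counts[n], expected[n]) for n in expected if counts[n] != expected[n]}
    if mismatches:
        raise ValueError(f"unexpected split sizes (actual, expected): {mismatches}")
    return counts


# Usage: print(check_splits(Path("data")))  # "data" is a placeholder directory
```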

Thematic coverage:

Clauses 4–6: Context · Leadership · Planning · AI Impact Assessment · Risk assessment
Clauses 7–8: Support · Operations · AI lifecycle · Data governance (A.7) · Human oversight (A.6.2)
Clauses 9–10: Performance evaluation · Internal audit · Continual improvement
EU AI Act × ISO 42001: full 4-level risk mapping
ISO 27001 × ISO 42001 × GDPR integration
Practical topics: SoA · AI system register · model card · certification steps

Hyperparameters

| Parameter | Value |
|---|---|
| Technique | MLX LoRA (Apple M-series) |
| LoRA rank | 8 |
| LoRA layers | 4 (of 24) |
| Iterations | 100 |
| Batch size | 8 |
| Learning rate | 5e-5 |
| Max sequence length | 1024 tokens |
| Optimizer | Adam |
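The table implies two quick figures worth checking: how many passes the run makes over the 37 training examples, and how many parameters the adapters add. The LoRA count uses the standard formula rank × (d_in + d_out) per adapted matrix. Which projections mlx-lm adapts depends on its version and configuration, so adapting only `q_proj` (896→896) and `v_proj` (896→128, i.e. 2 KV heads × 64 head dim) is an assumption made here for illustration.

```python
# Figures from the hyperparameter table and the dataset section.
TRAIN_EXAMPLES = 37
ITERATIONS = 100
BATCH_SIZE = 8
RANK = 8
LORA_LAYERS = 4

# Qwen2.5-0.5B attention shapes: hidden 896, 2 KV heads with head_dim 64.
HIDDEN = 896
HEAD_DIM = 64
KV_HEADS = 2


def epochs_seen(iterations: int, batch_size: int, n_examples: int) -> float:
    """Number of passes over the training set (ignores partial batches)."""
    return iterations * batch_size / n_examples


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_params_per_layer(rank: int) -> int:
    """Assumption: adapters on q_proj (896->896) and v_proj (896->128) only."""
    q = lora_params(HIDDEN, HIDDEN, rank)
    v = lora_params(HIDDEN, KV_HEADS * HEAD_DIM, rank)
    return q + v


epochs = epochs_seen(ITERATIONS, BATCH_SIZE, TRAIN_EXAMPLES)
total = LORA_LAYERS * adapter_params_per_layer(RANK)
print(f"~{epochs:.1f} epochs over the training set")
print(f"{total:,} trainable adapter parameters (q_proj + v_proj assumption)")
```

Under these assumptions the run sees each training example about 21.6 times and trains roughly 90 thousand adapter parameters, a tiny fraction of the 0.5B base weights.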

Offline deployment

This model is designed to run fully locally with no network calls at inference time.

✅ No data sent to external cloud services
✅ CPU-compatible via GGUF Q4_K_M (8 GB RAM minimum)
✅ Apple Silicon optimized via MLX
✅ Compatible with Ollama · llama.cpp · LM Studio · Jan
✅ Apache 2.0 license — commercial use permitted
✅ Fully reproducible fine-tuning from source
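The 8 GB figure above is the card's stated minimum for the whole machine. A back-of-the-envelope estimate suggests the model itself needs far less, leaving most of that budget for the OS and runtime. Assumptions (not from the card): about 0.49e9 parameters for Qwen2.5-0.5B and roughly 4.8 effective bits per weight for Q4_K_M (Q8_0 stores 8.5); the KV-cache formula uses the architecture's 24 layers, 2 KV heads, and head dim 64 with an F16 cache.

```python
# Rough memory estimate for local inference (approximate, see assumptions above).
PARAMS = 0.49e9
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}  # Q4_K_M value is approximate

LAYERS = 24
KV_HEADS = 2
HEAD_DIM = 64


def weight_gb(params: float, bits: float) -> float:
    """Size of the weights in GB (1e9 bytes)."""
    return params * bits / 8 / 1e9


def kv_cache_gb(tokens: int, bytes_per_value: int = 2) -> float:
    """Keys + values for every layer, KV head, and position (F16 cache by default)."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:7s} weights ~ {weight_gb(PARAMS, bits):.2f} GB")
print(f"KV cache for 4,096 tokens ~ {kv_cache_gb(4096):.3f} GB")
```

The KV cache costs 12,288 bytes per token here, so even a long context adds only a few hundred megabytes.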

Limitations

Compact dataset (47 pairs) — suited for specialized Q&A and evaluation, not production-critical use without further enrichment
0.5B model — limited on complex multi-step reasoning chains
Does not replace a certified ISO 42001 audit conducted by a qualified professional
Outputs should be reviewed by a subject matter expert before any regulatory decision

License

This model is released under Apache 2.0.
Base model: Qwen2.5-0.5B-Instruct — Apache 2.0, Alibaba Cloud.


Model details

Format: Safetensors (BF16 tensors), MLX
Model size: 0.5B params
Model tree: Qwen/Qwen2.5-0.5B → Qwen/Qwen2.5-0.5B-Instruct → sallani/ISO42001-Qwen2.5-0.5B-Edge (this model)
Quantizations: 1 model