ISO42001-Qwen2.5-0.5B-Edge
Specialized SLM for ISO/IEC 42001:2023 — AI Management System.
Fine-tuned from Qwen2.5-0.5B-Instruct. Runs fully on-premises and offline, with no calls to external services at inference time.
Overview
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct (Apache 2.0) |
| Architecture | Qwen2: 24 layers · 896 hidden dim · 14 heads · 0.5B parameters |
| Fine-tuning | MLX LoRA (Apple Silicon) · QLoRA 4-bit NF4 (GPU) |
| Domain | ISO/IEC 42001:2023 · EU AI Act · GDPR × AI · AI Governance |
| Languages | French · English |
| Deployment | On-premise · Offline · Ollama · llama.cpp · LM Studio |
What is this model for?
ISO/IEC 42001:2023 is the first international standard for AI Management Systems (AIMS). It provides organizations that develop, deploy, or use AI with a governance framework to demonstrate responsible and ethical AI use — increasingly required in the context of the EU AI Act.
This model gives CISOs, DPOs, CAIOs, and GRC consultants precise, clause-referenced answers on:
- Clauses 4–10 — context, leadership, planning, support, operations, performance evaluation, improvement
- Annex A — all controls: A.2 policies · A.6 AI system operation · A.7 transparency · A.8 data governance · A.10 supply chain
- EU AI Act × ISO 42001 mapping — 4 risk levels, obligations per category
- ISO 27001 × ISO 42001 × GDPR integration — unified governance approach
- Practical topics — impact assessment, model cards, SoA, AI system register, privacy risk
Example queries
What is the scope of ISO/IEC 42001:2023?
How is Annex A of ISO 42001 structured?
How to conduct an AI Impact Assessment per control A.6.1?
What are the human oversight requirements under ISO 42001 (A.6.2)?
How does ISO 42001 map to EU AI Act Article 9?
What data governance controls does ISO 42001 require for AI systems (A.8)?
Qu'est-ce qu'un Statement of Applicability dans ISO 42001 ?
Comment certifier un AIMS ISO 42001 ? Quelles sont les étapes ?
Quelle est la différence entre ISO 27001 et ISO 42001 ?
Comment créer un registre des systèmes d'IA conforme à ISO 42001 ?
Inference
HuggingFace Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sallani/ISO42001-Qwen2.5-0.5B-Edge"

# Load the tokenizer and the model weights in bfloat16 on the available device.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The system prompt sets the domain and audience; the user turn carries the question.
messages = [
    {
        "role": "system",
        "content": (
            "You are an expert assistant in AI governance and management systems, "
            "specializing in ISO/IEC 42001:2023 (AI Management System), the EU AI Act, "
            "and GDPR applied to AI. Your answers are precise, clause-referenced, "
            "and tailored to compliance professionals (CISOs, DPOs, CAIOs, GRC consultants)."
        ),
    },
    {
        "role": "user",
        "content": "What are the key controls in ISO 42001 Annex A for AI system operations?",
    },
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Low-temperature sampling keeps answers close to deterministic.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
Ollama (GGUF Q4_K_M)
```bash
ollama create iso42001-edge -f Modelfile
ollama run iso42001-edge "How to conduct an AI Impact Assessment per ISO 42001 A.6.1?"
```
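Once the model has been created in Ollama, it can also be queried from a script. The following is a minimal sketch, assuming the Ollama server is running locally on its default port (11434) and that the model was registered under the name `iso42001-edge` as in the command above. The helper names `build_chat_payload` and `ask` are illustrative, not part of this repository.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_payload(question: str, model: str = "iso42001-edge") -> dict:
    """Build a non-streaming chat request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "You are an ISO/IEC 42001:2023 AI governance expert.",
            },
            {"role": "user", "content": question},
        ],
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.1},
    }


def ask(question: str) -> str:
    """Send the question to the local Ollama server and return the answer text."""
    body = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]
```

With the server running, `ask("What is the scope of ISO 42001?")` should return the generated answer as a string. The request never leaves the local machine, which matches the offline deployment goal.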
llama.cpp
```bash
./llama-cli \
  -m iso42001-qwen2.5-0.5b-q4_k_m.gguf \
  --system-prompt "You are an ISO/IEC 42001:2023 AI governance expert." \
  -p "What is the scope of ISO 42001?" \
  -n 512 --temp 0.1
```
Training details
Dataset
47 instruction-following Q&A pairs (FR/EN) covering the full standard:
| File | Examples | Split |
|---|---|---|
| `iso42001_train.jsonl` | 37 | Training |
| `iso42001_test.jsonl` | 10 | Evaluation (out-of-distribution) |
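To check a local copy of the dataset against the counts in the table, a short script along these lines may help. It is a sketch that assumes only that each file is valid JSONL (one JSON object per non-empty line); the record schema is not documented here, so no field names are assumed. The function names are illustrative.

```python
import json
from pathlib import Path


def count_examples(path) -> int:
    """Count JSONL records, raising ValueError if any line is not valid JSON."""
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            json.loads(line)  # validation only; the schema is not assumed
            count += 1
    return count


def split_summary(train_count: int, test_count: int) -> dict:
    """Return the total and each split's share of the dataset."""
    total = train_count + test_count
    return {
        "total": total,
        "train_share": round(train_count / total, 3),
        "test_share": round(test_count / total, 3),
    }


# With the counts from the table: 37 training and 10 evaluation examples.
print(split_summary(37, 10))
# {'total': 47, 'train_share': 0.787, 'test_share': 0.213}
```

Running `count_examples("iso42001_train.jsonl")` and `count_examples("iso42001_test.jsonl")` on the actual files should give 37 and 10 if the local copy matches the table.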
Thematic coverage:
- Clauses 4–6: Context · Leadership · Planning · AI Impact Assessment · Risk assessment
- Clauses 7–8: Support · Operations · AI lifecycle · Data governance (A.8) · Human oversight (A.6.2)
- Clauses 9–10: Performance evaluation · Internal audit · Continual improvement
- EU AI Act × ISO 42001: full 4-level risk mapping
- ISO 27001 × ISO 42001 × GDPR integration
- Practical topics: SoA · AI system register · model card · certification steps
Hyperparameters
| Parameter | Value |
|---|---|
| Technique | MLX LoRA (Apple M-series) |
| LoRA rank | 8 |
| LoRA layers | 4 |
| Iterations | 100 |
| Batch size | 8 |
| Learning rate | 5e-5 |
| Max seq length | 1024 |
| Optimizer | Adam |
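The iteration count and batch size can be related to the dataset size with some quick arithmetic. The sketch below assumes that each iteration processes one full batch of 8 examples with no gradient accumulation; the function name is illustrative.

```python
TRAIN_EXAMPLES = 37  # from the dataset table
ITERATIONS = 100     # from the hyperparameter table
BATCH_SIZE = 8       # from the hyperparameter table


def training_exposure(train_examples: int, iterations: int, batch_size: int) -> dict:
    """Estimate how many examples are processed and how many passes that implies."""
    examples_seen = iterations * batch_size
    return {
        "examples_seen": examples_seen,
        "approx_epochs": round(examples_seen / train_examples, 1),
    }


print(training_exposure(TRAIN_EXAMPLES, ITERATIONS, BATCH_SIZE))
# {'examples_seen': 800, 'approx_epochs': 21.6}
```

Under that assumption the run sees about 800 examples, or roughly 21 to 22 passes over the 37 training pairs.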
Offline deployment
This model is designed to run fully locally with no network calls at inference time.
- ✅ No data sent to external cloud services
- ✅ CPU-compatible via GGUF Q4_K_M (8 GB RAM minimum)
- ✅ Apple Silicon optimized via MLX
- ✅ Compatible with Ollama · llama.cpp · LM Studio · Jan
- ✅ Apache 2.0 license — commercial use permitted
- ✅ Fully reproducible fine-tuning from source
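For the Transformers path, offline behavior can be enforced explicitly rather than relied on. The following is a minimal sketch, assuming the model files are already in the local Hugging Face cache or in a local directory. `HF_HUB_OFFLINE` and `local_files_only` are existing Hugging Face options; the helper names are illustrative.

```python
import os


def enable_offline_mode() -> dict:
    """Disable Hub network access and return kwargs for from_pretrained."""
    # These variables are read when the Hugging Face libraries are imported,
    # so set them before importing transformers.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    return {"local_files_only": True}


def load_offline(model_ref: str = "sallani/ISO42001-Qwen2.5-0.5B-Edge"):
    """Load tokenizer and model from local files only.

    model_ref may be a repo id that is already cached or a local directory path.
    """
    kwargs = enable_offline_mode()
    # Imported here so the environment variables above are already in place.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_ref, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(model_ref, **kwargs)
    return tokenizer, model
```

If the files are not available locally, `load_offline()` fails with an error instead of falling back to a download, which makes an accidental network call easy to spot.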
Limitations
- Compact dataset (47 pairs) — suited for specialized Q&A and evaluation, not production-critical use without further enrichment
- 0.5B model — limited on complex multi-step reasoning chains
- Does not replace a certified ISO 42001 audit conducted by a qualified professional
- Outputs should be reviewed by a subject matter expert before any regulatory decision
License
This model is released under Apache 2.0.
Base model: Qwen2.5-0.5B-Instruct — Apache 2.0, Alibaba Cloud.