Selora AI
Qwen3 1.7B fine-tuned for Home Assistant with four specialist LoRA
adapters. The answer adapter additionally emits a query_state tool
envelope for live device-state queries against the Home Assistant REST
API. The model is used by the Selora AI Home Assistant integration and
can also be run directly with Ollama, llama.cpp, or vLLM.
Specialists
| Adapter | Intent | Output shape |
|---|---|---|
| command | "Turn off the kitchen lights" | `{intent:"command",response,calls:[…]}` |
| automation | "Wake up lights at 6:30 AM" | `{intent:"automation",automation:{triggers,actions,…}}` |
| answer | Q&A / small talk | `{intent:"answer",response}` |
| clarification | Ask the user a follow-up | `{intent:"clarification",response}` |
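The output shapes in the table are regular enough to check mechanically. The sketch below is a hypothetical shape check written only from the table above; it is not the validator shipped in parity/, and it does not cover the answer adapter's query_state tool envelope, whose format is not described on this card. `REQUIRED_KEYS` and `check_shape` are illustrative names.

```python
import json

# Required top-level keys per intent, read off the "Output shape" column above.
REQUIRED_KEYS = {
    "command": {"response", "calls"},
    "automation": {"automation"},
    "answer": {"response"},
    "clarification": {"response"},
}


def check_shape(raw: str) -> str:
    """Parse one specialist reply and return its intent; raise ValueError on a bad shape."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("reply is not a JSON object")
    intent = obj.get("intent")
    if not isinstance(intent, str) or intent not in REQUIRED_KEYS:
        raise ValueError(f"unknown intent: {intent!r}")
    missing = REQUIRED_KEYS[intent] - set(obj)
    if missing:
        raise ValueError(f"{intent} reply is missing {sorted(missing)}")
    if intent == "command" and not isinstance(obj["calls"], list):
        raise ValueError("calls must be a list")
    if intent == "automation":
        automation = obj["automation"]
        # The table lists triggers and actions explicitly; other keys are elided with "…".
        if not isinstance(automation, dict) or not {"triggers", "actions"}.issubset(automation):
            raise ValueError("automation needs triggers and actions")
    return intent
```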
The HA integration's selora_local provider runs a cheap regex
pre-classifier to assign each request to one of the four specialists,
then sends the request with model: selora-v1-{specialist}. Backends with
multi-LoRA support then use the matching adapter: vLLM (--enable-lora)
selects it from the model field, while with llama-server the adapter is
activated through the /lora-adapters endpoint before the call.
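To make the routing step concrete, here is a minimal sketch of a regex pre-classifier. The patterns, the two-word threshold for clarification, and the names `route` and `model_name` are all hypothetical; the actual selora_local rules are not published on this card. The specialist names are assumed to be the plural forms used in the vLLM --lora-modules example below (selora-v1-commands, and so on).

```python
import re

# Hypothetical rules for illustration; the selora_local pre-classifier is not published here.
AUTOMATION = re.compile(
    r"\b(?:every|whenever|schedule|automation|at \d{1,2}(?::\d{2})?)\b", re.IGNORECASE
)
COMMAND = re.compile(
    r"^\s*(?:please\s+)?(?:turn|switch|set|dim|open|close|lock|unlock)\b", re.IGNORECASE
)


def route(text: str) -> str:
    """Pick a specialist name for one user request using cheap regex checks."""
    if len(text.split()) < 2:
        return "clarifications"  # too short to act on: ask a follow-up
    if AUTOMATION.search(text):
        return "automations"  # time or recurrence words suggest an automation
    if COMMAND.search(text):
        return "commands"  # imperative device verb at the start
    return "answers"  # everything else is treated as Q&A / small talk


def model_name(text: str) -> str:
    """Build the model field sent to the backend, e.g. selora-v1-commands."""
    return f"selora-v1-{route(text)}"
```

Checking automation before command is a deliberate choice in this sketch: a request such as "Turn on the porch light every day at 7" starts with an imperative verb but describes a schedule.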
Quick start
Ollama
ollama pull selora/commands
ollama run selora/commands
Modelfiles for all four specialists live in ollama/ and
are also published as separate Ollama models.
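Besides the interactive `ollama run`, a running Ollama server can be called over its local REST API. The sketch below posts to Ollama's `/api/chat` endpoint on its default port 11434 and carries the generation parameters recommended further down; `max_tokens` maps to Ollama's `num_predict` option. Only selora/commands is named on this card, so the names of the other three published models are an assumption you must supply. Options sent in the request override any PARAMETER lines in the Modelfile. `build_ollama_chat` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_ollama_chat(prompt: str, model: str = "selora/commands", max_tokens: int = 384) -> dict:
    """Build an /api/chat body carrying the card's recommended generation parameters."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {
            "temperature": 0.0,
            "repeat_penalty": 1.15,
            "repeat_last_n": 256,
            "num_predict": max_tokens,  # Ollama's name for max_tokens
            "stop": ["<|im_end|>", "<|endoftext|>"],
        },
    }


def chat(prompt: str, model: str = "selora/commands") -> str:
    """Send one request to a running Ollama server and return the assistant text."""
    body = json.dumps(build_ollama_chat(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```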
llama.cpp
llama-server \
--model qwen3_17b_base.Q4_K_M.gguf \
--lora-init-without-apply \
--lora qwen3_17b_command.lora.gguf \
--lora qwen3_17b_automation.lora.gguf \
--lora qwen3_17b_answer.lora.gguf \
--lora qwen3_17b_clarification.lora.gguf \
--ctx-size 8192
--lora-init-without-apply loads the adapters without applying them. Before
each /v1/chat/completions call, POST to /lora-adapters to activate the
matching specialist's LoRA.
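A minimal sketch of that switch using only the Python standard library. It assumes llama-server is listening on its default port 8080, that adapter ids 0 to 3 follow the order of the --lora flags in the command above, and that a scale of 1.0 is the right strength for the active adapter; `ADAPTER_IDS`, `lora_scales`, and `activate` are hypothetical names.

```python
import json
import urllib.request

# Assumed ids: llama-server numbers adapters in the order the --lora flags were given.
ADAPTER_IDS = {"command": 0, "automation": 1, "answer": 2, "clarification": 3}
SERVER = "http://localhost:8080"  # llama-server's default address


def lora_scales(active: str, scale: float = 1.0) -> list[dict]:
    """Return a /lora-adapters body that enables one adapter and zeroes the rest."""
    if active not in ADAPTER_IDS:
        raise ValueError(f"unknown specialist: {active!r}")
    return [
        {"id": idx, "scale": scale if name == active else 0.0}
        for name, idx in ADAPTER_IDS.items()
    ]


def activate(active: str) -> None:
    """POST the scales to llama-server so later completions use the chosen adapter."""
    body = json.dumps(lora_scales(active)).encode("utf-8")
    req = urllib.request.Request(
        f"{SERVER}/lora-adapters",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req):
        pass  # success is signalled by a 2xx status; the body is not needed
```

Because the adapter scales are server-wide state, concurrent requests for different specialists should be serialized around the switch and the chat call that follows it.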
vLLM (cloud)
python -m vllm.entrypoints.openai.api_server \
--model ./qwen3_17b_hf \
--enable-lora --max-loras 4 --max-lora-rank 32 \
--lora-modules \
selora-v1-commands=/path/to/peft/command \
selora-v1-automations=/path/to/peft/automation \
selora-v1-answers=/path/to/peft/answer \
selora-v1-clarifications=/path/to/peft/clarification
vLLM activates the matching LoRA based on the request's model field,
so no extra routing layer is needed on the server.
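As a sketch of the client side, the request below names a LoRA module directly in the OpenAI-compatible `model` field. It assumes the server above is on vLLM's default port 8000 and sends only standard OpenAI fields (temperature, max_tokens, stop); `build_request` and `complete` are hypothetical names.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default address


def build_request(specialist: str, prompt: str, max_tokens: int = 384) -> dict:
    """Address a LoRA by its --lora-modules name; vLLM applies that adapter to this request."""
    return {
        "model": f"selora-v1-{specialist}",  # e.g. selora-v1-commands
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
        "max_tokens": max_tokens,
        "stop": ["<|im_end|>", "<|endoftext|>"],
    }


def complete(specialist: str, prompt: str, max_tokens: int = 384) -> str:
    """Send one chat completion to the vLLM server and return the assistant text."""
    body = json.dumps(build_request(specialist, prompt, max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```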
Generation parameters
{
"temperature": 0.0,
"repeat_penalty": 1.15,
"repeat_last_n": 256,
"max_tokens": 384,
"stop": ["<|im_end|>", "<|endoftext|>"]
}
Bump max_tokens to 1536 for automation requests (longer JSON output).
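A small sketch that applies the budget rule above: it copies the recommended parameters and raises max_tokens for the automation specialist. `BASE_PARAMS` and `params_for` are hypothetical names. Note that repeat_penalty and repeat_last_n are llama.cpp / Ollama option names, so check what your backend calls them before forwarding the dict unchanged.

```python
import copy

# Recommended parameters from the JSON block above.
BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.15,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def params_for(specialist: str) -> dict:
    """Return the recommended sampling parameters, with a larger budget for automations."""
    params = copy.deepcopy(BASE_PARAMS)  # deep copy so callers cannot mutate the shared stop list
    if specialist.startswith("automation"):  # accepts "automation" or "automations"
        params["max_tokens"] = 1536  # automation JSON is longer than other specialists' output
    return params
```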
Training
Base: Qwen3 1.7B, fine-tuned with Apple mlx-lm. Each
specialist has its own LoRA (rank 8–28, scale 20) trained on a curated
HA-domain corpus (forum threads, HA docs, synthetic command /
automation pairs). System prompts are trained per specialist; see
prompts/. The answer adapter went through a sequential
continuation pass that added a query_state tool envelope on top of
the original answer-only training distribution; that envelope is
preserved in the augmented prompts/answers.txt and in the SYSTEM block
of Modelfile.answers.
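For reference, rank and scale are the two hyperparameters of a LoRA update. Assuming these adapters use mlx-lm's standard linear LoRA layer, where the scale multiplies the low-rank product directly (there is no separate alpha/r factor), each adapted weight $W_0$ computes:

$$
h = W_0 x + s\,B A x, \qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad 8 \le r \le 28,\quad s = 20
$$

Each adapted matrix therefore adds $r(d_{\text{in}} + d_{\text{out}})$ trainable parameters. For a hypothetical $2048 \times 2048$ projection at $r = 8$, that is $8 \times 4096 = 32{,}768$ parameters against $2048^2 = 4{,}194{,}304$ in the frozen weight, or $1/128 \approx 0.78\,\%$.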
Evaluation
10/10 parity pass rate on the four-intent suite (command, automation,
answer, and clarification, plus screenshot regressions). The validator
and scenarios live in parity/.
Files in this bundle
| Artifact | Purpose | Distribution |
|---|---|---|
| `qwen3_17b_base.IQ4_XS.gguf` | Quantized base for Ollama / llama.cpp | Hugging Face, ollama.com |
| `qwen3_17b_{intent}.lora.gguf` (×4) | Specialist LoRA adapters | Hugging Face, ollama.com |
| `Modelfile.{intent}` (×4) | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| `prompts/{intent}.txt` (×4) | Plain-text trained prompts (reference / testing) | this repo |
The full-precision (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.
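To run the llama.cpp quick start locally you need the base GGUF and the four adapter GGUFs. The sketch below lists those files, using the base name from the table and the adapter names from the llama-server command, and fetches them with `huggingface_hub.hf_hub_download`. The repository id `selorahomes/Selora-AI` is an assumption; `bundle_files` and `download_all` are hypothetical names. If you use the IQ4_XS base listed here, pass that filename to --model in place of the Q4_K_M name shown in the quick start.

```python
# File names taken from the table above and the llama.cpp quick start.
SPECIALISTS = ("command", "automation", "answer", "clarification")
BASE_GGUF = "qwen3_17b_base.IQ4_XS.gguf"
REPO_ID = "selorahomes/Selora-AI"  # assumed repository id


def bundle_files() -> list[str]:
    """List the GGUF files a llama.cpp setup needs: the base plus four adapters."""
    return [BASE_GGUF] + [f"qwen3_17b_{name}.lora.gguf" for name in SPECIALISTS]


def download_all(repo_id: str = REPO_ID) -> list[str]:
    """Fetch each file into the local Hugging Face cache and return the local paths."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return [hf_hub_download(repo_id=repo_id, filename=name) for name in bundle_files()]
```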
Citation
@misc{selora-ai-2026,
title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
author = {{Selora Homes}},
year = {2026},
url = {https://huggingface.co/selora-homes/selora-ai}
}
Base model citation: Qwen Team, Qwen3 Technical Report (2025).
License
Apache-2.0 (matches the Qwen3 base license).