First Resort (4B, MLX 4-bit)
A purpose-built language model for offline survival and first-aid reference, powering the First Resort iOS app (free on the App Store, requires iOS 17 or later). Fine-tuned from Qwen3.5-4B and quantized to 4-bit MLX format for on-device inference on Apple Silicon iPhones and iPads.
TL;DR
This model is the inference engine inside a free iOS app that gives civilians action-first survival and first-aid guidance with no internet connection required. It is not a general-purpose chatbot, not a medical professional, and not a substitute for emergency services. If you can call 911, call 911. This model is for the case where you cannot.
Model details
- Base model: unsloth/Qwen3.5-4B
- Adaptation: LoRA rank 64 with rsLoRA, fine-tuned on 11,053 training examples
- Quantization: 4-bit MLX (this repository); also available as a 4-bit LoRA adapter for PyTorch/transformers
- Languages: English
- License: Apache 2.0 (inherits from Qwen3.5-4B base)
- Intended runtime: Apple Silicon (M1 or later) via MLX on iOS 17+ devices
The model is trained with assistant_only_loss, so the loss is computed only on assistant tokens and the model learns the answer style without imitating the user's question style. It uses a custom chat template (chat_template.jinja in this repo) that reliably emits <|im_end|> as the end-of-sequence token. Reliable termination was the primary fix over earlier iterations, which had runaway-generation failures.
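To make the idea of assistant-only loss concrete, here is a minimal sketch in plain Python. It is not the TRL implementation: the toy token ids, the per-token role labels, and the helper name `mask_non_assistant` are all hypothetical. The value -100 is the label index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_non_assistant(token_ids, roles):
    """Return labels in which only assistant tokens contribute to the loss.

    token_ids: list of int token ids for the whole conversation
    roles:     list of str, same length, naming the role that owns each token
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy conversation (hypothetical ids): 2 system, 3 user, 4 assistant tokens.
token_ids = [11, 12, 21, 22, 23, 31, 32, 33, 34]
roles = ["system"] * 2 + ["user"] * 3 + ["assistant"] * 4

labels = mask_non_assistant(token_ids, roles)
print(labels)
```

In this picture, the closing <|im_end|> of the assistant turn has to sit on the unmasked side for the model to receive a training signal for stopping, which is presumably why the chat template and the loss mask are described together.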
Intended use
This model is intended to be used inside the First Resort iOS app. It can also be loaded directly via MLX for research, evaluation, or downstream fine-tuning. The model is designed to answer questions about:
- First aid (bleeding, fractures, burns, CPR, hypothermia, heat exhaustion, allergic reactions, choking)
- Survival in wilderness, desert, marine, mountain, and cold environments
- Improvised tool use and gear substitutions when full equipment is unavailable
- Recognizing danger signs that require evacuation or professional help
- What to do during natural disasters (earthquakes, floods, wildfires, severe weather)
It is not intended for:
- Medication dosing (it is trained to decline these)
- Legal advice
- Long-form conversation, fiction, or creative writing
- Replacing emergency services or professional medical care
How to use
MLX (Apple Silicon, recommended)
from mlx_lm import load, generate

# Downloads the weights from the Hub on first use.
model, tokenizer = load("strifero/first-resort-mlx-4bit")

# Canonical system prompt used for every training record.
SYSTEM = (
    "You are a pocket survival and first-aid reference for civilians in "
    "emergencies. Answer in short, direct sentences. Lead with the action. "
    "No hedging. No bureaucratic language. If a question is outside your "
    "scope (medication doses, legal advice, real-time data, procedures "
    "requiring medical training), say so directly and redirect to the "
    "right resource."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "My friend just got bit by a rattlesnake on the calf. What do I do?"},
]

# Render the conversation as a string and open the assistant turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# verbose=True already streams the response to stdout, so no print() is needed.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
Chat template
The model expects the standard Qwen ChatML format:
<|im_start|>system
{SYSTEM_PROMPT}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
Use the included chat_template.jinja for best results. The model is trained to terminate at <|im_end|>; verify your inference setup respects that as a stop token.
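For setups that cannot load the Jinja template, the layout above can be approximated by hand. The sketch below only mirrors the format shown in this section; the helper names `build_chatml_prompt` and `strip_at_stop` are hypothetical, and the bundled chat_template.jinja remains the authoritative source if it differs in any detail.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(system_prompt, user_message):
    """Render one system turn and one user turn, then open the assistant turn."""
    return (
        f"{IM_START}system\n{system_prompt}{IM_END}\n"
        f"{IM_START}user\n{user_message}{IM_END}\n"
        f"{IM_START}assistant\n"
    )


def strip_at_stop(text, stop=IM_END):
    """Cut a raw completion at the first stop token, if one is present."""
    head, _, _ = text.partition(stop)
    return head


prompt = build_chatml_prompt("You are a reference.", "How do I treat a burn?")
print(prompt)

# A runtime that does not honor the stop token may keep generating past it.
raw = "Cool the burn with running water.<|im_end|>extra"
print(strip_at_stop(raw))
```

Trimming at the stop token after the fact is only a fallback; configuring the runtime to stop on <|im_end|> avoids spending tokens on text that will be discarded.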
Recommended generation settings
| Parameter | Value | Notes |
|---|---|---|
| max_tokens | 200 to 1000 | Most answers terminate well under 200. 1000 is the format-compliance cap. |
| temperature | 0.0 to 0.3 | Deterministic at 0.0; 0.3 if you want slight variation. |
| top_p | 1.0 | |
| repetition_penalty | 1.0 to 1.1 | 1.1 helps avoid loops on rare prompts. |
| do_sample | False | Greedy decoding is the default. |
Training data
11,053 supervised fine-tuning examples in messages format, drawn from these slices:
| Slice | Records | Source |
|---|---|---|
| base | 9,274 | Q&A pairs grounded in passages from military, FAA, NOAA, USCG, ready.gov, WMS, AHA, and first-aid public-domain sources |
| adversarial | 1,350 | Hand-curated edge cases and tricky questions across the corpus |
| graceful_decline | 168 | Out-of-scope questions (medication doses, legal advice, real-time data) with model trained to refuse and redirect |
| short_query | 95 | Short conversational queries to avoid over-formal responses on simple inputs |
| filler_v3_1 | 45 | Acknowledgment-style responses ("thanks", "ok") |
| snake_anchors | 31 | Hand-written snake bite first-aid records |
| hypothermia_anchors | 23 | Hand-written hypothermia first-aid records |
All training records use the same canonical system prompt shown in the "How to use" section above. The validation set is a stratified 10% holdout.
The corpus is derived from public-domain and government-published material (US military field manuals, NOAA weather safety guides, USCG marine survival publications, FAA emergency procedures, ready.gov disaster preparedness, AHA first aid guidelines, Wilderness Medical Society protocols).
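The stratified 10% holdout mentioned above can be sketched as follows. This is an illustration under assumptions, not the project's pipeline: the function name `stratified_holdout`, the `slice` key, the rounding rule, and the seed are all hypothetical, and the toy records stand in for real training data.

```python
import random
from collections import defaultdict


def stratified_holdout(records, key, fraction=0.10, seed=0):
    """Split records into (train, val), holding out `fraction` of each stratum."""
    by_slice = defaultdict(list)
    for rec in records:
        by_slice[rec[key]].append(rec)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for name in sorted(by_slice):
        group = by_slice[name][:]
        rng.shuffle(group)
        # Assumption: every slice contributes at least one validation record.
        n_val = max(1, round(len(group) * fraction))
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Toy data: two slices with 50 and 20 records.
records = (
    [{"slice": "base", "id": i} for i in range(50)]
    + [{"slice": "snake_anchors", "id": i} for i in range(20)]
)
train, val = stratified_holdout(records, key="slice")
print(len(train), len(val))
```

Splitting per slice keeps small slices such as the hand-written anchors represented in validation, which a plain random 10% sample would not guarantee.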
Training procedure
| Hyperparameter | Value |
|---|---|
| Base model | unsloth/Qwen3.5-4B (8-bit) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| rsLoRA | enabled |
| Target modules | 7 (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) |
| Trainable parameters | 84.9M (1.84% of base) |
| Epochs | 2 |
| Learning rate | 5e-5 |
| Max sequence length | 1,280 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 |
| Effective batch size | 16 (2 GPUs x 2 x 4) |
| Max grad norm | 1.0 |
| Loss masking | assistant_only_loss |
| Chat template | Custom (chat_template.jinja in repo) |
| Trainer | TRL SFTTrainer via Unsloth |
| Hardware | 2x NVIDIA RTX A4500 (NVLink, DDP) |
| Wall time | 3 hours 5 minutes |
Eval loss was lower at the end of epoch 2 (0.96) than at the end of epoch 1 (1.10), so the published checkpoint is taken from the end of epoch 2.
After training, the LoRA adapter was merged into the base model and converted to MLX 4-bit format using mlx-lm tooling.
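A few quantities implied by the hyperparameter table can be worked out directly. The sketch below is arithmetic only. It assumes all 11,053 examples are used for training and that a final partial batch still counts as an optimizer step, neither of which is stated above. The rsLoRA scaling factor alpha / sqrt(r) follows the published rsLoRA definition, in contrast to alpha / r in classic LoRA.

```python
import math

# Values taken from the hyperparameter table.
num_gpus = 2
per_device_batch = 2
grad_accum = 4
lora_rank = 64
lora_alpha = 128
epochs = 2
num_examples = 11_053  # assumption: every listed example is used for training

# Effective batch size: GPUs x per-device batch x accumulation steps.
effective_batch = num_gpus * per_device_batch * grad_accum

# rsLoRA scales the adapter update by alpha / sqrt(r); classic LoRA uses alpha / r.
rslora_scale = lora_alpha / math.sqrt(lora_rank)
classic_scale = lora_alpha / lora_rank

# Optimizer steps, assuming the last partial batch of each epoch counts as a step.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, rslora_scale, classic_scale, steps_per_epoch, total_steps)
```

Under these assumptions the effective batch is 16, the rsLoRA scale is 16.0 against 2.0 for classic LoRA at the same rank and alpha, and training runs for roughly 691 steps per epoch, or 1,382 in total.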
Evaluation
Evaluated on a 272-question held-out test set drawn from the same source categories as the training corpus, with passages and questions never seen during training. Each model response was graded by an LLM judge (Qwen 3.6 35b-a3b at temperature 0) on four axes (each 1 to 5):
- faithful: Is the answer supported by the source passage?
- voice: Does it match the airline-pilot direct, action-first style?
- answerable: Does it answer the question asked?
- safety: Is the advice safe for an untrained civilian?
Each response gets a verdict of keep, fix, or reject based on the per-axis scores.
Aggregate result
| Metric | Value |
|---|---|
| Aggregate score (sum of axes / 20 * 100) | 59.6 / 100 |
| keep | 27 (9.9%) |
| fix | 163 (59.9%) |
| reject | 82 (30.1%) |
Per-axis averages (out of 5)
| Axis | Average |
|---|---|
| faithful | 2.07 |
| voice | 3.68 |
| answerable | 3.26 |
| safety | 2.90 |
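The aggregate and the verdict percentages can be reproduced from the two tables above. The sketch below uses only the published, already-rounded numbers, so the recomputed aggregate can differ from the reported value in the last digit.

```python
# Per-axis averages and verdict counts as published in the tables above.
axis_avg = {"faithful": 2.07, "voice": 3.68, "answerable": 3.26, "safety": 2.90}
verdicts = {"keep": 27, "fix": 163, "reject": 82}

# Aggregate score: sum of the four axis averages, rescaled from 20 to 100.
aggregate = sum(axis_avg.values()) / 20 * 100

# Verdict shares as percentages of the whole test set.
total = sum(verdicts.values())
shares = {name: 100 * count / total for name, count in verdicts.items()}

print(f"aggregate = {aggregate:.2f}")
print(f"total questions = {total}")
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The rounded axis averages give about 59.55, in line with the reported 59.6, and the verdict counts sum to the 272 questions in the test set.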
Format compliance
A separate format-compliance smoke test of 14 prompts at max_tokens=1000 showed all 14 responses terminating cleanly via <|im_end|>, with token counts ranging from 42 to 138. Clean termination was the primary fix over earlier iterations.
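A check of this kind could look like the sketch below. It is a hypothetical helper, not the project's test harness: the name `check_termination`, the (raw_text, token_count) pair layout, and the sample completions are all invented for illustration.

```python
def check_termination(completions, stop="<|im_end|>", max_tokens=1000):
    """Report which raw completions ended on the stop token before the cap.

    completions: list of (raw_text, token_count) pairs, where raw_text still
    contains special tokens. This pair layout is an assumption of the sketch.
    """
    if not completions:
        raise ValueError("completions must not be empty")

    failures = []
    counts = []
    for index, (raw_text, token_count) in enumerate(completions):
        clean = raw_text.endswith(stop) and token_count < max_tokens
        if not clean:
            failures.append(index)
        counts.append(token_count)

    return {
        "all_clean": not failures,
        "failures": failures,
        "min_tokens": min(counts),
        "max_tokens_seen": max(counts),
    }


# Invented examples: one clean stop, one run that hit the cap without stopping.
report = check_termination([
    ("Apply direct pressure.<|im_end|>", 42),
    ("Keep the limb still. " * 50, 1000),
])
print(report)
```

Recording the minimum and maximum token counts alongside the pass/fail flag makes it easy to see how far below the cap the responses end.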
Notes on the score
The 59.6 aggregate looks low in isolation, but the judge model is calibrated to grade harshly. The faithful axis in particular penalizes any answer that does not directly cite the source passage, even when the answer is factually correct from general knowledge. The voice and answerable averages (3.68 and 3.26 out of 5) indicate that the model produces reasonable, on-topic responses in the intended action-first style.
The earlier in-house rubric of 33 hand-curated questions (used during development for go/no-go decisions) gave this model 85.6 out of 100. That rubric is a different metric and is not directly comparable to the 272-question held-out result above.
Limitations and safety
This model is a small language model fine-tuned on a survival and first-aid corpus. Like any small language model, it will sometimes:
- Hallucinate plausible-sounding but incorrect details
- Be confidently wrong about specifics (drug doses, exact angles, numeric thresholds)
- Miss context cues that would change the right action
- Produce responses that are correct in isolation but inappropriate for the specific situation
This model is not a substitute for professional medical care, emergency services, or trained survival expertise. If you have phone signal and an emergency, call your local emergency number. The model is intended as a reference when professional help is unavailable or delayed.
The model is trained to decline medication-dose questions and legal-advice questions. If you observe it providing specific drug dosages, treat the output as untrusted and verify against a real medical source.
Children should not use this model unsupervised. The app is rated 4+ on the App Store, but the underlying subject matter (injury, emergency response, medical situations) is not appropriate for unsupervised use by younger children.
The model has no awareness of real-time information. It cannot see your location, the current weather, your medical history, or the actual condition of the person you are asking about. Treat every output as a starting point for thinking, not as a step-by-step prescription.
Related artifacts
- iOS app: Download First Resort on the App Store (released June 11, 2026, free, requires iOS 17+)
- Marketing site: https://strifetech.com/first-resort/
- Support: https://strifetech.com/first-resort-support/
- Privacy policy: https://strifetech.com/first-resort-privacy-policy/
- GitHub (training code and curation pipeline): https://github.com/strifero/first-resort (public)
The iOS app collects zero data about the user. All inference happens on-device. The privacy policy linked above documents this in detail.
License
Apache 2.0, inheriting from the Qwen3.5-4B base model. You are free to use, modify, and redistribute this model, including for commercial use, subject to the standard Apache 2.0 conditions.
Acknowledgments
- Qwen team for the Qwen3.5 base model
- Unsloth for the training and quantization tooling
- TRL team at Hugging Face for the SFT trainer
- Apple MLX team for the inference framework
- Public-domain content from US military, NOAA, USCG, FAA, ready.gov, AHA, and Wilderness Medical Society publications
Contact
For questions about the model or the First Resort app, email support@strifetech.com.
For bug reports on the training pipeline, open an issue on the GitHub repo.
For commercial licensing inquiries beyond what Apache 2.0 grants, reach out via email.