Libraries

How to use strifero/first-resort-mlx-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("strifero/first-resort-mlx-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use strifero/first-resort-mlx-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "strifero/first-resort-mlx-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "strifero/first-resort-mlx-4bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use strifero/first-resort-mlx-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "strifero/first-resort-mlx-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default strifero/first-resort-mlx-4bit

Run Hermes

hermes

MLX LM

How to use strifero/first-resort-mlx-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "strifero/first-resort-mlx-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "strifero/first-resort-mlx-4bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "strifero/first-resort-mlx-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

First Resort (4B, MLX 4-bit)

A purpose-built language model for offline survival and first-aid reference, powering the First Resort iOS app (free on the App Store, requires iOS 17 or later). Fine-tuned from Qwen3.5-4B and quantized to 4-bit MLX format for on-device inference on Apple Silicon iPhones and iPads.

TL;DR

This model is the inference engine inside a free iOS app that gives civilians action-first survival and first-aid guidance with no internet connection required. It is not a general-purpose chatbot, not a medical professional, and not a substitute for emergency services. If you can call 911, call 911. This model is for the case where you cannot.

Model details

Base model: unsloth/Qwen3.5-4B
Adaptation: LoRA rank 64 with rsLoRA, fine-tuned on 11,053 training examples
Quantization: 4-bit MLX (this repository); also available as a 4-bit LoRA adapter for PyTorch/transformers
Languages: English
License: Apache 2.0 (inherits from Qwen3.5-4B base)
Intended runtime: Apple Silicon (M1 or later) via MLX on iOS 17+ devices

The model is trained with assistant_only_loss, so it learns the answer style without imitating the user's question style. It uses a custom chat template (chat_template.jinja in this repo) that reliably emits <|im_end|> as the end-of-sequence token, which was the primary fix for the runaway-generation failures seen in earlier iterations.
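
As a rough illustration of what assistant-only loss masking does, the toy sketch below replaces every label outside an assistant turn with -100, the ignore index that PyTorch's cross-entropy loss skips by default. This is not TRL's actual implementation; the token IDs, the role tags, and the `mask_non_assistant` helper are all hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def mask_non_assistant(token_ids, roles):
    """Return labels in which only assistant tokens keep their IDs."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy sequence: made-up token IDs tagged with the turn they belong to.
token_ids = [11, 12, 21, 22, 23, 31, 32]
roles = ["system", "system", "user", "user", "user", "assistant", "assistant"]
labels = mask_non_assistant(token_ids, roles)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32]
```

Only the two assistant tokens contribute to the loss here, which is why the model picks up the answer style rather than the phrasing of the questions.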

Intended use

This model is intended to be used inside the First Resort iOS app. It can also be loaded directly via MLX for research, evaluation, or downstream fine-tuning. The model is designed to answer questions about:

First aid (bleeding, fractures, burns, CPR, hypothermia, heat exhaustion, allergic reactions, choking)
Survival in wilderness, desert, marine, mountain, and cold environments
Improvised tool use and gear substitutions when full equipment is unavailable
Recognizing danger signs that require evacuation or professional help
What to do during natural disasters (earthquakes, floods, wildfires, severe weather)

It is not intended for:

Medication dosing (it is trained to decline these)
Legal advice
Long-form conversation, fiction, or creative writing
Replacing emergency services or professional medical care

How to use

MLX (Apple Silicon, recommended)

from mlx_lm import load, generate

model, tokenizer = load("strifero/first-resort-mlx-4bit")

SYSTEM = (
    "You are a pocket survival and first-aid reference for civilians in "
    "emergencies. Answer in short, direct sentences. Lead with the action. "
    "No hedging. No bureaucratic language. If a question is outside your "
    "scope (medication doses, legal advice, real-time data, procedures "
    "requiring medical training), say so directly and redirect to the "
    "right resource."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "My friend just got bit by a rattlesnake on the calf. What do I do?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# verbose=True already streams the response to stdout, so no extra print is needed
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)

Chat template

The model expects the standard Qwen ChatML format:

<|im_start|>system
{SYSTEM_PROMPT}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant

Use the included chat_template.jinja for best results. The model is trained to terminate at <|im_end|>; verify your inference setup respects that as a stop token.
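
For debugging a prompt or checking stop-token handling without loading the model, the layout above can be reproduced by hand. The sketch below is a hand-rolled approximation, not the repo's chat_template.jinja, which stays authoritative and may differ in whitespace or extra fields. `render_chatml` and `strip_at_stop` are hypothetical helper names.

```python
STOP_TOKEN = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts into the ChatML layout."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}{STOP_TOKEN}\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def strip_at_stop(generated_text):
    """Cut generated text at the first stop token, if the runtime did not."""
    return generated_text.split(STOP_TOKEN, 1)[0]


prompt = render_chatml(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
print(strip_at_stop("Apply pressure.<|im_end|>trailing text"))  # Apply pressure.
```

If `strip_at_stop` ever changes the output of your inference setup, the runtime is not treating <|im_end|> as a stop token.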

Recommended generation settings

Parameter	Value	Notes
`max_tokens`	200 to 1000	Most answers terminate well under 200. 1000 is the format-compliance cap.
`temperature`	0.0 to 0.3	Deterministic at 0.0; 0.3 if you want slight variation.
`top_p`	1.0
`repetition_penalty`	1.0 to 1.1	1.1 helps avoid loops on rare prompts.
`do_sample`	False	Greedy decoding is the default.
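
One way to apply these settings is through the local OpenAI-compatible server. The sketch below only builds the request and does not send it. It assumes the server listens on http://localhost:8080/v1 and accepts a `repetition_penalty` field in the request body; check both against your mlx-lm version. `build_chat_request` is a hypothetical helper, and the short system prompt is a placeholder for the canonical one.

```python
import json
import urllib.request

RECOMMENDED = {
    "max_tokens": 512,          # inside the 200 to 1000 range
    "temperature": 0.0,         # deterministic decoding
    "top_p": 1.0,
    "repetition_penalty": 1.1,  # assumption: the server accepts this field
}


def build_chat_request(user_message, system_prompt,
                       base_url="http://localhost:8080/v1"):
    """Build (but do not send) a chat-completions request."""
    payload = {
        "model": "strifero/first-resort-mlx-4bit",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        **RECOMMENDED,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "How do I treat a second-degree burn?",
    "You are a pocket survival and first-aid reference.",  # placeholder prompt
)
# To send it, a server must be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```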

Training data

11,053 supervised fine-tuning examples in messages format, drawn from these slices:

Slice	Records	Source
`base`	9,274	Q&A pairs grounded in passages from military, FAA, NOAA, USCG, ready.gov, WMS, AHA, and first-aid public-domain sources
`adversarial`	1,350	Hand-curated edge cases and tricky questions across the corpus
`graceful_decline`	168	Out-of-scope questions (medication doses, legal advice, real-time data) with model trained to refuse and redirect
`short_query`	95	Short conversational queries to avoid over-formal responses on simple inputs
`filler_v3_1`	45	Acknowledgment-style responses ("thanks", "ok")
`snake_anchors`	31	Hand-written snake bite first-aid records
`hypothermia_anchors`	23	Hand-written hypothermia first-aid records

All training records use the same canonical system prompt shown in the "How to use" section above. Validation set is a stratified 10% holdout.

The corpus is derived from public-domain and government-published material (US military field manuals, NOAA weather safety guides, USCG marine survival publications, FAA emergency procedures, ready.gov disaster preparedness, AHA first aid guidelines, Wilderness Medical Society protocols).
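
A stratified holdout means each slice contributes roughly the same fraction of its records to validation, so small slices such as the anchor sets are not left out by chance. The sketch below shows the idea on toy data; it is not the project's actual split code. The `slice` and `id` fields, the seed, and the `stratified_holdout` helper are assumptions.

```python
import random
from collections import defaultdict


def stratified_holdout(records, holdout_fraction=0.1, seed=0):
    """Split records into (train, validation), sampling each slice separately."""
    by_slice = defaultdict(list)
    for record in records:
        by_slice[record["slice"]].append(record)

    rng = random.Random(seed)
    train, validation = [], []
    for name in sorted(by_slice):
        group = by_slice[name][:]
        rng.shuffle(group)
        # Every slice gives up at least one record to validation.
        n_val = max(1, round(len(group) * holdout_fraction))
        validation.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, validation


# Toy corpus: 20 "base" records and 10 "adversarial" records.
records = (
    [{"slice": "base", "id": i} for i in range(20)]
    + [{"slice": "adversarial", "id": i} for i in range(10)]
)
train, validation = stratified_holdout(records)
print(len(train), len(validation))  # 27 3
```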

Training procedure

Hyperparameter	Value
Base model	unsloth/Qwen3.5-4B (8-bit)
LoRA rank	64
LoRA alpha	128
LoRA dropout	0.05
rsLoRA	enabled
Target modules	7 (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
Trainable parameters	84.9M (1.84% of base)
Epochs	2
Learning rate	5e-5
Max sequence length	1,280
Per-device batch size	2
Gradient accumulation	4
Effective batch size	16 (2 GPUs x 2 x 4)
Max grad norm	1.0
Loss masking	assistant_only_loss
Chat template	Custom (chat_template.jinja in repo)
Trainer	TRL SFTTrainer via Unsloth
Hardware	2x NVIDIA RTX A4500 (NVLink, DDP)
Wall time	3 hours 5 minutes

Eval loss was lower at the end of epoch 2 (0.96) than at the end of epoch 1 (1.10), so the published checkpoint is from the end of epoch 2.

After training, the LoRA adapter was merged into the base model and converted to MLX 4-bit format using mlx-lm tooling.
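
Two values in the table follow from simple arithmetic, worked through below. The rsLoRA scaling factor uses the published rank-stabilized formula, alpha / sqrt(r), in place of the classic alpha / r. The variable names are illustrative and not taken from the training code.

```python
import math

lora_rank = 64
lora_alpha = 128

standard_scale = lora_alpha / lora_rank           # classic LoRA: alpha / r
rslora_scale = lora_alpha / math.sqrt(lora_rank)  # rsLoRA: alpha / sqrt(r)

num_gpus = 2
per_device_batch = 2
grad_accum_steps = 4
effective_batch = num_gpus * per_device_batch * grad_accum_steps

print(standard_scale, rslora_scale, effective_batch)  # 2.0 16.0 16
```

With rank 64 and alpha 128, enabling rsLoRA makes the adapter update scale 8 times larger than the classic formula would give.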

Evaluation

Evaluated on a 272-question held-out test set drawn from the same source categories as the training corpus, with passages and questions never seen during training. Each model response was graded by an LLM judge (Qwen 3.6 35b-a3b at temperature 0) on four axes (each 1 to 5):

faithful: Is the answer supported by the source passage?
voice: Does it match the airline-pilot direct, action-first style?
answerable: Does it answer the question asked?
safety: Is the advice safe for an untrained civilian?

Each response gets a verdict of keep, fix, or reject based on the per-axis scores.

Aggregate result

Metric	Value
Aggregate score (sum of axes / 20 * 100)	59.6 / 100
keep	27 (9.9%)
fix	163 (59.9%)
reject	82 (30.1%)

Per-axis averages (out of 5)

Axis	Average
faithful	2.07
voice	3.68
answerable	3.26
safety	2.90
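
The aggregate and the verdict percentages can be re-derived from the tables above. Because the per-axis averages are themselves rounded, the recomputed aggregate comes out near 59.55, which agrees with the reported 59.6 to within rounding.

```python
axis_averages = {"faithful": 2.07, "voice": 3.68, "answerable": 3.26, "safety": 2.90}

# Four axes, each scored 1 to 5, so the maximum sum is 20.
aggregate = sum(axis_averages.values()) / 20 * 100

verdicts = {"keep": 27, "fix": 163, "reject": 82}
total = sum(verdicts.values())
shares = {name: round(count / total * 100, 1) for name, count in verdicts.items()}

print(f"aggregate: {aggregate:.2f}")
print(total, shares)  # 272 {'keep': 9.9, 'fix': 59.9, 'reject': 30.1}
```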

Format compliance

A separate format-compliance smoke test of 14 prompts at max_tokens=1000 showed all 14 responses terminating cleanly via <|im_end|>, with token counts ranging from 42 to 138. Fixing this termination behavior was the primary improvement over earlier iterations.

Notes on the score

The 59.6 aggregate looks low in isolation, but the judge model is calibrated to grade harshly. The faithful axis in particular penalizes any answer that does not directly cite the source passage, even when the answer is factually correct from general knowledge. The voice and answerable averages (3.68 and 3.26 out of 5) indicate that the model produces reasonable, on-topic responses in the intended action-first style.

An earlier in-house rubric of 33 hand-curated questions (used during development for go/no-go decisions) gave this model 85.6 out of 100. That rubric is a different metric and is not directly comparable to the 272-question held-out set above.

Limitations and safety

This model is a small language model fine-tuned on a survival and first-aid corpus. Like any small language model, it will sometimes:

Hallucinate plausible-sounding but incorrect details
Be confidently wrong about specifics (drug doses, exact angles, numeric thresholds)
Miss context cues that would change the right action
Produce responses that are correct in isolation but inappropriate for the specific situation

This model is not a substitute for professional medical care, emergency services, or trained survival expertise. If you have phone signal and an emergency, call your local emergency number. The model is intended as a reference when professional help is unavailable or delayed.

The model is trained to decline medication-dose questions and legal-advice questions. If you observe it providing specific drug dosages, treat the output as untrusted and verify against a real medical source.

Children should not use this model unsupervised. The app is rated 4+ on the App Store, but the underlying subject matter (injury, emergency response, medical situations) is not appropriate for unsupervised use by younger children.

The model has no awareness of real-time information. It cannot see your location, the current weather, your medical history, or the actual condition of the person you are asking about. Treat every output as a starting point for thinking, not as a step-by-step prescription.

Related artifacts

iOS app: Download First Resort on the App Store (released June 11, 2026, free, requires iOS 17+)
Marketing site: https://strifetech.com/first-resort/
Support: https://strifetech.com/first-resort-support/
Privacy policy: https://strifetech.com/first-resort-privacy-policy/
GitHub (training code and curation pipeline): https://github.com/strifero/first-resort (public)

The iOS app collects zero data about the user. All inference happens on-device. The privacy policy linked above documents this in detail.

License

Apache 2.0, inheriting from the Qwen3.5-4B base model. You are free to use, modify, and redistribute this model, including for commercial use, subject to the standard Apache 2.0 conditions.

Acknowledgments

Qwen team for the Qwen3.5 base model
Unsloth for the training and quantization tooling
TRL team at Hugging Face for the SFT trainer
Apple MLX team for the inference framework
Public-domain content from US military, NOAA, USCG, FAA, ready.gov, AHA, and Wilderness Medical Society publications

Contact

For questions about the model or the First Resort app, email support@strifetech.com.

For bug reports on the training pipeline, open an issue on the GitHub repo.

For commercial licensing inquiries beyond what Apache 2.0 grants, reach out via email.
