Instructions for using OpenMed/laneformer-2b-it-q4-mlx with libraries and local apps.

Libraries

How to use OpenMed/laneformer-2b-it-q4-mlx with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OpenMed/laneformer-2b-it-q4-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use OpenMed/laneformer-2b-it-q4-mlx with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenMed/laneformer-2b-it-q4-mlx"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "OpenMed/laneformer-2b-it-q4-mlx"
        }
      ]
    }
  }
}
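If ~/.pi/agent/models.json already lists other providers, editing it by hand risks dropping them. The following is a minimal sketch of merging the provider entry above into an existing file. The helper names (mlx_provider_entry, merge_provider, update_models_file) are hypothetical, and the sketch assumes the file uses the top-level "providers" object shown above.

```python
import json
from pathlib import Path

MODEL_ID = "OpenMed/laneformer-2b-it-q4-mlx"


def mlx_provider_entry(base_url="http://localhost:8080/v1"):
    """Return the provider entry from the JSON above as a Python dict."""
    return {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": MODEL_ID}],
    }


def merge_provider(config, name="mlx-lm"):
    """Return a copy of config with the MLX provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = mlx_provider_entry()
    merged["providers"] = providers
    return merged


def update_models_file(path):
    """Read models.json if it exists, merge the provider, and write it back."""
    path = Path(path)
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(existing), indent=2) + "\n")


# Example (writes to disk):
# update_models_file(Path.home() / ".pi" / "agent" / "models.json")
```

Other provider entries are kept as they are; only the "mlx-lm" entry is added or replaced.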

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use OpenMed/laneformer-2b-it-q4-mlx with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenMed/laneformer-2b-it-q4-mlx"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OpenMed/laneformer-2b-it-q4-mlx

Run Hermes

hermes

MLX LM

How to use OpenMed/laneformer-2b-it-q4-mlx with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "OpenMed/laneformer-2b-it-q4-mlx"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "OpenMed/laneformer-2b-it-q4-mlx"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "OpenMed/laneformer-2b-it-q4-mlx",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
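The same request can be sent from Python with only the standard library. This is a sketch, not an official client. It assumes the server is on its default port 8080 (matching the Pi and Hermes configuration above) and that the response follows the OpenAI chat-completions shape, with the text under choices[0].message.content. The function names and the max_tokens value are illustrative.

```python
import json
import urllib.request

MODEL_ID = "OpenMed/laneformer-2b-it-q4-mlx"


def build_chat_payload(user_text, max_tokens=128):
    """Build the JSON body for a single-turn chat request."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }


def chat(user_text, base_url="http://localhost:8080/v1", timeout=120):
    """POST to /chat/completions and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]


# With the server running:
# print(chat("Hello"))
```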

Laneformer 2B Instruct q4 MLX for OpenMed

This private repository contains an OpenMed MLX-LM conversion of kogai/laneformer-2b-it. It is packaged for local Apple Silicon text generation through OpenMed's Python MLX interface and mlx-lm.

At a Glance

Field	Value
Source model	`kogai/laneformer-2b-it`
MLX repo	`OpenMed/laneformer-2b-it-q4-mlx`
Task	Text generation
Runtime	Python `openmed[mlx]` / `mlx-lm`
Quantization	4-bit affine, group size 64
Parameters	2.32B
Source revision	`b4f40adc413c2c5268ab89cf666ade37148d8d4b`
License	Custom upstream license, see source license link

OpenMed MLX Status

Python MLX: supported through openmed.generate_text(...) and openmed.mlx.OpenMedMLXLanguageModel.
Swift MLX: not supported for this causal language model artifact. Swift OpenMedKit MLX currently targets OpenMed token-classification artifacts.
Privacy posture: this artifact is intended for local inference. Do not send protected health information to hosted demos or external services.
Safety posture: OpenMed does not treat this model as a medical device and does not auto-trigger clinical decisions.
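One way to act on the privacy posture in client code is to refuse to send text to any endpoint that is not on the local machine. The helper below is a hypothetical sketch, not part of OpenMed. It treats only localhost and loopback IP addresses as local, which is a deliberately conservative assumption.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url):
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as external.
        return False
```

A caller could check is_local_endpoint(base_url) before posting a prompt and raise an error when it returns False.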

Use This MLX Snapshot

hf download OpenMed/laneformer-2b-it-q4-mlx \
  --local-dir ./laneformer-2b-it-q4-mlx

Python Quick Start

pip install "openmed[mlx]"

from openmed import generate_text

response = generate_text(
    messages=[
        {
            "role": "user",
            "content": "Explain why local clinical language models matter.",
        }
    ],
    model_name="OpenMed/laneformer-2b-it-q4-mlx",
    max_tokens=128,
)
print(response)

Use OpenMed/laneformer-2b-it-q4-mlx when you want this preconverted MLX artifact explicitly. OpenMed also accepts kogai/laneformer-2b-it and laneformer-2b-it as compatibility aliases that resolve to this private OpenMed artifact.

Use This Preconverted MLX Repo Directly

from openmed.mlx import OpenMedMLXLanguageModel

runner = OpenMedMLXLanguageModel("./laneformer-2b-it-q4-mlx")
print(runner.generate("Define delayed tensor parallelism.", max_tokens=128))

You can also load this directory directly with mlx_lm.load(...).
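A minimal sketch of that mlx-lm path, assuming the snapshot was downloaded to ./laneformer-2b-it-q4-mlx as shown earlier. The helper names are illustrative, and the max_tokens value simply mirrors the examples above.

```python
from pathlib import Path

MODEL_DIR = Path("./laneformer-2b-it-q4-mlx")


def build_messages(user_text):
    """Wrap a single user turn in the chat-message format."""
    return [{"role": "user", "content": user_text}]


def generate_locally(user_text, model_dir=MODEL_DIR, max_tokens=128):
    """Load the local MLX directory and generate a reply with mlx-lm."""
    # Imported lazily so this file can be imported without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load(str(model_dir))
    prompt = tokenizer.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


# Runs only when the snapshot directory is present.
if MODEL_DIR.is_dir():
    print(generate_locally("Define delayed tensor parallelism."))
```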

Artifact Notes

Format: MLX-LM model directory.
Weights: model.safetensors.
Custom model implementation: laneformer.py, referenced by config.json through model_file.
Tokenizer assets: tokenizer.json, tokenizer_config.json, special_tokens_map.json, and chat_template.jinja.
Quantization metadata is stored in config.json as 4-bit affine with group size 64.
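To confirm the quantization settings of a downloaded snapshot, config.json can be inspected directly. The sketch below assumes the metadata sits under a "quantization" key with "bits", "group_size", and an optional "mode" field. Those key names are an assumption about the file layout and are not confirmed by this card, so adjust them if your config.json differs.

```python
import json
from pathlib import Path


def describe_quantization(config):
    """Summarize quantization metadata from a parsed config.json dict."""
    quant = config.get("quantization")  # assumed key name
    if not quant:
        return "no quantization metadata found"
    bits = quant.get("bits")
    group_size = quant.get("group_size")
    # Assumption: "mode" may be omitted, in which case affine is implied.
    mode = quant.get("mode", "affine")
    return f"{bits}-bit {mode}, group size {group_size}"


def describe_model_dir(model_dir):
    """Read config.json from a model directory and summarize it."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return describe_quantization(config)


# Example: print(describe_model_dir("./laneformer-2b-it-q4-mlx"))
```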

CPU vs MLX Smoke Test

The private export verification used a 13-token prompt on Apple Silicon:

Runtime	Mean prefill	Tokens/sec
PyTorch CPU	0.3112 s	41.77
MLX q4	0.1201 s	108.21

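Measured prefill speedup: 2.59x on the smoke prompt (0.3112 s / 0.1201 s). The Tokens/sec values are consistent with the 13 prompt tokens divided by the mean prefill time, up to rounding. The CPU and MLX top-5 next-token sets overlapped on 4 of 5 token ids.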
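The figures in the table can be rechecked with a few lines of arithmetic. This sketch assumes the Tokens/sec column is the prompt length divided by the mean prefill time. The card does not state that, but the numbers fit.

```python
PROMPT_TOKENS = 13
CPU_PREFILL_S = 0.3112
MLX_PREFILL_S = 0.1201


def prefill_speedup(baseline_s, candidate_s):
    """Ratio of baseline prefill time to candidate prefill time."""
    return baseline_s / candidate_s


def prefill_throughput(tokens, seconds):
    """Prompt tokens processed per second during prefill."""
    return tokens / seconds


speedup = prefill_speedup(CPU_PREFILL_S, MLX_PREFILL_S)
cpu_tps = prefill_throughput(PROMPT_TOKENS, CPU_PREFILL_S)
mlx_tps = prefill_throughput(PROMPT_TOKENS, MLX_PREFILL_S)

print(f"speedup: {speedup:.2f}x")
print(f"CPU: {cpu_tps:.2f} tokens/sec, MLX: {mlx_tps:.2f} tokens/sec")
```

The recomputed MLX throughput lands within a few hundredths of the table value. A small gap is expected because the prefill times in the table are themselves rounded.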

Model tree for OpenMed/laneformer-2b-it-q4-mlx

Base model: kogai/laneformer-2b-it
Quantized: this model
