Instructions to use OpenMed/laneformer-2b-it-q4-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OpenMed/laneformer-2b-it-q4-mlx with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("OpenMed/laneformer-2b-it-q4-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use OpenMed/laneformer-2b-it-q4-mlx with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenMed/laneformer-2b-it-q4-mlx"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "OpenMed/laneformer-2b-it-q4-mlx" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use OpenMed/laneformer-2b-it-q4-mlx with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenMed/laneformer-2b-it-q4-mlx"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OpenMed/laneformer-2b-it-q4-mlx
Run Hermes
hermes
- MLX LM
How to use OpenMed/laneformer-2b-it-q4-mlx with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "OpenMed/laneformer-2b-it-q4-mlx"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 unless --port is given)
mlx_lm.server --model "OpenMed/laneformer-2b-it-q4-mlx"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OpenMed/laneformer-2b-it-q4-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
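The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is listening on port 8080, the port the Pi and Hermes configurations above point at, and that the response follows the usual OpenAI chat-completions shape (`choices[0].message.content`). The helper names `build_chat_request` and `chat` are illustrative and are not part of mlx-lm or OpenMed.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "OpenMed/laneformer-2b-it-q4-mlx"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the chat completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send one user message and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(user_text), timeout=120) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect without a live server.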
Laneformer 2B Instruct q4 MLX for OpenMed
This private repository contains an OpenMed MLX-LM conversion of
kogai/laneformer-2b-it.
It is packaged for local Apple Silicon text generation through OpenMed's
Python MLX interface and mlx-lm.
At a Glance
| Field | Value |
|---|---|
| Source model | kogai/laneformer-2b-it |
| MLX repo | OpenMed/laneformer-2b-it-q4-mlx |
| Task | Text generation |
| Runtime | Python openmed[mlx] / mlx-lm |
| Quantization | 4-bit affine, group size 64 |
| Parameters | 2.32B |
| Source revision | b4f40adc413c2c5268ab89cf666ade37148d8d4b |
| License | Custom upstream license, see source license link |
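To make the "4-bit affine, group size 64" row concrete, here is a minimal sketch of textbook affine group quantization in plain Python. It illustrates the idea only and is not MLX's kernel or packing format. The 16-bit scale and bias per group, and the assumption that every parameter is quantized, are assumptions for the storage estimate and are not stated in this card.

```python
import random


def quantize_group(weights, bits=4):
    """Affine-quantize one group so that w ~= scale * q + bias, q in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [scale * q + bias for q in codes]


random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(64)]  # one group of 64 weights
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

# Storage estimate. Assumption: one 16-bit scale and one 16-bit bias per group.
bits_per_weight = 4 + (16 + 16) / 64
# Assumption: all 2.32B parameters are quantized, which real exports rarely do.
approx_gb = 2.32e9 * bits_per_weight / 8 / 1e9

print(f"max abs error {max_error:.6f}, half a quantization step {scale / 2:.6f}")
print(f"about {bits_per_weight} bits per weight, roughly {approx_gb:.2f} GB")
```

The reconstruction error of each weight is bounded by half a quantization step, and a smaller group size tightens that bound at the cost of more scale and bias overhead.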
OpenMed MLX Status
- Python MLX: supported through openmed.generate_text(...) and openmed.mlx.OpenMedMLXLanguageModel.
- Swift MLX: not supported for this causal language model artifact. Swift OpenMedKit MLX currently targets OpenMed token-classification artifacts.
- Privacy posture: this artifact is intended for local inference. Do not send protected health information to hosted demos or external services.
- Safety posture: OpenMed does not treat this model as a medical device and does not auto-trigger clinical decisions.
Use This MLX Snapshot
hf download OpenMed/laneformer-2b-it-q4-mlx \
--local-dir ./laneformer-2b-it-q4-mlx
Python Quick Start
pip install "openmed[mlx]"
from openmed import generate_text
response = generate_text(
    messages=[
        {
            "role": "user",
            "content": "Explain why local clinical language models matter.",
        }
    ],
    model_name="OpenMed/laneformer-2b-it-q4-mlx",
    max_tokens=128,
)
print(response)
Use OpenMed/laneformer-2b-it-q4-mlx when you want this preconverted MLX
artifact explicitly. OpenMed also accepts kogai/laneformer-2b-it and
laneformer-2b-it as compatibility aliases that resolve to this private
OpenMed artifact.
Use This Preconverted MLX Repo Directly
from openmed.mlx import OpenMedMLXLanguageModel
runner = OpenMedMLXLanguageModel("./laneformer-2b-it-q4-mlx")
print(runner.generate("Define delayed tensor parallelism.", max_tokens=128))
You can also load this directory directly with mlx_lm.load(...).
Artifact Notes
- Format: MLX-LM model directory.
- Weights: model.safetensors.
- Custom model implementation: laneformer.py, referenced by config.json through model_file.
- Tokenizer assets: tokenizer.json, tokenizer_config.json, special_tokens_map.json, and chat_template.jinja.
- Quantization metadata is stored in config.json as 4-bit affine with group size 64.
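A quick way to sanity-check a downloaded snapshot against the notes above is to confirm that the listed files exist and that config.json carries the expected metadata. The sketch below is a hypothetical helper, not part of OpenMed or mlx-lm. It assumes the quantization settings sit under a `quantization` key with `bits` and `group_size` fields, which is the layout mlx-lm conversions commonly write but is not confirmed by this card.

```python
import json
from pathlib import Path

# File names taken from the artifact notes above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "laneformer.py",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]


def check_snapshot(model_dir):
    """Return a list of problems found in a snapshot directory (empty if none)."""
    root = Path(model_dir)
    problems = [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # Assumption: quantization metadata lives under a "quantization" key.
        quant = config.get("quantization", {})
        if quant.get("bits") != 4:
            problems.append("config.json: expected 4-bit quantization")
        if quant.get("group_size") != 64:
            problems.append("config.json: expected group size 64")
        if "model_file" not in config:
            problems.append("config.json: missing model_file entry")
    return problems
```

Calling `check_snapshot("./laneformer-2b-it-q4-mlx")` after the download step should return an empty list if the directory matches these notes.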
CPU vs MLX Smoke Test
The private export verification used a 13-token prompt on Apple Silicon:
| Runtime | Mean prefill | Tokens/sec |
|---|---|---|
| PyTorch CPU | 0.3112 s | 41.77 |
| MLX q4 | 0.1201 s | 108.21 |
Measured speedup: 2.59x for prefill on the smoke prompt. The CPU and MLX top-5 next-token sets overlapped on 4 of 5 token ids.
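The figures above can be rechecked with a few lines of arithmetic. The sketch assumes the Tokens/sec column is the 13 prompt tokens divided by mean prefill time. That reading reproduces the CPU row exactly and the MLX row to within the rounding of the reported prefill time. The token ids in the overlap example are made up for illustration, since the card does not list the actual ids.

```python
PROMPT_TOKENS = 13
prefill_seconds = {"PyTorch CPU": 0.3112, "MLX q4": 0.1201}

# Prefill speedup of MLX q4 over PyTorch CPU.
speedup = prefill_seconds["PyTorch CPU"] / prefill_seconds["MLX q4"]

# Assumption: tokens/sec = prompt tokens / mean prefill time.
throughput = {name: PROMPT_TOKENS / secs for name, secs in prefill_seconds.items()}


def top_k_overlap(ids_a, ids_b):
    """Count token ids that appear in both top-k lists."""
    return len(set(ids_a) & set(ids_b))


# Hypothetical top-5 token ids, chosen only to show a 4-of-5 overlap.
cpu_top5 = [11, 42, 7, 905, 318]
mlx_top5 = [11, 42, 7, 905, 77]

print(f"prefill speedup: {speedup:.2f}x")
for name, tps in throughput.items():
    print(f"{name}: {tps:.2f} tokens/sec")
print(f"top-5 overlap: {top_k_overlap(cpu_top5, mlx_top5)} of 5")
```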
Links
- Source model: https://huggingface.co/kogai/laneformer-2b-it
- OpenMed: https://github.com/maziyarpanahi/openmed
- OpenMed MLX backend guide: https://github.com/maziyarpanahi/openmed/blob/master/docs/mlx-backend.md