Laneformer 2B Instruct q4 MLX for OpenMed

This private repository contains an OpenMed MLX-LM conversion of kogai/laneformer-2b-it, packaged for local text generation on Apple Silicon through OpenMed's Python MLX interface and mlx-lm.

At a Glance

Field              Value
Source model       kogai/laneformer-2b-it
MLX repo           OpenMed/laneformer-2b-it-q4-mlx
Task               Text generation
Runtime            Python openmed[mlx] / mlx-lm
Quantization       4-bit affine, group size 64
Parameters         2.32B
Source revision    b4f40adc413c2c5268ab89cf666ade37148d8d4b
License            Custom upstream license; see the license of kogai/laneformer-2b-it
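The Quantization row refers to MLX-style affine group quantization. Each group of 64 weights gets its own scale and bias, and every weight is stored as a 4-bit code q, with w ≈ scale · q + bias. This follows the formula in the mlx.core.quantize documentation. The pure-Python sketch below quantizes one random group, dequantizes it, and packs the codes into 32-bit words. It is illustrative only, for three reasons:

- The nibble packing order is an assumption.
- The 16-bit scale and bias used for the storage estimate are an assumption.
- MLX's kernels may differ in rounding details.

```python
import random

BITS = 4
GROUP_SIZE = 64
LEVELS = 2**BITS - 1  # 15 distinct steps for 4-bit codes


def quantize_group(weights):
    """Affine-quantize one group: w ~ scale * q + bias, with q in [0, LEVELS]."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / LEVELS if w_max > w_min else 1.0
    bias = w_min
    codes = [round((w - bias) / scale) for w in weights]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from 4-bit codes."""
    return [scale * q + bias for q in codes]


def pack_uint32(codes):
    """Pack eight 4-bit codes per 32-bit word, lowest nibble first (assumed order)."""
    words = []
    for i in range(0, len(codes), 8):
        word = 0
        for j, q in enumerate(codes[i:i + 8]):
            word |= (q & 0xF) << (4 * j)
        words.append(word)
    return words


random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
packed = pack_uint32(codes)

# Effective storage, assuming a 16-bit scale and a 16-bit bias per group.
bits_per_weight = BITS + (16 + 16) / GROUP_SIZE
print(f"max abs error {max_err:.2e} (bound {scale / 2:.2e})")
print(f"{len(packed)} uint32 words, {bits_per_weight} bits/weight")
```

Rounding to the nearest code bounds the per-weight error by half a quantization step (scale / 2). Under the 16-bit scale/bias assumption, the per-group metadata adds 0.5 bits per weight on top of the 4-bit codes.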

OpenMed MLX Status

  • Python MLX: supported through openmed.generate_text(...) and openmed.mlx.OpenMedMLXLanguageModel.
  • Swift MLX: not supported for this causal language model artifact. Swift OpenMedKit MLX currently targets OpenMed token-classification artifacts.
  • Privacy posture: this artifact is intended for local inference. Do not send protected health information to hosted demos or external services.
  • Safety posture: OpenMed does not treat this model as a medical device and does not auto-trigger clinical decisions.

Use This MLX Snapshot

hf download OpenMed/laneformer-2b-it-q4-mlx \
  --local-dir ./laneformer-2b-it-q4-mlx
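If you prefer to fetch the snapshot from Python, huggingface_hub's snapshot_download should produce the same local directory as the command above. This is a sketch; the wrapper name download_snapshot is hypothetical. Because the repository is private, you need a token with access, for example through the HF_TOKEN environment variable or a prior CLI login.

```python
from pathlib import Path


def download_snapshot(local_dir="./laneformer-2b-it-q4-mlx"):
    """Hypothetical Python counterpart to the `hf download` command.

    The repository is private: a Hugging Face token with access is required
    (for example via the HF_TOKEN environment variable or a CLI login).
    """
    # Imported lazily so this helper can be defined without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="OpenMed/laneformer-2b-it-q4-mlx",
        local_dir=local_dir,
    )
    return Path(path)
```

Calling download_snapshot() returns the path of the local directory, which you can then pass to OpenMedMLXLanguageModel or mlx_lm.load.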

Python Quick Start

pip install "openmed[mlx]"

from openmed import generate_text

response = generate_text(
    messages=[
        {
            "role": "user",
            "content": "Explain why local clinical language models matter.",
        }
    ],
    model_name="OpenMed/laneformer-2b-it-q4-mlx",
    max_tokens=128,
)
print(response)

Pass OpenMed/laneformer-2b-it-q4-mlx as model_name when you want to select this preconverted MLX artifact explicitly. OpenMed also accepts kogai/laneformer-2b-it and laneformer-2b-it as compatibility aliases that resolve to this private OpenMed artifact.
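To make the alias behavior concrete, all three names below should end up at the same repository. The lookup table and the resolve_model_name helper are hypothetical illustrations of the mapping described above, not OpenMed's internal code.

```python
# Hypothetical illustration of alias resolution; not OpenMed's implementation.
CANONICAL_REPO = "OpenMed/laneformer-2b-it-q4-mlx"

ALIASES = {
    "kogai/laneformer-2b-it": CANONICAL_REPO,
    "laneformer-2b-it": CANONICAL_REPO,
}


def resolve_model_name(name):
    """Map a compatibility alias to the preconverted MLX repo; pass other names through."""
    return ALIASES.get(name, name)


for name in ("OpenMed/laneformer-2b-it-q4-mlx", "kogai/laneformer-2b-it", "laneformer-2b-it"):
    print(f"{name} -> {resolve_model_name(name)}")
```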

Use This Preconverted MLX Repo Directly

from openmed.mlx import OpenMedMLXLanguageModel

runner = OpenMedMLXLanguageModel("./laneformer-2b-it-q4-mlx")
print(runner.generate("Define delayed tensor parallelism.", max_tokens=128))

You can also load this directory directly with mlx_lm.load(...).
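A minimal sketch of the mlx-lm path follows, assuming a recent mlx-lm release. In such releases, load(path) returns a (model, tokenizer) pair, and generate(...) accepts prompt token ids plus max_tokens. The helper name generate_with_mlx_lm is hypothetical. The sketch formats the prompt with the tokenizer's chat template because this is an instruction-tuned model. It also assumes your mlx-lm version honors the model_file entry in config.json that points at laneformer.py.

```python
def generate_with_mlx_lm(model_dir, prompt, max_tokens=128):
    """Hypothetical helper: load the local MLX snapshot with mlx-lm and reply to one user turn."""
    # Imported lazily: mlx-lm is installed separately and targets Apple Silicon.
    from mlx_lm import generate, load

    # load() returns the model and a tokenizer wrapper for the local directory.
    model, tokenizer = load(model_dir)

    # Format a single user turn with the tokenizer's chat template.
    messages = [{"role": "user", "content": prompt}]
    prompt_tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # generate() decodes up to max_tokens new tokens and returns the text.
    return generate(model, tokenizer, prompt=prompt_tokens, max_tokens=max_tokens)
```

For example: print(generate_with_mlx_lm("./laneformer-2b-it-q4-mlx", "Define delayed tensor parallelism.")).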

Artifact Notes

  • Format: MLX-LM model directory.
  • Weights: model.safetensors.
  • Custom model implementation: laneformer.py, referenced by config.json through model_file.
  • Tokenizer assets: tokenizer.json, tokenizer_config.json, special_tokens_map.json, and chat_template.jinja.
  • Quantization metadata is stored in config.json as 4-bit affine with group size 64.
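Before loading, it can help to confirm that a downloaded snapshot contains the files listed above. The check below is a hypothetical helper; check_snapshot is not part of OpenMed or mlx-lm. It assumes the quantization settings live under a quantization key with bits and group_size fields. That is the usual mlx-lm layout, but this card does not spell out the exact keys.

```python
import json
from pathlib import Path

# File list taken from the Artifact Notes above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "laneformer.py",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
]


def check_snapshot(model_dir):
    """Hypothetical helper: return a list of problems found in an MLX-LM snapshot directory."""
    root = Path(model_dir)
    problems = [f"missing {name}" for name in EXPECTED_FILES if not (root / name).is_file()]

    config_path = root / "config.json"
    if not config_path.is_file():
        return problems
    config = json.loads(config_path.read_text())

    # The custom implementation must be referenced through model_file.
    if config.get("model_file") != "laneformer.py":
        problems.append("config.json does not point model_file at laneformer.py")

    # Assumed key layout ("quantization": {"bits": ..., "group_size": ...}).
    quant = config.get("quantization", {})
    if quant.get("bits") != 4 or quant.get("group_size") != 64:
        problems.append(f"unexpected quantization settings: {quant}")
    return problems
```

An empty list means every expected file is present and the config matches the 4-bit, group-size-64 settings described here.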

CPU vs MLX Smoke Test

The private export verification used a 13-token prompt on Apple Silicon:

Runtime        Mean prefill   Tokens/sec
PyTorch CPU    0.3112 s       41.77
MLX q4         0.1201 s       108.21
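The Tokens/sec column appears to be prompt tokens divided by mean prefill time, and the speedup the ratio of the two prefill times. A quick check under that assumption:

- The CPU row reproduces 41.77 tokens/s, and the time ratio reproduces the 2.59x speedup.
- The MLX row gives about 108.24 tokens/s from the rounded 0.1201 s, slightly above the reported 108.21. This is presumably because the reported figure was computed from unrounded timings.

```python
PROMPT_TOKENS = 13  # length of the smoke-test prompt

mean_prefill_s = {"PyTorch CPU": 0.3112, "MLX q4": 0.1201}

# Prefill throughput = prompt tokens / mean prefill time.
throughput = {name: PROMPT_TOKENS / t for name, t in mean_prefill_s.items()}

# Prefill speedup = CPU prefill time / MLX prefill time.
speedup = mean_prefill_s["PyTorch CPU"] / mean_prefill_s["MLX q4"]

for name, tps in throughput.items():
    print(f"{name}: {tps:.2f} tokens/s")
print(f"prefill speedup: {speedup:.2f}x")
```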

Measured prefill speedup on the smoke prompt: 2.59x (0.3112 s / 0.1201 s). The CPU and MLX top-5 next-token sets overlapped on 4 of 5 token ids.
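The top-5 overlap figure compares the two runtimes' most likely next tokens for the same prompt. The minimal sketch below assumes the metric is the size of the intersection of the two top-5 id sets; the export verification script itself is not shown here. It uses made-up toy logits, not data from the export.

```python
def top_k_ids(logits, k=5):
    """Token ids of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def top_k_overlap(logits_a, logits_b, k=5):
    """Number of token ids shared by the two top-k sets."""
    return len(set(top_k_ids(logits_a, k)) & set(top_k_ids(logits_b, k)))


# Toy logits over an 8-token vocabulary (illustrative numbers only).
cpu_logits = [2.0, 0.1, 1.5, 3.2, -1.0, 0.9, 2.7, 0.3]
mlx_logits = [2.1, 0.2, 1.4, 3.1, -0.9, 0.4, 2.6, 1.0]
print(top_k_overlap(cpu_logits, mlx_logits))
```

In this toy case the fifth-ranked token differs (id 5 on one side, id 7 on the other), so the overlap is 4 of 5. A small quantization-induced shift in low-ranked logits can produce that pattern even when the top choices agree.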
