How to use Bunnana/data-morph-gemma-2b with MLX:
# Make sure mlx-lm is installed:
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Bunnana/data-morph-gemma-2b")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
How to use Bunnana/data-morph-gemma-2b with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Bunnana/data-morph-gemma-2b"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Bunnana/data-morph-gemma-2b" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
How to use Bunnana/data-morph-gemma-2b with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Bunnana/data-morph-gemma-2b"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Bunnana/data-morph-gemma-2b
Run Hermes
hermes
How to use Bunnana/data-morph-gemma-2b with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Bunnana/data-morph-gemma-2b"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (it listens on port 8080 by default)
mlx_lm.server --model "Bunnana/data-morph-gemma-2b"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Bunnana/data-morph-gemma-2b",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
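The same endpoint can also be called from Python. Below is a minimal sketch that uses only the standard library. It assumes the server from the previous step is listening on its default port, 8080, and that replies follow the usual OpenAI chat-completions layout (`choices[0].message.content`). The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "Bunnana/data-morph-gemma-2b"


def build_chat_request(content: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(send(build_chat_request("Hello")))` issues the same call as the curl command.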
data-morph-gemma-2b
A 2.0 GB local model for file-format conversion: a Gemma‑4 E2B student distilled from Claude Opus to convert between CSV, JSON, and TXT. It was fine‑tuned with LoRA, then shrunk by stripping the unused vision/audio towers, pruning the vocabulary (262 k → 16 k), and quantizing to 8‑bit. Together these steps take it from 5.12 B to 2.05 B parameters and from 9.6 GB to 2.0 GB.
This is not a general chat model. It is trained for one job: given a small metadata envelope describing a file, write a Python script that converts it. It is meant to be driven by the data-morph package, which runs the full pipeline around it.
How it works
Conversion is a five‑stage pipeline; the model never sees the full source file, only a compact metadata envelope (schema, samples, warnings):
[file] → 1. extract envelope → 3. THIS MODEL writes a Python script
→ 4. sandbox runs the script → 5. validate output → [converted file]
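To make the envelope idea concrete, here is a minimal sketch of what stage 1 could produce for a CSV file. The field names (`columns`, `types`, `samples`, `warnings`) and the helper `extract_csv_envelope` are hypothetical. The real envelope format is defined by the data-morph package and may differ; the type inference here is deliberately crude.

```python
import csv
import io


def _infer_type(values: list[str]) -> str:
    """Very rough type inference over column values (illustrative only)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def extract_csv_envelope(text: str, n_samples: int = 3) -> dict:
    """Summarise a CSV as schema + a few sample rows + warnings."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("empty CSV")
    header, body = rows[0], rows[1:]
    warnings = []
    for i, row in enumerate(body, start=2):  # row 1 is the header
        if len(row) != len(header):
            warnings.append(f"row {i}: expected {len(header)} fields, got {len(row)}")
    well_formed = [r for r in body if len(r) == len(header)]
    types = {col: _infer_type([r[j] for r in well_formed]) for j, col in enumerate(header)}
    return {
        "format": "csv",
        "columns": header,
        "types": types,
        "row_count": len(body),
        "samples": [dict(zip(header, r)) for r in well_formed[:n_samples]],
        "warnings": warnings,
    }
```

The point of the design is size. However large the input, the envelope stays a few hundred bytes, so the prompt the model sees does not grow with the file.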
The model emits an <analysis>…</analysis> block followed by a <script>…</script>
block. Narrowing the target from "transform a whole file" to "read metadata, write a
script" is what makes a 2 B model viable, and lets the pipeline scale to arbitrary file
sizes while leaving a readable, debuggable artefact (the script).
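Before the sandbox can run anything, the caller has to pull the script out of the reply. A minimal sketch follows, assuming each tag appears once. `parse_reply` and `ModelReply` are hypothetical names, not part of the data-morph API.

```python
import re
from typing import NamedTuple


class ModelReply(NamedTuple):
    analysis: str
    script: str


_TAG = r"<{0}>(.*?)</{0}>"  # non-greedy, so it stops at the first closing tag


def parse_reply(text: str) -> ModelReply:
    """Extract the <analysis> and <script> blocks from a model reply.

    Raises ValueError when either block is missing, so a malformed reply
    can be treated like any other failed attempt and retried.
    """
    blocks = {}
    for tag in ("analysis", "script"):
        match = re.search(_TAG.format(tag), text, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"reply has no <{tag}>...</{tag}> block")
        blocks[tag] = match.group(1).strip()
    return ModelReply(**blocks)
```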
Intended use
- In scope: CSV↔JSON conversion, JSON flattening, nested‑JSON construction, TXT log → CSV parsing, and schema migration — the five patterns it was distilled on.
- Out of scope: open‑ended chat, formats other than CSV/JSON/TXT, and adversarial or far‑out‑of‑distribution inputs (a small model can be misled; the surrounding pipeline validates output and retries, but does not guarantee success).
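One in-scope pattern is "JSON flattening". Typically this means turning nested objects into a single level of dotted keys so that the records fit CSV columns. The sketch below is a hand-written example of that kind of transformation, not output from the model. The dotted-key convention and the choice to keep lists as JSON strings are assumptions.

```python
import csv
import io
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys; lists are kept as JSON text."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def records_to_csv(records: list[dict]) -> str:
    """Flatten each record and write CSV using the union of all columns."""
    rows = [flatten(r) for r in records]
    columns: list[str] = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    out = io.StringIO()
    # Missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```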
Usage
Use via the pip package (recommended)
pip install "data-morph-gemma[mlx]" # Apple Silicon + MLX
from datamorph import convert_file
result = convert_file("contacts.csv", "contacts.json")
print(result.accepted, result.scores, result.output_path)
convert_file runs the full pipeline (envelope → script → sandbox → validate) with a
retry‑on‑error loop, so you get a validated output file, not just raw text. This model
downloads automatically on first use (cached under ~/.cache/huggingface); set
GEMMA_MLX_MODEL only if you want to point at a local copy instead.
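The retry-on-error loop can be pictured roughly as follows. This is a simplified sketch, not the data-morph implementation:

- `run_with_retries` and its `generate_script` callback are hypothetical. The callback receives the previous error, or `None` on the first attempt.
- The "sandbox" here is only a subprocess in a throwaway directory with a timeout. It provides no real isolation.
- Output validation is left out.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_with_retries(
    generate_script: Callable[[Optional[str]], str],
    max_retries: int = 3,
    timeout: float = 30.0,
) -> tuple[bool, int, str]:
    """Run generated scripts until one exits cleanly or retries run out.

    Returns (succeeded, attempts_used, stdout_or_last_error).
    NOTE: a plain subprocess is not a security sandbox.
    """
    error: Optional[str] = None
    for attempt in range(1, max_retries + 2):  # first try + max_retries retries
        script = generate_script(error)  # feed the last error back to the generator
        with tempfile.TemporaryDirectory() as workdir:
            path = Path(workdir) / "convert.py"
            path.write_text(script, encoding="utf-8")
            try:
                proc = subprocess.run(
                    [sys.executable, str(path)],
                    cwd=workdir,
                    capture_output=True,
                    text=True,
                    timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                error = f"script timed out after {timeout} s"
                continue
        if proc.returncode == 0:
            return True, attempt, proc.stdout
        error = proc.stderr[-2000:]  # keep only the tail of the traceback
    return False, max_retries + 1, error or ""
```

Feeding the traceback back into the next generation step is what lets a small model repair its own mistakes instead of starting from scratch.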
Use directly with mlx_lm
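from mlx_lm import load, generate
model, tok = load("Bunnana/data-morph-gemma-2b")
# Prompt = the script-generation instructions + the metadata envelope + the task.
# See the data-morph repo (skills/script_generation_teacher.md) for the exact contract;
# the model replies with <analysis>...</analysis><script>...</script>.
prompt = "<instructions + envelope + task>"  # placeholder: build per the contract above
messages = [{"role": "user", "content": prompt}]
text = tok.apply_chat_template(messages, add_generation_prompt=True)
reply = generate(model, tok, prompt=text, max_tokens=1024)  # token budget is an assumption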
This is a text‑only build — load it with mlx_lm, not mlx_vlm.
Training
- Teacher: Claude Opus + an Agent Skill, generating 800 programmatically‑verified training pairs (every pair passed format/schema/loadability/content checks before use).
- Student: mlx-community/gemma-4-e2b-it-bf16, fine‑tuned with LoRA (mlx_vlm.lora, SFT, train‑on‑completions); the iter‑400 checkpoint was selected on held‑out eval.
- Compression (W7): fuse the LoRA adapter → strip the vision + audio towers → prune the 262 k vocabulary to 16 k (the corpus uses ~4.5 k tokens; a tokenizer round‑trip gate guards the cut) → quantize to 8‑bit (group size 64).
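Vocabulary pruning amounts to three steps: keep only the token IDs that are needed, renumber them densely, and check that nothing is lost in the remapping. A minimal sketch at the level of token IDs follows, under these assumptions:

- `build_vocab_map`, `prune_embeddings` and `round_trip_ok` are hypothetical names.
- The embedding matrix is a plain list of rows.
- The sketch keeps only the corpus and special tokens. The real cut keeps 16 k IDs against ~4.5 k used, so it retains extra tokens that this sketch does not model.
- The card's gate works on a tokenizer round trip, not on raw IDs.

```python
def build_vocab_map(corpus_ids: list[list[int]], special_ids: set[int]) -> dict[int, int]:
    """Map every kept old token ID to a new, dense ID (0..n-1)."""
    keep = set(special_ids)
    for seq in corpus_ids:
        keep.update(seq)
    return {old: new for new, old in enumerate(sorted(keep))}


def prune_embeddings(embeddings: list[list[float]], old_to_new: dict[int, int]) -> list[list[float]]:
    """Keep only the embedding rows for retained tokens, in new-ID order."""
    new_to_old = sorted(old_to_new, key=old_to_new.__getitem__)
    return [embeddings[old] for old in new_to_old]


def round_trip_ok(sequences: list[list[int]], old_to_new: dict[int, int]) -> bool:
    """Gate: every sequence must be covered and survive old -> new -> old unchanged."""
    new_to_old = {new: old for old, new in old_to_new.items()}
    for seq in sequences:
        if any(t not in old_to_new for t in seq):
            return False  # a needed token was pruned away
        if [new_to_old[old_to_new[t]] for t in seq] != seq:
            return False
    return True
```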
Evaluation
Measured through the full pipeline on a 70‑case held‑out test set (content‑disjoint from training), scored on four metrics — Format Validity, Schema Compliance, Loadability, Content Accuracy.
| Setting | Accepted (all 4 pass) | Score | vs. teacher |
|---|---|---|---|
| one‑shot | 56 / 70 | 0.811 | — |
| production (retry ≤ 3) | 67 / 70 | 0.957 | ~96 % |
The student clears the project's ≥ 80 %‑of‑teacher target on every metric.
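Here is one possible way to score a single CSV → JSON output, to make the four metrics concrete. The project's exact metric definitions are not given in this card, so the mapping below is an illustrative assumption:

- Format Validity: the file parses as JSON.
- Loadability: it is a list of objects.
- Schema Compliance: every object has exactly the source columns.
- Content Accuracy: the records match the source rows.

`score_json_output` and `accepted` are hypothetical helpers.

```python
import csv
import io
import json


def _as_strings(rows: list[dict]) -> list[dict]:
    """Compare values as strings so "9.5" in the CSV matches 9.5 in the JSON."""
    return [{k: str(v) for k, v in r.items()} for r in rows]


def score_json_output(source_csv: str, output_json: str) -> dict[str, bool]:
    """Score a CSV -> JSON conversion on four pass/fail checks (illustrative)."""
    expected = list(csv.DictReader(io.StringIO(source_csv)))
    columns = set(expected[0]) if expected else set()
    scores = {"format": False, "loadable": False, "schema": False, "content": False}

    try:
        data = json.loads(output_json)
    except json.JSONDecodeError:
        return scores
    scores["format"] = True

    if not (isinstance(data, list) and all(isinstance(r, dict) for r in data)):
        return scores
    scores["loadable"] = True

    scores["schema"] = all(set(r) == columns for r in data)
    scores["content"] = scores["schema"] and _as_strings(data) == _as_strings(expected)
    return scores


def accepted(scores: dict[str, bool]) -> bool:
    """A case counts as accepted only when all four checks pass."""
    return all(scores.values())
```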
Model details
- Architecture: gemma4_text (text‑only), 2.05 B parameters
- Quantization: 8‑bit affine, group size 64
- Vocabulary: 16,384 (pruned from 262 k)
- Context: inherits the base model's context length
- Framework: MLX (Apple Silicon)
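The headline sizes can be sanity-checked from these details. In MLX's affine quantization, each group of 64 weights also stores a scale and a bias. Assuming both are 16-bit, and for simplicity that all 2.05 B parameters are quantized, 8-bit weights cost 8 + 2·16/64 = 8.5 bits each. The rough estimate below (in GiB) lands close to the reported 2.0 GB. The bf16 figure for the 5.12 B-parameter starting point is likewise near the reported 9.6 GB.

```python
def model_size_gib(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GiB for a given parameter count."""
    return params * bits_per_weight / 8 / 2**30


def affine_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Bits per weight including one scale and one bias per group (assumed 16-bit)."""
    return bits + 2 * scale_bits / group_size


bpw = affine_bits_per_weight(bits=8, group_size=64)  # 8.5
quantized = model_size_gib(2.05e9, bpw)               # about 2.03 GiB
original = model_size_gib(5.12e9, 16)                 # about 9.54 GiB (bf16)
print(f"{bpw} bits/weight -> {quantized:.2f} GiB (8-bit) vs {original:.2f} GiB (bf16)")
```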
Limitations & ethics
- A small model: reliable on the five trained conversion patterns; messy but in‑pattern inputs are handled well, far‑out‑of‑distribution ones may fail.
- Hallucination / data‑loss risk is mitigated, but not eliminated, by the pipeline's automated format/schema validation and retries.
- Teacher bias from Claude Opus can propagate to the student.
- Converted files may contain personal data; run locally and do not upload user inputs.
License
This model is a derivative of Google's Gemma and is distributed under the
Gemma Terms of Use. By using it you agree to those
terms, which propagate to derivatives. Base model:
mlx-community/gemma-4-e2b-it-bf16.
Links
- Documentation: https://lovemig6334.github.io/data-morph/
- PyPI package: data-morph-gemma (pip install "data-morph-gemma[mlx]")
- Source & training pipeline: github.com/LoveMig6334/data-morph
- Training dataset: data-morph-conversions