
Libraries

How to use Wizcoderr/qwen3-14b-flutter-fused with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Wizcoderr/qwen3-14b-flutter-fused")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use Wizcoderr/qwen3-14b-flutter-fused with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Wizcoderr/qwen3-14b-flutter-fused"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Wizcoderr/qwen3-14b-flutter-fused"
        }
      ]
    }
  }
}
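
The provider entry above can also be generated from a script, which helps when the model ID or port changes. This is a minimal sketch that only mirrors the keys in the JSON above; the helper name `pi_provider_config` is hypothetical and not part of Pi.

```python
import json


def pi_provider_config(model_id, base_url="http://localhost:8080/v1"):
    """Return the provider block from the JSON example as a Python dict.

    Hypothetical helper: it only reproduces the keys shown in the example.
    """
    return {
        "providers": {
            "mlx-lm": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


config = pi_provider_config("Wizcoderr/qwen3-14b-flutter-fused")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into ~/.pi/agent/models.json, merging it by hand if the file already lists other providers.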

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Wizcoderr/qwen3-14b-flutter-fused with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Wizcoderr/qwen3-14b-flutter-fused"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Wizcoderr/qwen3-14b-flutter-fused

Run Hermes

hermes

MLX LM

How to use Wizcoderr/qwen3-14b-flutter-fused with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Wizcoderr/qwen3-14b-flutter-fused"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "Wizcoderr/qwen3-14b-flutter-fused"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Wizcoderr/qwen3-14b-flutter-fused",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
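
The same request can be sent from Python with only the standard library. This sketch assumes the server listens on port 8080, the mlx_lm.server default; pass a different `base_url` if it was started with another `--port`. The helper names `build_chat_request` and `send` are illustrative.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Wizcoderr/qwen3-14b-flutter-fused",
                       base_url="http://localhost:8080/v1"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello")
# reply = send(request)  # uncomment once mlx_lm.server is running
```

Building the request separately from sending it makes the payload easy to inspect before the server is up.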

Qwen3 14B Flutter Fused

GenMobiAi presents a fine-tuned, 4-bit quantized version of Qwen3-14B, specialized for Flutter and Dart development with agentic code generation capabilities.

Key Specs:

Parameters: 14B
Architecture: Qwen3ForCausalLM (40 layers)
Context Length: 40,960 tokens
Quantization: 4-bit (group_size=64)
Model Type: qwen3
Data Type: bfloat16
License: Community License

Key Features

Flutter & Dart Development

Widgets - StatelessWidget, StatefulWidget, Material 3 components
State Management - Provider, Riverpod, GetX, BLoC, MobX patterns
Async Programming - Futures, Streams, isolates, async/await
Architecture - MVVM, Clean Architecture, feature-first structures
Testing - Widget tests, unit tests with mockito, integration tests

Package Ecosystem Intelligence

HTTP Clients - Dio, http, chopper with interceptors
Local Storage - Hive, shared_preferences, sqflite, isar
Animations - flutter_animate, lottie, rive integration
UI Libraries - GetX UI, Velocity UI, responsive frameworks
Testing - Golden tests, behavior-driven testing (BDD)

Agentic Capabilities

ChatML Format - Tool-call support for multi-step workflows
Context Preservation - Multi-message conversation handling
JSON Tool Responses - LangGraph-compatible structured outputs
Streaming - Real-time token generation for UI responsiveness
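
To illustrate the JSON tool responses listed above, here is a rough sketch of pulling tool calls out of generated text. It assumes the model wraps each call in `<tool_call>...</tool_call>` tags, as the upstream Qwen3 chat template does; this card does not confirm the exact format, so check it against the model's tokenizer config. The tool name `pub_search` is invented for the example.

```python
import json
import re

# Assumption: tool calls arrive as JSON wrapped in <tool_call>...</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of {"name": ..., "arguments": ...} dicts found in text.

    Blocks that are not valid JSON, or that lack a "name" key, are skipped.
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Hypothetical model output containing one tool call
sample = (
    "I'll check the package first.\n"
    '<tool_call>\n{"name": "pub_search", "arguments": {"query": "riverpod"}}\n</tool_call>'
)
print(extract_tool_calls(sample))
```

Skipping malformed blocks rather than raising keeps an agent loop running when the model emits a partial call.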

Quick Start Examples

Transformers (CPU/GPU)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "path/to/qwen3-14b-flutter-fused"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer assistant."},
    {"role": "user", "content": "Create a responsive Flutter dashboard with dark mode support"}
]

# Render the chat template to text, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature and top_p to take effect
output = model.generate(
    **inputs, max_new_tokens=1024, do_sample=True, temperature=0.3, top_p=0.9
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

MLX-LM (Apple Silicon - Recommended)

python -m mlx_lm.generate \
  --model path/to/qwen3-14b-flutter-fused \
  --prompt "Write a Flutter Provider pattern for cart management" \
  --max-tokens 1024 \
  --temp 0.3 \
  --top-p 0.9

vLLM (Production Serving)

from vllm import LLM, SamplingParams

llm = LLM("path/to/qwen3-14b-flutter-fused", max_model_len=4096)

# Raw ChatML prompt; the trailing assistant header tells the model to start its reply
prompt = (
    "<|im_start|>user\n"
    "Create a Riverpod async data provider for user authentication<|im_end|>\n"
    "<|im_start|>assistant\n"
)
outputs = llm.generate(
    [prompt],
    SamplingParams(temperature=0.3, top_p=0.9, max_tokens=1024, repetition_penalty=1.05)
)
print(outputs[0].outputs[0].text)

Recommended Sampling Parameters

Use Case	Temperature	Top-P	Top-K	Repetition Penalty
Code Generation	0.3	0.9	40	1.05
Complex Logic	0.5	0.95	50	1.0
Agentic Workflows	0.2	0.85	40	1.1
Creative Patterns	0.7	0.95	50	0.95
Documentation	0.4	0.92	40	1.0
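
The table can be kept in code as a small preset lookup so each call site picks a use case by name. This is a sketch only; the dictionary keys and the helper name `sampling_params` are illustrative, and the values are copied from the table.

```python
# Values copied from the Recommended Sampling Parameters table
SAMPLING_PRESETS = {
    "code_generation": {"temperature": 0.3, "top_p": 0.9, "top_k": 40, "repetition_penalty": 1.05},
    "complex_logic": {"temperature": 0.5, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.0},
    "agentic_workflows": {"temperature": 0.2, "top_p": 0.85, "top_k": 40, "repetition_penalty": 1.1},
    "creative_patterns": {"temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 0.95},
    "documentation": {"temperature": 0.4, "top_p": 0.92, "top_k": 40, "repetition_penalty": 1.0},
}


def sampling_params(use_case, **overrides):
    """Return a copy of the preset for use_case, with any overrides applied."""
    if use_case not in SAMPLING_PRESETS:
        raise KeyError(f"unknown use case: {use_case!r}")
    params = dict(SAMPLING_PRESETS[use_case])  # copy so the preset stays unchanged
    params.update(overrides)
    return params


print(sampling_params("code_generation"))
print(sampling_params("agentic_workflows", temperature=0.1))
```

The returned dict can be unpacked into whichever backend is in use, for example `SamplingParams(**params)` in vLLM, which accepts these four argument names.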

Hardware Requirements

Hardware	Memory	Inference Speed	Use Case
Apple M2/M3/M4 (MLX)	24GB+	100+ tok/s	Development
Apple M1 (MLX)	16GB+	50-80 tok/s	Development
RTX 4090 (BF16)	24GB	150+ tok/s	Local Production
RTX 3090 (BF16)	24GB	100+ tok/s	Local Production
H100 (batched)	80GB	800+ tok/s	Server Scale
CPU (GGUF Q4)	32GB	10-15 tok/s	Edge Devices
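
As a sanity check on the memory column, the weight footprint can be estimated from the parameter count and bit width in Key Specs. This is back-of-the-envelope arithmetic, not a measurement: it counts weights only (no KV cache or activations) and assumes each quantization group of 64 weights also stores a 16-bit scale and a 16-bit bias.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 14e9  # "Parameters: 14B" from Key Specs
GROUP_SIZE = 64    # "group_size=64" from Key Specs

# Assumption: one 16-bit scale and one 16-bit bias per group of 64 weights
bits_4bit = 4 + (16 + 16) / GROUP_SIZE  # 4.5 bits per weight

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
q4_gb = weight_memory_gb(NUM_PARAMS, bits_4bit)

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"4-bit weights: {q4_gb:.1f} GB")
```

Under these assumptions the BF16 weights alone come to about 28 GB and the 4-bit weights to roughly 7.9 GB, so a 24 GB GPU would likely need the quantized weights or some offloading.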

Model Architecture

Hidden Size: 5,120
Intermediate Size: 17,408
Num Layers: 40
Num Attention Heads: 40
Num KV Heads: 8
Max Position Embeddings: 40,960
Head Dimension: 128
Vocab Size: 151,936
Attention Bias: False
RMS Norm Epsilon: 1e-6
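
These figures determine the KV cache size, which grows with context length. The sketch below uses only the values listed above and assumes a 16-bit cache (2 bytes per value) for a single sequence; real runtimes may use other cache dtypes or layouts.

```python
HIDDEN_SIZE = 5120
NUM_LAYERS = 40
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128
MAX_POSITIONS = 40960

# The query projection width (heads * head_dim) matches the hidden size here
assert NUM_ATTENTION_HEADS * HEAD_DIM == HIDDEN_SIZE

# Grouped-query attention: number of query heads sharing each KV head
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens, bytes_per_value=2):
    """KV cache size for one sequence; bytes_per_value=2 assumes a 16-bit cache."""
    # Factor 2 covers keys and values, stored for every layer and KV head
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens


print(queries_per_kv_head)                    # query heads per KV head
print(kv_cache_bytes(1))                      # bytes per token
print(kv_cache_bytes(MAX_POSITIONS) / 2**30)  # GiB at full context
```

With 8 KV heads instead of 40, the cache is one fifth of what full multi-head attention would need: 163,840 bytes per token, or 6.25 GiB at the full 40,960-token context.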

Special Tokens

[151643]     → BOS Token (Beginning of Sequence)
[151645]     → EOS Token (End of Sequence), the ID of <|im_end|>
<|im_start|> → ChatML message start
<|im_end|>   → ChatML message end
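
To show how the two ChatML markers frame a conversation, here is a minimal sketch that renders a message list by hand. The helper `to_chatml` is hypothetical and covers only the plain layout; the model's real chat template may add tool or system handling, so prefer `tokenizer.apply_chat_template` in practice.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in plain ChatML.

    Each message becomes: <|im_start|>{role}\\n{content}<|im_end|>\\n
    """
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are an expert Flutter developer assistant."},
    {"role": "user", "content": "Explain StatefulWidget in one sentence."},
])
print(prompt)
```

Leaving the final assistant header open is what distinguishes a prompt ready for generation from a stored transcript.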

Conversion & Deployment

Convert to GGUF (CPU Inference)

python scripts/mlx_to_gguf.py ./models/qwen3-14b-flutter-fused \
  -o qwen3-14b-flutter.gguf \
  -q q4_k_m

GGML Quantization Options

q4_k_m - 4-bit k-quant, medium variant (recommended balance)
q5_0 - 5-bit quantization (higher quality)
q8_0 - 8-bit quantization (best quality, larger size)
q3_k_m - 3-bit k-quant, medium variant (smallest, lower quality)

Use with Ollama

# After GGUF conversion
ollama create qwen3-flutter -f Modelfile
ollama run qwen3-flutter "Create a BLoC pattern for form validation"
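
The `ollama create` command reads a Modelfile, which this card does not show. The sketch below renders one possible version: it assumes the GGUF file name from the conversion step and the Code Generation sampling row as defaults. The helper `render_modelfile` is hypothetical; `FROM`, `PARAMETER`, and `SYSTEM` are standard Modelfile instructions.

```python
def render_modelfile(gguf_path, temperature=0.3, top_p=0.9, top_k=40,
                     repeat_penalty=1.05, system_prompt=None):
    """Return Modelfile text pointing at a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"PARAMETER top_k {top_k}",
        f"PARAMETER repeat_penalty {repeat_penalty}",
    ]
    if system_prompt:
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"


modelfile = render_modelfile(
    "./qwen3-14b-flutter.gguf",
    system_prompt="You are GenMobiAi, an expert Flutter developer assistant.",
)
print(modelfile)
# To use it, save the text to a file named Modelfile next to the GGUF file.
```

Baking the sampling defaults into the Modelfile means `ollama run` needs no extra flags for typical code generation.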

Fine-Tuning

LoRA Fine-tuning with MLX

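# --data expects a directory containing train.jsonl and valid.jsonl
# (older mlx-lm releases name the --num-layers flag --lora-layers)
python -m mlx_lm.lora \
  --model path/to/qwen3-14b-flutter-fused \
  --train \
  --data path/to/data_dir \
  --num-layers 8 \
  --batch-size 1 \
  --iters 500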
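
mlx_lm.lora reads its training data as JSONL, typically a directory holding `train.jsonl` and `valid.jsonl`. The sketch below writes both files in the chat format, one `{"messages": [...]}` object per line. The helper `write_lora_dataset` and the example pairs are made up for illustration; substitute your own data.

```python
import json
import random
import tempfile
from pathlib import Path


def write_lora_dataset(examples, out_dir, valid_fraction=0.1, seed=0):
    """Split (prompt, answer) pairs into train.jsonl and valid.jsonl under out_dir."""
    records = [
        {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}
        for prompt, answer in examples
    ]
    random.Random(seed).shuffle(records)  # deterministic shuffle
    n_valid = max(1, int(len(records) * valid_fraction))  # at least one validation row
    splits = {"valid": records[:n_valid], "train": records[n_valid:]}

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out / f"{name}.jsonl", "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return {name: len(rows) for name, rows in splits.items()}


# Placeholder examples; a temporary directory keeps the sketch side-effect free
examples = [
    ("How do I create a StatelessWidget?", "Extend StatelessWidget and override build()."),
    ("Which keyword waits for a Future?", "Use await inside an async function."),
    ("How do I add vertical space between widgets?", "Insert a SizedBox with a height."),
]
out_dir = tempfile.mkdtemp(prefix="lora_data_")
counts = write_lora_dataset(examples, out_dir)
print(out_dir, counts)
```

The resulting directory is what `--data` should point to.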

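With Hugging Face TRL (SFTTrainer)

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer  # SFTTrainer lives in trl, not transformers

# Load the training examples; the file name is a placeholder
train_dataset = load_dataset("json", data_files="train_data.jsonl", split="train")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# SFTConfig extends TrainingArguments with SFT-specific options
training_args = SFTConfig(
    output_dir="./flutter-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    bf16=True
)

trainer = SFTTrainer(
    model="path/to/qwen3-14b-flutter-fused",
    args=training_args,
    peft_config=lora_config,
    train_dataset=train_dataset
)
trainer.train()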
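
For a sense of how small this adapter is, the trainable parameter count follows from the LoRA rank and the projection shapes in the Model Architecture section. The arithmetic below assumes adapters on `q_proj` and `v_proj` in all 40 layers, and that `v_proj` maps the hidden size to `num_kv_heads * head_dim`, the usual grouped-query layout.

```python
HIDDEN_SIZE = 5120
NUM_LAYERS = 40
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128
RANK, ALPHA = 8, 16  # r and lora_alpha from the LoraConfig


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


q_out = NUM_ATTENTION_HEADS * HEAD_DIM  # 5120
v_out = NUM_KV_HEADS * HEAD_DIM         # 1024 (assumed grouped-query layout)

per_layer = lora_params(HIDDEN_SIZE, q_out, RANK) + lora_params(HIDDEN_SIZE, v_out, RANK)
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(per_layer, total, scaling)
```

That works out to 131,072 parameters per layer and 5,242,880 in total, a small fraction of the 14B base weights, with the update scaled by 2.0.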

Limitations & Considerations

Training Data Scale - Trained on Flutter-specific examples; may not generalize to non-mobile domains
Quantization Artifacts - 4-bit quantization may introduce minor precision loss
Context Window - 40,960 tokens; optimal for 4K-8K contexts
Version Coverage - Reflects Flutter/Dart features up to training date
Hallucinations - May generate plausible-sounding but incorrect package APIs
Dependencies - Assumes familiarity with popular Flutter ecosystem packages

Troubleshooting

Out of Memory

# Reduce max_new_tokens or use smaller batch_size
output = model.generate(**inputs, max_new_tokens=512)  # Instead of 1024

Slow Generation

Use MLX on Apple Silicon (50-100x faster)
Enable flash-attention if available
Use smaller max_new_tokens
Consider vLLM for batched inference

Quality Issues

Lower temperature (0.2-0.4 for deterministic code)
Increase top_p (0.85-0.95)
Add more context in system prompt
Use multi-turn conversations for refinement

Citation

@misc{genmobiai2025_qwen3,
  title   = {Qwen3 14B Flutter Fused: Fine-tuned for Flutter/Dart Development},
  author  = {GenMobiAi Contributors},
  year    = {2025},
  url     = {https://huggingface.co/Wizcoderr/qwen3-14b-flutter-fused},
  license = {Community License}
}

Support & Feedback

For issues, questions, or suggestions:

Open an issue on GitHub
Check existing discussions
Share your use cases and results

Made with ❤️ for the Flutter community

Model Files

Format: Safetensors (MLX)
Model size: 15B params
Tensor type: BF16, U32
Quantization: 4-bit

Model tree for Wizcoderr/qwen3-14b-flutter-fused

Base model: Qwen/Qwen3-14B-Base
Finetuned: Qwen/Qwen3-14B
Quantized: this model