Qwen3 14B Flutter Fused
GenMobiAi presents a fine-tuned and optimized version of Qwen3-14B-Instruct, specialized for Flutter and Dart development with agentic code generation capabilities.
Key Specs:
- Parameters: 14B
- Architecture: Qwen3ForCausalLM (40 layers)
- Context Length: 40,960 tokens
- Quantization: 4-bit (group_size=64)
- Model Type: qwen3
- Data Type: bfloat16
- License: Community License
Key Features
Flutter & Dart Development
- Widgets - StatelessWidget, StatefulWidget, Material 3 components
- State Management - Provider, Riverpod, GetX, BLoC, MobX patterns
- Async Programming - Futures, Streams, isolates, async/await
- Architecture - MVVM, Clean Architecture, feature-first structures
- Testing - Widget tests, unit tests with mockito, integration tests
Package Ecosystem Intelligence
- HTTP Clients - Dio, http, chopper with interceptors
- Local Storage - Hive, shared_preferences, sqflite, isar
- Animations - flutter_animate, lottie, rive integration
- UI Libraries - GetX UI, Velocity UI, responsive frameworks
- Testing - Golden tests, behavior-driven testing (BDD)
Agentic Capabilities
- ChatML Format - Tool-call support for multi-step workflows
- Context Preservation - Multi-message conversation handling
- JSON Tool Responses - LangGraph-compatible structured outputs
- Streaming - Real-time token generation for UI responsiveness
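The stock Qwen3 chat template wraps each tool call in `<tool_call>...</tool_call>` tags around a JSON object with `name` and `arguments` keys. Assuming this fine-tune keeps that convention (the card does not confirm it), a minimal sketch for pulling tool calls out of a completion might look like the following. The `read_file` tool name and the sample completion are hypothetical.

```python
import json
import re

# Assumption: tool calls are emitted as <tool_call>{...}</tool_call>, as in the
# stock Qwen3 chat template. The card does not confirm this for the fine-tune.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(completion: str) -> list[dict]:
    """Return every well-formed tool call found in a model completion."""
    calls = []
    for raw in TOOL_CALL_RE.findall(completion):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


# Hypothetical completion; `read_file` is an illustrative tool name.
sample = (
    "I'll inspect the widget first.\n"
    '<tool_call>\n{"name": "read_file", "arguments": {"path": "lib/main.dart"}}\n</tool_call>'
)
print(extract_tool_calls(sample))
```

Skipping malformed payloads keeps a multi-step workflow running when one call is broken; a stricter agent loop could instead feed the parse error back to the model as a tool response.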
Quick Start Examples
Transformers (CPU/GPU)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("path/to/qwen3-14b-flutter-fused")
model = AutoModelForCausalLM.from_pretrained(
    "path/to/qwen3-14b-flutter-fused",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer assistant."},
    {"role": "user", "content": "Create a responsive Flutter dashboard with dark mode support"},
]
# Render the ChatML prompt, ending with the assistant header
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Sampling must be enabled for temperature/top_p to apply
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3, top_p=0.9)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
MLX-LM (Apple Silicon - Recommended)
python -m mlx_lm.generate \
--model path/to/qwen3-14b-flutter-fused \
--prompt "Write a Flutter Provider pattern for cart management" \
--max-tokens 1024 \
--temp 0.3 \
--top-p 0.9
vLLM (Production Serving)
from vllm import LLM, SamplingParams

llm = LLM("path/to/qwen3-14b-flutter-fused", max_model_len=4096)
# The raw ChatML prompt must end with the assistant header so the model answers the user turn
prompt = (
    "<|im_start|>user\nCreate a Riverpod async data provider for user authentication<|im_end|>\n"
    "<|im_start|>assistant\n"
)
outputs = llm.generate(
    [prompt],
    SamplingParams(temperature=0.3, top_p=0.9, max_tokens=1024, repetition_penalty=1.05),
)
print(outputs[0].outputs[0].text)
Recommended Sampling Parameters
| Use Case | Temperature | Top-P | Top-K | Repetition Penalty |
|---|---|---|---|---|
| Code Generation | 0.3 | 0.9 | 40 | 1.05 |
| Complex Logic | 0.5 | 0.95 | 50 | 1.0 |
| Agentic Workflows | 0.2 | 0.85 | 40 | 1.1 |
| Creative Patterns | 0.7 | 0.95 | 50 | 0.95 |
| Documentation | 0.4 | 0.92 | 40 | 1.0 |
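The table rows can be kept in code as named presets. The sketch below copies the values from the table; the preset keys and the `sampling_kwargs` helper are illustrative names, and the returned dictionary uses the keyword names that Transformers' `generate` accepts.

```python
# Values copied from the table above; the dictionary keys are illustrative names.
SAMPLING_PRESETS = {
    "code_generation": {"temperature": 0.3, "top_p": 0.9, "top_k": 40, "repetition_penalty": 1.05},
    "complex_logic": {"temperature": 0.5, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.0},
    "agentic_workflows": {"temperature": 0.2, "top_p": 0.85, "top_k": 40, "repetition_penalty": 1.1},
    "creative_patterns": {"temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 0.95},
    "documentation": {"temperature": 0.4, "top_p": 0.92, "top_k": 40, "repetition_penalty": 1.0},
}


def sampling_kwargs(use_case: str, max_new_tokens: int = 1024) -> dict:
    """Build keyword arguments for `model.generate` from a named preset."""
    if use_case not in SAMPLING_PRESETS:
        raise ValueError(f"unknown use case {use_case!r}; choose from {sorted(SAMPLING_PRESETS)}")
    # do_sample=True so that temperature, top_p and top_k actually take effect
    return {"do_sample": True, "max_new_tokens": max_new_tokens, **SAMPLING_PRESETS[use_case]}


print(sampling_kwargs("agentic_workflows"))
```

With a loaded model this would be used as `model.generate(**inputs, **sampling_kwargs("code_generation"))`.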
Hardware Requirements
| Hardware | Memory | Inference Speed | Use Case |
|---|---|---|---|
| Apple M2/M3/M4 (MLX) | 24GB+ | 100+ tok/s | Development |
| Apple M1 (MLX) | 16GB+ | 50-80 tok/s | Development |
| RTX 4090 (BF16) | 24GB | 150+ tok/s | Local Production |
| RTX 3090 (BF16) | 24GB | 100+ tok/s | Local Production |
| H100 (batched) | 80GB | 800+ tok/s | Server Scale |
| CPU (GGUF Q4) | 32GB | 10-15 tok/s | Edge Devices |
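As a rough cross-check of the Memory column, weight storage can be estimated from the parameter count and the bits stored per weight. The sketch below assumes each quantization group of 64 weights carries one 16-bit scale and one 16-bit bias, and it treats every parameter as quantized, which overstates the savings slightly. It covers weights only.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 14e9   # "Parameters: 14B" from Key Specs
GROUP_SIZE = 64   # "4-bit (group_size=64)" from Key Specs

# Assumption: one 16-bit scale plus one 16-bit bias per group of 64 weights.
bits_4bit = 4 + (16 + 16) / GROUP_SIZE  # 4.5 effective bits per weight

print(f"BF16 : {weight_gb(N_PARAMS, 16):.1f} GB")         # BF16 : 28.0 GB
print(f"4-bit: {weight_gb(N_PARAMS, bits_4bit):.1f} GB")  # 4-bit: 7.9 GB
```

KV cache, activations, and runtime overhead come on top of these figures.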
Model Architecture
- Hidden Size: 5,120
- Intermediate Size: 17,408
- Num Layers: 40
- Num Attention Heads: 40
- Num KV Heads: 8
- Max Position Embeddings: 40,960
- Head Dimension: 128
- Vocab Size: 151,936
- Attention Bias: False
- RMS Norm Epsilon: 1e-6
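These figures determine the KV-cache footprint. With 8 KV heads shared by 40 attention heads (grouped-query attention, 5 query heads per KV head), each token stores one key and one value vector of 8 × 128 values per layer. The sketch below assumes a 16-bit cache; a quantized cache would be smaller.

```python
# Architecture values from the list above
NUM_LAYERS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128
MAX_POSITIONS = 40_960

BYTES_PER_VALUE = 2  # assumption: keys and values cached in 16-bit precision


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes of KV cache for one sequence of `num_tokens` tokens."""
    # Factor 2: one key tensor and one value tensor per layer
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * num_tokens


print(kv_cache_bytes(1) / 1024)               # 160.0 KiB per token
print(kv_cache_bytes(MAX_POSITIONS) / 2**30)  # 6.25 GiB at the full 40,960-token context
```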
Special Tokens
[151643] → BOS Token (Beginning of Sequence)
[151645] → EOS Token (End of Sequence)
<|im_start|> → ChatML message start
<|im_end|> → ChatML message end
Conversion & Deployment
Convert to GGUF (CPU Inference)
python scripts/mlx_to_gguf.py ./models/qwen3-14b-flutter-fused \
-o qwen3-14b-flutter.gguf \
-q q4_k_m
GGML Quantization Options
- q4_k_m - 4-bit k-quant, medium variant (recommended balance)
- q5_0 - 5-bit quantization (higher quality)
- q8_0 - 8-bit quantization (best quality, larger size)
- q3_k_m - 3-bit k-quant, medium variant (smallest, lower quality)
Use with Ollama
# After GGUF conversion
ollama create qwen3-flutter -f Modelfile
ollama run qwen3-flutter "Create a BLoC pattern for form validation"
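The `ollama create` command reads a Modelfile that the card does not show. A minimal sketch that generates one follows; it assumes the GGUF file name from the conversion step, reuses the code-generation sampling values and the system prompt from the Quick Start example, and uses only the `FROM`, `PARAMETER`, and `SYSTEM` directives. The `build_modelfile` helper is an illustrative name.

```python
def build_modelfile(gguf_path: str, temperature: float = 0.3, top_p: float = 0.9) -> str:
    """Return the text of a minimal Ollama Modelfile for a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        'SYSTEM "You are GenMobiAi, an expert Flutter developer assistant."',
    ]
    return "\n".join(lines) + "\n"


# Assumption: the GGUF file produced by the conversion step sits in the current directory.
print(build_modelfile("./qwen3-14b-flutter.gguf"))
```

Saving the printed text as `Modelfile` next to the GGUF file lets `ollama create qwen3-flutter -f Modelfile` pick it up.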
Fine-Tuning
LoRA Fine-tuning with MLX
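# --train enables training; --data is a directory holding train.jsonl and valid.jsonl.
# Older mlx-lm releases name the layer-count flag --lora-layers instead of --num-layers.
python -m mlx_lm.lora \
  --model path/to/qwen3-14b-flutter-fused \
  --train \
  --data path/to/data \
  --num-layers 8 \
  --batch-size 1 \
  --iters 500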
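The mlx-lm LoRA trainer reads JSONL files, and one of the layouts it accepts is a chat format with a `messages` list per line, stored as `train.jsonl` and `valid.jsonl` inside a data directory. A minimal sketch for producing such files follows; the `data` directory name and the single example conversation are placeholders, and a real run needs many distinct examples with a separate validation split.

```python
import json
from pathlib import Path


def write_chat_jsonl(path: Path, conversations: list[list[dict]]) -> None:
    """Write one {"messages": [...]} JSON object per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for messages in conversations:
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")


# Placeholder conversation for illustration only
example = [
    {"role": "user", "content": "Create a StatelessWidget that shows a greeting."},
    {
        "role": "assistant",
        "content": (
            "import 'package:flutter/material.dart';\n\n"
            "class Greeting extends StatelessWidget {\n"
            "  const Greeting({super.key});\n\n"
            "  @override\n"
            "  Widget build(BuildContext context) => const Text('Hello');\n"
            "}"
        ),
    },
]

DATA_DIR = Path("data")  # hypothetical directory to pass to --data
write_chat_jsonl(DATA_DIR / "train.jsonl", [example])
write_chat_jsonl(DATA_DIR / "valid.jsonl", [example])
```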
With Hugging Face TRL and PEFT
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer  # SFTTrainer ships in trl, not transformers

# Load the training examples from a local JSONL file
train_dataset = load_dataset("json", data_files="train_data.jsonl", split="train")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="./flutter-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    bf16=True,
)
trainer = SFTTrainer(
    model="path/to/qwen3-14b-flutter-fused",
    args=training_args,
    peft_config=lora_config,
    train_dataset=train_dataset,
)
trainer.train()
Limitations & Considerations
- Training Data Scale - Trained on Flutter-specific examples; may not generalize to non-mobile domains
- Quantization Artifacts - 4-bit quantization may introduce minor precision loss
- Context Window - Supports up to 40,960 tokens; results are best with 4K-8K token contexts
- Version Coverage - Reflects Flutter/Dart features up to training date
- Hallucinations - May generate plausible-sounding but incorrect package APIs
- Dependencies - Assumes familiarity with popular Flutter ecosystem packages
Troubleshooting
Out of Memory
# Reduce max_new_tokens or use smaller batch_size
output = model.generate(**inputs, max_new_tokens=512) # Instead of 1024
Slow Generation
- Use MLX on Apple Silicon (50-100x faster)
- Enable flash-attention if available
- Use smaller max_new_tokens
- Consider vLLM for batched inference
Quality Issues
- Lower temperature (0.2-0.4 for deterministic code)
- Keep top_p within the 0.85-0.95 range
- Add more context in system prompt
- Use multi-turn conversations for refinement
Citation
@misc{genmobiai2025_qwen3,
title = {Qwen3 14B Flutter Fused: Fine-tuned for Flutter/Dart Development},
author = {GenMobiAi Contributors},
year = {2025},
url = {https://huggingface.co/Wizcoderr/qwen3-14b-flutter-fused},
license = {Community License}
}
Support & Feedback
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing discussions
- Share your use cases and results
Made with ❤️ for the Flutter community