Qwen3 14B Flutter Fused

GenMobiAi presents a fine-tuned and optimized version of Qwen3-14B, specialized for Flutter and Dart development with agentic code generation capabilities.

Key Specs:

  • Parameters: 14B
  • Architecture: Qwen3ForCausalLM (40 layers)
  • Context Length: 40,960 tokens
  • Quantization: 4-bit (group_size=64)
  • Model Type: qwen3
  • Data Type: bfloat16
  • License: Community License

Key Features

Flutter & Dart Development

  • Widgets - StatelessWidget, StatefulWidget, Material 3 components
  • State Management - Provider, Riverpod, GetX, BLoC, MobX patterns
  • Async Programming - Futures, Streams, isolates, async/await
  • Architecture - MVVM, Clean Architecture, feature-first structures
  • Testing - Widget tests, unit tests with mockito, integration tests

Package Ecosystem Intelligence

  • HTTP Clients - Dio, http, chopper with interceptors
  • Local Storage - Hive, shared_preferences, sqflite, isar
  • Animations - flutter_animate, lottie, rive integration
  • UI Libraries - GetX UI, Velocity UI, responsive frameworks
  • Testing - Golden tests, behavior-driven testing (BDD)

Agentic Capabilities

  • ChatML Format - Tool-call support for multi-step workflows
  • Context Preservation - Multi-message conversation handling
  • JSON Tool Responses - LangGraph-compatible structured outputs
  • Streaming - Real-time token generation for UI responsiveness
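
As a rough illustration of the tool-call flow, the sketch below pulls structured calls out of generated text. It assumes the fine-tune keeps the `<tool_call>{...}</tool_call>` blocks used by the base Qwen3 chat template; the helper name `extract_tool_calls` and the `create_file` tool are hypothetical.

```python
import json
import re

# Assumption: tool calls are wrapped in <tool_call>...</tool_call> tags,
# as in the base Qwen3 chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list:
    """Return every well-formed JSON tool call found in the model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the agent loop
        if isinstance(payload, dict) and "name" in payload:
            calls.append(payload)
    return calls


sample = (
    "I will create the entry point.\n"
    "<tool_call>\n"
    '{"name": "create_file", "arguments": {"path": "lib/main.dart"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample)
```

Each returned dict can then be dispatched to the matching tool, and the tool result appended to the conversation as a new message before the next generation step.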

Quick Start Examples

Transformers (CPU/GPU)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("path/to/qwen3-14b-flutter-fused")
model = AutoModelForCausalLM.from_pretrained(
    "path/to/qwen3-14b-flutter-fused",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer assistant."},
    {"role": "user", "content": "Create a responsive Flutter dashboard with dark mode support"}
]

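# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Set do_sample=True explicitly so that temperature and top_p are applied.
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3, top_p=0.9)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))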

MLX-LM (Apple Silicon - Recommended)

python -m mlx_lm.generate \
  --model path/to/qwen3-14b-flutter-fused \
  --prompt "Write a Flutter Provider pattern for cart management" \
  --max-tokens 1024 \
  --temp 0.3 \
  --top-p 0.9

vLLM (Production Serving)

from vllm import LLM, SamplingParams

llm = LLM("path/to/qwen3-14b-flutter-fused", max_model_len=4096)

# Raw ChatML prompt: close the user turn, then open the assistant turn so the model answers.
prompt = (
    "<|im_start|>user\n"
    "Create a Riverpod async data provider for user authentication<|im_end|>\n"
    "<|im_start|>assistant\n"
)
outputs = llm.generate(
    [prompt],
    SamplingParams(temperature=0.3, top_p=0.9, max_tokens=1024, repetition_penalty=1.05),
)
print(outputs[0].outputs[0].text)

Recommended Sampling Parameters

Use Case            Temperature   Top-P   Top-K   Repetition Penalty
Code Generation     0.3           0.9     40      1.05
Complex Logic       0.5           0.95    50      1.0
Agentic Workflows   0.2           0.85    40      1.1
Creative Patterns   0.7           0.95    50      0.95
Documentation       0.4           0.92    40      1.0
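
One way to use these values is to keep them as named presets. The sketch below copies the numbers from the table above; the preset keys and the `sampling_kwargs` helper are hypothetical names, and the returned dict uses the keyword names accepted by Transformers' `generate`.

```python
# Values copied from the table above; the keys are hypothetical identifiers.
SAMPLING_PRESETS = {
    "code_generation":   {"temperature": 0.3, "top_p": 0.9,  "top_k": 40, "repetition_penalty": 1.05},
    "complex_logic":     {"temperature": 0.5, "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.0},
    "agentic_workflows": {"temperature": 0.2, "top_p": 0.85, "top_k": 40, "repetition_penalty": 1.1},
    "creative_patterns": {"temperature": 0.7, "top_p": 0.95, "top_k": 50, "repetition_penalty": 0.95},
    "documentation":     {"temperature": 0.4, "top_p": 0.92, "top_k": 40, "repetition_penalty": 1.0},
}


def sampling_kwargs(use_case: str) -> dict:
    """Return generate() keyword arguments for a use case, with sampling enabled."""
    if use_case not in SAMPLING_PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    return {"do_sample": True, **SAMPLING_PRESETS[use_case]}
```

With the Transformers example, this could be passed as `model.generate(**inputs, max_new_tokens=1024, **sampling_kwargs("code_generation"))`.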

Hardware Requirements

Hardware               Memory   Inference Speed   Use Case
Apple M2/M3/M4 (MLX)   24GB+    100+ tok/s        Development
Apple M1 (MLX)         16GB+    50-80 tok/s       Development
RTX 4090 (BF16)        24GB     150+ tok/s        Local Production
RTX 3090 (BF16)        24GB     100+ tok/s        Local Production
H100 (batched)         80GB     800+ tok/s        Server Scale
CPU (GGUF Q4)          32GB     10-15 tok/s       Edge Devices
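
For a back-of-the-envelope check of how much memory the weights alone need, the sketch below estimates storage from the parameter count and bit width. It assumes every weight is quantized and that each group of 64 weights carries a 16-bit scale and a 16-bit bias (32 metadata bits per group); both are simplifying assumptions, and the function name `weight_memory_gb` is illustrative.

```python
from typing import Optional


def weight_memory_gb(num_params: float, bits: int,
                     group_size: Optional[int] = None,
                     meta_bits: int = 32) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    With group-wise quantization, each group of `group_size` weights is
    assumed to carry `meta_bits` of metadata (scale plus bias).
    """
    overhead = meta_bits / group_size if group_size else 0.0
    bits_per_weight = bits + overhead
    return num_params * bits_per_weight / 8 / 1e9


# 14B parameters at 4-bit with group_size=64: 4.5 effective bits per weight.
four_bit_gb = weight_memory_gb(14e9, bits=4, group_size=64)
print(f"{four_bit_gb:.2f} GB")
```

Under these assumptions the 4-bit weights come to a little under 8 GB. Activations, the KV cache, and runtime overhead come on top of that figure.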

Model Architecture

  • Hidden Size: 5,120
  • Intermediate Size: 17,408
  • Num Layers: 40
  • Num Attention Heads: 40
  • Num KV Heads: 8
  • Max Position Embeddings: 40,960
  • Head Dimension: 128
  • Vocab Size: 151,936
  • Attention Bias: False
  • RMS Norm Epsilon: 1e-6
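
Because the model uses 8 KV heads against 40 attention heads (grouped-query attention), the KV cache is much smaller than it would be with one KV head per attention head. The sketch below estimates cache size from the figures listed above, assuming a single sequence and 2-byte (bfloat16) cache entries with no cache quantization; the function name is illustrative.

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 40,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


per_token = kv_cache_bytes(1)              # 163,840 bytes (160 KiB) per token
full_context = kv_cache_bytes(40_960)      # cache at the maximum context length
print(per_token, full_context / 2**30)     # size in bytes, then in GiB
```

At the full 40,960-token context this works out to 6.25 GiB under the stated assumptions, which is one reason shorter contexts are easier to fit on memory-constrained hardware.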

Special Tokens

[151643]     → BOS Token (Beginning of Sequence), <|endoftext|>
[151645]     → EOS Token (End of Sequence), same ID as <|im_end|>
<|im_start|> → ChatML message start (ID 151644)
<|im_end|>   → ChatML message end (ID 151645)
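
To show how these markers frame a conversation, here is a minimal sketch of the basic ChatML layout. It is not the model's actual chat template, which may add tool definitions or other markers; prefer `tokenizer.apply_chat_template` in real use. The function name `to_chatml` is hypothetical.

```python
def to_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages in plain ChatML."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are GenMobiAi, an expert Flutter developer assistant."},
    {"role": "user", "content": "Write a StatelessWidget that shows a greeting."},
])
print(prompt)
```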

Conversion & Deployment

Convert to GGUF (CPU Inference)

python scripts/mlx_to_gguf.py ./models/qwen3-14b-flutter-fused \
  -o qwen3-14b-flutter.gguf \
  -q q4_k_m

GGUF Quantization Options

  • q4_k_m - 4-bit k-quant, medium variant (recommended balance)
  • q5_0 - 5-bit quantization (higher quality)
  • q8_0 - 8-bit quantization (best quality, larger size)
  • q3_k_m - 3-bit k-quant, medium variant (smallest, lower quality)

Use with Ollama

# After GGUF conversion
ollama create qwen3-flutter -f Modelfile
ollama run qwen3-flutter "Create a BLoC pattern for form validation"
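
The `ollama create` command expects a Modelfile, which is not shown here. The sketch below generates a plausible one using the standard `FROM`, `PARAMETER`, and `SYSTEM` directives. The GGUF filename follows the conversion step above, the defaults reuse the code-generation sampling values, and `num_ctx=4096` and the helper name `build_modelfile` are assumptions; adjust them to your setup.

```python
def build_modelfile(gguf_path: str,
                    system_prompt: str,
                    temperature: float = 0.3,
                    top_p: float = 0.9,
                    top_k: int = 40,
                    repeat_penalty: float = 1.05,
                    num_ctx: int = 4096) -> str:
    """Return Modelfile text pointing Ollama at a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"PARAMETER top_k {top_k}",
        f"PARAMETER repeat_penalty {repeat_penalty}",
        f"PARAMETER num_ctx {num_ctx}",
        f'SYSTEM """{system_prompt}"""',
    ]
    return "\n".join(lines) + "\n"


modelfile_text = build_modelfile(
    "./qwen3-14b-flutter.gguf",
    "You are GenMobiAi, an expert Flutter developer assistant.",
)
print(modelfile_text)
```

Saving the printed text to a file named `Modelfile` in the working directory makes it available to the `ollama create` command above.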

Fine-Tuning

LoRA Fine-tuning with MLX

# --data points to a directory containing train.jsonl and valid.jsonl
python -m mlx_lm.lora \
  --model path/to/qwen3-14b-flutter-fused \
  --train \
  --data path/to/data_dir \
  --num-layers 8 \
  --batch-size 1 \
  --iters 500
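
The training data is not described here. One layout that `mlx_lm.lora` accepts is chat-style JSONL, with one `{"messages": [...]}` object per line. The sketch below builds such lines from prompt/reply pairs; the helper name and the sample pair are hypothetical.

```python
import json


def chat_jsonl_lines(examples: list) -> str:
    """Serialize (user_prompt, assistant_reply) pairs as chat-format JSONL."""
    lines = []
    for prompt, reply in examples:
        record = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": reply},
            ]
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


jsonl_text = chat_jsonl_lines([
    (
        "Create a StatelessWidget that shows 'Hello'.",
        "class Hello extends StatelessWidget {\n"
        "  const Hello({super.key});\n"
        "  @override\n"
        "  Widget build(BuildContext context) => const Text('Hello');\n"
        "}",
    ),
])
print(jsonl_text)
```

The resulting text can be written to the training file, with a held-out portion saved separately for validation.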

With Hugging Face Trainer

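from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer  # SFTTrainer lives in trl, not transformers

# Assumes train_data.jsonl holds one training example per line.
train_dataset = load_dataset("json", data_files="train_data.jsonl", split="train")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# SFTConfig extends TrainingArguments with SFT-specific options.
training_args = SFTConfig(
    output_dir="./flutter-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="path/to/qwen3-14b-flutter-fused",
    args=training_args,
    peft_config=lora_config,
    train_dataset=train_dataset,
)
trainer.train()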

Limitations & Considerations

  1. Training Data Scale - Trained on Flutter-specific examples; may not generalize to non-mobile domains
  2. Quantization Artifacts - 4-bit quantization may introduce minor precision loss
  3. Context Window - Supports up to 40,960 tokens; works best with 4K-8K-token contexts
  4. Version Coverage - Reflects Flutter/Dart features up to training date
  5. Hallucinations - May generate plausible-sounding but incorrect package APIs
  6. Dependencies - Assumes familiarity with popular Flutter ecosystem packages

Troubleshooting

Out of Memory

# Reduce max_new_tokens or use smaller batch_size
output = model.generate(**inputs, max_new_tokens=512)  # Instead of 1024

Slow Generation

  • Use MLX on Apple Silicon (50-100+ tok/s in the hardware table above, versus 10-15 tok/s for CPU GGUF)
  • Enable flash-attention if available
  • Use smaller max_new_tokens
  • Consider vLLM for batched inference

Quality Issues

  • Lower the temperature (0.2-0.4 gives more deterministic code)
  • Keep top_p between 0.85 and 0.95
  • Add more context in system prompt
  • Use multi-turn conversations for refinement

Citation

@misc{genmobiai2025_qwen3,
  title   = {Qwen3 14B Flutter Fused: Fine-tuned for Flutter/Dart Development},
  author  = {GenMobiAi Contributors},
  year    = {2025},
  url     = {https://huggingface.co/Wizcoderr/qwen3-14b-flutter-fused},
  license = {Community License}
}

Support & Feedback

For issues, questions, or suggestions:

  • Open an issue on GitHub
  • Check existing discussions
  • Share your use cases and results

Made with ❤️ for the Flutter community

Model tree for Wizcoderr/qwen3-14b-flutter-fused: fine-tuned from Qwen/Qwen3-14B, then quantized (this model).