image

Nexus-Flash-Lite-4B

This model is a lightweight, high-performance fine-tuned version of unsloth/Qwen3.5-4B. It is optimized for efficiency and speed while maintaining strong reasoning capabilities, making it ideal for edge deployment or low-latency applications.

📋 Model Details

  • Developed by: ethanzxv
  • Base Model: unsloth/Qwen3.5-4B
  • License: Apache-2.0
  • Language: English
  • Model Size: 4 Billion parameters

🚀 Training & Optimization

This model was trained 2x faster using Unsloth combined with Hugging Face's TRL library. By leveraging Unsloth's memory-efficient kernels, we achieved significant throughput improvements without sacrificing model quality.

Key Enhancements

  • Memory Efficiency: Designed to run on hardware with limited VRAM.
  • Reasoning-Focused: Fine-tuned to improve logical consistency in shorter responses.
  • Optimized Architecture: Inherits the advanced attention mechanisms of the Qwen3.5 family.

🎯 Intended Use & Capabilities

The Nexus-Flash-Lite-4B is particularly suited for:

  • Fast Inference: Rapid response times for real-time chat and assistant tasks.
  • On-Device AI: Small enough for modern consumer GPUs and high-end mobile devices.
  • Embedded Reasoning: Handling structured data and logical tasks in a compact footprint.

📄 License

This model is released under the Apache-2.0 license. Users should also adhere to the original license terms provided by the Qwen team.

🙏 Acknowledgements

  • Unsloth: For the incredible performance gains in LLM fine-tuning.
  • Hugging Face TRL: For the seamless training integration.
  • Alibaba Cloud: For the robust Qwen3.5-4B base architecture.
Downloads last month
7
Safetensors
Model size
5B params
Tensor type
BF16
·
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for methil-group/nexus-flash-lite-4B

Finetuned
Qwen/Qwen3.5-4B
Finetuned
(142)
this model
Quantizations
3 models

Collection including methil-group/nexus-flash-lite-4B