--- license: other language: - en pipeline_tag: text-generation inference: false tags: - transformers - gguf - imatrix - Qwen2.5-Coder-7B --- Quantizations of https://huggingface.co/Qwen/Qwen2.5-Coder-7B ### Inference Clients/UIs * [llama.cpp](https://github.com/ggerganov/llama.cpp) * [KoboldCPP](https://github.com/LostRuins/koboldcpp) * [text-generation-webui](https://github.com/oobabooga/text-generation-webui) * [ollama](https://github.com/ollama/ollama) --- # From original readme Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). For Qwen2.5-Coder, we release three base language models and instruction-tuned language models, 1.5, 7 and 32 (coming soon) billion parameters. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: - Significantly improvements in **code generation**, **code reasoning** and **code fixing**. Base on the strong Qwen2.5, we scale up the training tokens into 5.5 trillion including source code, text-code grounding, Synthetic data, etc. - A more comprehensive foundation for real-world applications such as **Code Agents**. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies. - **Long-context Support** up to 128K tokens. **This repo contains the 7B Qwen2.5-Coder model**, which has the following features: - Type: Causal Language Models - Training Stage: Pretraining - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias - Number of Parameters: 7.61B - Number of Paramaters (Non-Embedding): 6.53B - Number of Layers: 28 - Number of Attention Heads (GQA): 28 for Q and 4 for KV - Context Length: Full 131,072 tokens - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts. **We do not recommend using base language models for conversations.** Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., or fill in the middle tasks on this model. For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder/), [GitHub](https://github.com/QwenLM/Qwen2.5-Coder), [Documentation](https://qwen.readthedocs.io/en/latest/), [Arxiv](https://arxiv.org/abs/2409.12186). ## Requirements The code of Qwen2.5-Coder has been in the latest Hugging face `transformers` and we advise you to use the latest version of `transformers`. With `transformers<4.37.0`, you will encounter the following error: ``` KeyError: 'qwen2' ``` ### Processing Long Texts The current `config.json` is set for context length up to 32,768 tokens. To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts. For supported frameworks, you could add the following to `config.json` to enable YaRN: ```json { ..., "rope_scaling": { "factor": 4.0, "original_max_position_embeddings": 32768, "type": "yarn" } } ```