metadata

language:
  - en
pipeline_tag: text-generation
tags:
  - Pytorch
  - Qwen
  - English
  - code
  - conversational
base_model: NTQAI/Nxcode-CQ-7B-orpo

SandLogic Technologies - Quantized Nxcode-CQ-7B-orpo Models

Model Description

We have quantized the Nxcode-CQ-7B-orpo model into two variants:

Q5_K_M
Q4_K_M

These quantized models use less memory and can run inference faster, while keeping output quality close to that of the original model.

Discover our full range of quantized language models by visiting our SandLogic Lexicon GitHub. To learn more about our company and services, check out our website at SandLogic.

Original Model Information

Name: Nxcode-CQ-7B-orpo
Base Model: Qwen/CodeQwen1.5-7B
Fine-tuning Approach: ORPO (Odds Ratio Preference Optimization), a monolithic preference optimization method that needs no reference model
Fine-tuning Data: 100k samples of high-quality ranking data
Model Type: Transformer-based decoder-only language model
Parameters: 7 billion
Context Length: 64K tokens
Supported Languages: 92 programming languages

Model Capabilities

Nxcode-CQ-7B-orpo is designed for code-related tasks, with strong performance in:

Code generation
Long context understanding and generation
Text-to-SQL conversion
Bug fixing

Performance

EvalPlus benchmark results (pass@1, %) reported for the original, unquantized model:

HumanEval pass@1: 86.6
HumanEval+ pass@1: 83.5
MBPP (v0.2.0) pass@1: 82.3
MBPP+ (v0.2.0) pass@1: 70.4
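For reference, pass@1 is the percentage of benchmark problems for which a generated solution passes all tests. When several samples are drawn per problem, it is commonly computed with the unbiased pass@k estimator introduced alongside HumanEval. The sketch below implements that estimator in Python. It is illustrative only and is not EvalPlus's own code. The `results` list is made-up example data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over all problems, expressed as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up example: one sample per problem, 3 of 4 problems solved.
results = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_pass_at_k(results, k=1))  # 75.0
```

With one greedy sample per problem (n = 1), pass@1 reduces to the plain fraction of problems solved.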

Use Cases

Code Generation: Create Python code based on function descriptions or partial implementations
Code Completion: Suggest completions for partially written code
Error Understanding: Help identify and explain coding errors
Programming Education: Provide explanations and examples of coding concepts and patterns
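Each of these use cases maps onto the same chat-style `messages` list that is passed to `create_chat_completion` in the Usage section. The helper below is a hypothetical sketch showing how the tasks could be phrased as prompts. The name `build_messages`, the task keys, and the prompt wording are illustrative assumptions and are not part of this model or of llama-cpp-python.

```python
# Hypothetical prompt templates for the use cases listed above (wording is illustrative).
TASK_TEMPLATES = {
    "generate": "Write a Python function for the following description:\n{text}",
    "complete": "Complete the following partially written code:\n{text}",
    "explain_error": "Explain what causes this error and how to fix it:\n{text}",
    "teach": "Explain this programming concept with a short example:\n{text}",
}

SYSTEM_PROMPT = "You are an AI coding assistant who helps solve coding questions."


def build_messages(task: str, text: str) -> list[dict[str, str]]:
    """Return a system + user message list in the chat format used by create_chat_completion."""
    if task not in TASK_TEMPLATES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_TEMPLATES)}")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": TASK_TEMPLATES[task].format(text=text)},
    ]


messages = build_messages("explain_error", "TypeError: 'int' object is not iterable")
print(messages[1]["content"])
```

You can then pass the resulting list as `llm.create_chat_completion(messages=messages)`.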

Model Variants

We offer two quantized versions of the Nxcode-CQ-7B-orpo model:

Q5_K_M: 5-bit quantization using llama.cpp's k-quant method, medium (M) variant
Q4_K_M: 4-bit quantization using llama.cpp's k-quant method, medium (M) variant

These quantized models aim to reduce model size and improve inference speed while maintaining performance as close to the original model as possible.
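The size reduction follows from simple arithmetic: the weights take roughly parameters × bits per weight / 8 bytes. The sketch below uses the nominal 4 and 5 bits and this card's rounded figure of 7 billion parameters. Real Q4_K_M and Q5_K_M files come out somewhat larger, because k-quants also store per-block scale factors and the M variants keep some tensors at higher precision. Treat the numbers as rough lower-bound estimates, not measured file sizes.

```python
def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # rounded "7 billion" from this card; the exact count differs slightly

# Nominal bit widths only; actual k-quant files use somewhat more bits per weight.
for name, bits in [("FP16", 16), ("Q5_K_M (nominal)", 5), ("Q4_K_M (nominal)", 4)]:
    print(f"{name:<18} ~{approx_weight_size_gb(N_PARAMS, bits):.2f} GB")
```

This gives about 14 GB for FP16, compared with roughly 4.4 GB and 3.5 GB at the nominal 5-bit and 4-bit widths.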

Input and Output

Input: Text string (e.g., function descriptions, partial code implementations)
Output: Generated code, completions, or explanations based on the input

Usage

pip install llama-cpp-python

Please refer to the llama-cpp-python documentation to install with GPU support.

Chat Completion

Here's an example demonstrating how to use the high-level API for chat completion:

from llama_cpp import Llama

llm = Llama(
    model_path="./models/7B/Nxcode-CQ-7B-orpo-Q5_K_M.gguf",  # path to the downloaded GGUF file
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to offload all layers to the GPU
    # n_ctx=2048,  # Uncomment to increase the context window (in tokens)
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI coding assistant who helps solve coding questions."},
        {
            "role": "user",
            "content": "Write Python code to find prime numbers.",
        },
    ]
)

# The reply text is in the first choice's message
print(output["choices"][0]["message"]["content"])
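Chat models often wrap generated code in Markdown code fences inside `message["content"]`. If you only want the code, a small post-processing step can extract it. The `extract_code_blocks` function below is a hypothetical helper, not part of llama-cpp-python. It assumes the model uses standard triple-backtick fences, which is not guaranteed.

```python
import re

# Matches a fenced block: three backticks ("`{3}"), an optional language tag,
# a newline, then the body up to the next three backticks.
FENCE_RE = re.compile(r"`{3}[\w+-]*[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_code_blocks(reply: str) -> list[str]:
    """Return the bodies of all fenced code blocks in a model reply, in order."""
    return [body.rstrip("\n") for body in FENCE_RE.findall(reply)]


fence = "`" * 3
reply = "Here is a solution:\n" + fence + "python\nprint('hi')\n" + fence + "\nDone."
print(extract_code_blocks(reply))
```

Calling `extract_code_blocks(output["choices"][0]["message"]["content"])` returns a list of code strings. The list is empty if the reply contains no fences.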

Download

You can download GGUF model files directly from the Hugging Face Hub using the Llama.from_pretrained method. This feature requires the huggingface-hub package.

To install it, run: pip install huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Nxcode-CQ-7B-orpo-GGUF",
    filename="*Nxcode-CQ-7B-orpo-Q5_K_M.gguf",  # wildcard pattern selecting the Q5_K_M file
    verbose=False,
)

By default, from_pretrained will download the model to the Hugging Face cache directory. You can manage installed model files using the huggingface-cli tool.
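The `filename` argument in the download example is a wildcard pattern rather than an exact name. The sketch below assumes the pattern follows Python's `fnmatch` shell-style semantics and shows locally which file such a pattern would select. The repository file list is hypothetical, so check the repository for the actual file names.

```python
from fnmatch import fnmatch

# Hypothetical listing of the GGUF repository; check the repo for the real file names.
repo_files = [
    "README.md",
    "Nxcode-CQ-7B-orpo-Q4_K_M.gguf",
    "Nxcode-CQ-7B-orpo-Q5_K_M.gguf",
]


def matching_files(files: list[str], pattern: str) -> list[str]:
    """Return the file names that match a shell-style wildcard pattern."""
    return [name for name in files if fnmatch(name, pattern)]


print(matching_files(repo_files, "*Nxcode-CQ-7B-orpo-Q5_K_M.gguf"))  # the 5-bit file
print(matching_files(repo_files, "*Q4_K_M.gguf"))                    # the 4-bit file
```

A pattern like "*Q4_K_M.gguf" would select the 4-bit variant instead, assuming a file with that suffix exists in the repository.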

Acknowledgements

We thank the original developers of Nxcode-CQ-7B-orpo and Qwen/CodeQwen1.5-7B for their contributions to the field. Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.

Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our support page.