LoupGarou's picture
Update README.md
257f739 verified
|
raw
history blame
13.3 kB
metadata
{}

Model Card for deepseek-coder-6.7b-instruct-pythagora-v3

This model card describes the deepseek-coder-6.7b-instruct-pythagora-v3 model, which is a fine-tuned version of the DeepSeek Coder 6.7B Instruct model, specifically optimized for use with the Pythagora GPT Pilot application.

This updated model includes training for Pythagora GPT Pilot version 0.1.12 prompt changes and more examples to handle the initial application development, initial application specification, and planning.

Model Compatibility

LoupGarou/deepseek-coder-6.7b-instruct-pythagora-v3-gguf, is compatible with the following versions:

GPT-Pilot (version: 0.1.12) and LM Studio (version: 0.2.21)

Please ensure you are using one of the above versions when working with this model to ensure proper functionality and compatibility.

Optimal LLM Settings

Many issues related to empty plans, tasks, circular questions, and poor model performance are related to the following parameters:

  1. Prompt eval batch size (n_batch): LM Studio - Impacts how the instruction is divided and sent to the LLM. To prevent empty tasks, plans, and circular questions, set this to match your Context Length (n_ctx). For example, if your n_ctx = 8192 then set your prompt eval bacth size to match n_batch = 8192. Warning: If the n_batch < n_ctx then your model will give bad results.
  2. Context Length (n_ctx): LM Studio - Sets the maximum length of the instruction and truncates the instruction once the limit is exceeded. Set this value to the maximum your hardware can handle and the maximum for the model. For example, DeepSeek Coder has a maximum token length of 16,384. Warning: GPT Pilot will often create instruction prompts 10,000 to 20,000 tokens in length which is why Pythagora-LLM-Proxy was created to permit toggling to higher capacity APIs such as OpenAI.
  3. System Prompt: LM Studio - System Prompt must be set to DeepSeek Coder prompt: "You are an AI programming assistant, utilizing the DeepSeek Coder model, developed by DeepSeek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer."
  4. MAX_TOKENS (GPT Pilot .env): GPT Pilot - Sets the maximum tokens the OpenAI API generate in the output. Warning: Setting this value too low will result in truncated messages.

Model Details

Model Description

Model Sources

Uses

Direct Use

This model is intended for use with the Pythagora GPT Pilot application, which enables the creation of fully working, production-ready apps with the assistance of a developer. The model has been fine-tuned to work seamlessly with the GPT Pilot prompt structures and can be utilized through the Pythagora LLM Proxy.

The model is designed to generate code and assist with various programming tasks, such as writing features, debugging, and providing code reviews, all within the context of the Pythagora GPT Pilot application.

Out-of-Scope Use

This model should not be used for tasks outside of the intended use case with the Pythagora GPT Pilot application. It is not designed for standalone use or integration with other applications without proper testing and adaptation. Additionally, the model should not be used for generating content related to sensitive topics, such as politics, security, or privacy issues, as it is specifically trained to focus on computer science and programming-related tasks.

Bias, Risks, and Limitations

As with any language model, there may be biases present in the training data that could be reflected in the model's outputs. Users should be aware of potential limitations and biases when using this model. The model's performance may be impacted by the quality and relevance of the input prompts, as well as the specific programming languages and frameworks used in the context of the Pythagora GPT Pilot application.

Recommendations

Users should familiarize themselves with the Pythagora GPT Pilot application and its intended use cases before utilizing this model. It is recommended to use the model in conjunction with the Pythagora LLM Proxy for optimal performance and compatibility. When using the model, users should carefully review and test the generated code to ensure its correctness, efficiency, and adherence to best practices and project requirements.

How to Get Started with the Model

To use this model with the Pythagora GPT Pilot application:

  1. Set up the Pythagora LLM Proxy by following the instructions in the GitHub repository.
  2. Configure the GPT Pilot .env file to use the proxy by setting the OpenAI API endpoint to OPENAI_ENDPOINT=http://localhost:8080/v1/chat/completions and OPENAI_API_KEY=:"not-needed".
  3. Run GPT Pilot as usual, and the proxy will handle the communication between GPT Pilot and the deepseek-coder-6.7b-instruct-pythagora model.
  4. It is possible to run Pythagora directly to LM Studio or any other service with mixed results since these models were not finetuned using a chat format.

For more detailed instructions and examples, please refer to the Pythagora LLM Proxy README.

Training Details

Training Data

The model was fine-tuned using a custom dataset created from sample prompts generated by the Pythagora prompt structures. The prompts are compatible with the version described in the Pythagora README. The dataset was carefully curated to ensure high-quality examples and a diverse range of programming tasks relevant to the Pythagora GPT Pilot application.

Training Procedure

The model was fine-tuned using the training scripts and resources provided in the DeepSeek Coder GitHub repository. Specifically, the finetune/finetune_deepseekcoder.py script was used to perform the fine-tuning process. The model was trained in fp16 precision with a maximum sequence length of 12,288 tokens, utilizing the custom dataset to adapt the base DeepSeek Coder 6.7B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.

The training process leveraged state-of-the-art techniques and hardware, including DeepSpeed integration for efficient distributed training, to ensure optimal performance and compatibility with the target application. For detailed information on the training procedure, including the specific hyperparameters and configurations used, please refer to the DeepSeek Coder Fine-tuning Documentation.

Model Examination

No additional interpretability work has been performed on this model. However, the model's performance has been thoroughly tested and validated within the context of the Pythagora GPT Pilot application to ensure its effectiveness in generating high-quality code and assisting with programming tasks.

Environmental Impact

The environmental impact of this model has not been assessed. More information is needed to estimate the carbon emissions and electricity usage associated with the model's training and deployment. As a general recommendation, users should strive to utilize the model efficiently and responsibly to minimize any potential environmental impact.

Technical Specifications

  • Model Architecture: The model architecture is based on the DeepSeek Coder 6.7B Instruct model, which is a transformer-based causal language model optimized for code generation and understanding.
  • Compute Infrastructure: The model was fine-tuned using high-performance computing resources, including GPUs, to ensure efficient and timely training. The exact specifications of the compute infrastructure used for training are not publicly disclosed.

Citation

APA: LoupGarou. (2024). deepseek-coder-6.7b-instruct-pythagora-v2-gguf (Model). https://huggingface.co/LoupGarou/deepseek-coder-6.7b-instruct-pythagora-v3-gguf

Model Card Contact

For questions, feedback, or concerns regarding this model, please contact LoupGarou through the GitHub repository: MoonlightByte/Pythagora-LLM-Proxy. You can open an issue or submit a pull request to discuss any aspects of the model or its usage within the Pythagora GPT Pilot application.

Original model card: DeepSeek's Deepseek Coder 6.7B Instruct

🏠Homepage | 🤖 Chat with DeepSeek Coder | Discord | Wechat(微信)


1. Introduction of Deepseek Coder

Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

  • Massive Training Data: Trained from scratch fon 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
  • Highly Flexible & Scalable: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
  • Superior Model Performance: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
  • Advanced Code Completion Capabilities: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.

2. Model Summary

deepseek-coder-6.7b-instruct is a 6.7B parameter model initialized from deepseek-coder-6.7b-base and fine-tuned on 2B tokens of instruction data.

3. How to Use

Here give some examples of how to use our model.

Chat Model Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True).cuda()
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
# 32021 is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

4. License

This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

See the LICENSE-MODEL for more details.

5. Contact

If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.