LoupGarou's picture
Create README.md
dabb864 verified
|
raw
history blame
7.53 kB
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---
# Model Card for deepseek-coder-33b-instruct-pythagora
This model card describes the deepseek-coder-33b-instruct-pythagora model, which is a fine-tuned version of the DeepSeek Coder 33B Instruct model, specifically optimized for use with the Pythagora GPT Pilot application.
## Model Details
### Model Description
- **Developed by:** LoupGarou (GitHub: [MoonlightByte](https://github.com/MoonlightByte))
- **Model type:** Causal language model
- **Language(s) (NLP):** English
- **License:** DeepSeek Coder Model License
- **Finetuned from model:** [DeepSeek Coder 33B Instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
### Model Sources
- **Repository:** [LoupGarou/deepseek-coder-33b-instruct-pythagora-gguf](https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora-gguf)
- **GitHub Repository (Proxy Application):** [MoonlightByte/Pythagora-LLM-Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy)
- **Original Model Repository:** [DeepSeek Coder](https://github.com/deepseek-ai/deepseek-coder)
## Uses
### Direct Use
This model is intended for use with the [Pythagora GPT Pilot](https://github.com/Pythagora-io/gpt-pilot) application, which enables the creation of fully working, production-ready apps with the assistance of a developer. The model has been fine-tuned to work seamlessly with the GPT Pilot prompt structures and can be utilized through the [Pythagora LLM Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy).
The model is designed to generate code and assist with various programming tasks, such as writing features, debugging, and providing code reviews, all within the context of the Pythagora GPT Pilot application.
### Out-of-Scope Use
This model should not be used for tasks outside of the intended use case with the Pythagora GPT Pilot application. It is not designed for standalone use or integration with other applications without proper testing and adaptation. Additionally, the model should not be used for generating content related to sensitive topics, such as politics, security, or privacy issues, as it is specifically trained to focus on computer science and programming-related tasks.
## Bias, Risks, and Limitations
As with any language model, there may be biases present in the training data that could be reflected in the model's outputs. Users should be aware of potential limitations and biases when using this model. The model's performance may be impacted by the quality and relevance of the input prompts, as well as the specific programming languages and frameworks used in the context of the Pythagora GPT Pilot application.
### Recommendations
Users should familiarize themselves with the [Pythagora GPT Pilot](https://github.com/Pythagora-io/gpt-pilot) application and its intended use cases before utilizing this model. It is recommended to use the model in conjunction with the [Pythagora LLM Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy) for optimal performance and compatibility. When using the model, users should carefully review and test the generated code to ensure its correctness, efficiency, and adherence to best practices and project requirements.
## How to Get Started with the Model
To use this model with the Pythagora GPT Pilot application:
1. Set up the Pythagora LLM Proxy by following the instructions in the [GitHub repository](https://github.com/MoonlightByte/Pythagora-LLM-Proxy).
2. Configure GPT Pilot to use the proxy by setting the OpenAI API endpoint to `http://localhost:8080/v1/chat/completions`.
3. Run GPT Pilot as usual, and the proxy will handle the communication between GPT Pilot and the deepseek-coder-6.7b-instruct-pythagora model.
4. It is possible to run Pythagora directly to LM Studio or any other service with mixed results since these models were not finetuned using a chat format.
For more detailed instructions and examples, please refer to the [Pythagora LLM Proxy README](https://github.com/MoonlightByte/Pythagora-LLM-Proxy/blob/main/README.md).
## Training Details
### Training Data
The model was fine-tuned using a custom dataset created from sample prompts generated by the Pythagora prompt structures. The prompts are compatible with the version described in the [Pythagora README](https://github.com/Pythagora-io/gpt-pilot/blob/main/README.md). The dataset was carefully curated to ensure high-quality examples and a diverse range of programming tasks relevant to the Pythagora GPT Pilot application.
### Training Procedure
The model was fine-tuned using the training scripts and resources provided in the [DeepSeek Coder GitHub repository](https://github.com/deepseek-ai/DeepSeek-Coder.git). Specifically, the [finetune/finetune_deepseekcoder.py](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/finetune/finetune_deepseekcoder.py) script was used to perform the fine-tuning process. The model was trained in fp16 precision with a maximum sequence length of 12,500 tokens, utilizing the custom dataset to adapt the base DeepSeek Coder 6.7B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.
The training process leveraged state-of-the-art techniques and hardware, including DeepSpeed integration for efficient distributed training, to ensure optimal performance and compatibility with the target application. For detailed information on the training procedure, including the specific hyperparameters and configurations used, please refer to the [DeepSeek Coder Fine-tuning Documentation](https://github.com/deepseek-ai/DeepSeek-Coder#how-to-fine-tune-deepseek-coder).
## Model Examination
No additional interpretability work has been performed on this model. However, the model's performance has been thoroughly tested and validated within the context of the Pythagora GPT Pilot application to ensure its effectiveness in generating high-quality code and assisting with programming tasks.
## Environmental Impact
The environmental impact of this model has not been assessed. More information is needed to estimate the carbon emissions and electricity usage associated with the model's training and deployment. As a general recommendation, users should strive to utilize the model efficiently and responsibly to minimize any potential environmental impact.
## Technical Specifications
- **Model Architecture:** The model architecture is based on the DeepSeek Coder 33B Instruct model, which is a transformer-based causal language model optimized for code generation and understanding.
- **Compute Infrastructure:** The model was fine-tuned using high-performance computing resources, including GPUs, to ensure efficient and timely training. The exact specifications of the compute infrastructure used for training are not publicly disclosed.
## Citation
**APA:**
LoupGarou. (2024). deepseek-coder-33b-instruct-pythagora (Model). https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora
## Model Card Contact
For questions, feedback, or concerns regarding this model, please contact LoupGarou through the GitHub repository: [MoonlightByte/Pythagora-LLM-Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy). You can open an issue or submit a pull request to discuss any aspects of the model or its usage within the Pythagora GPT Pilot application.