---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for deepseek-coder-33b-instruct-pythagora

This model card describes the deepseek-coder-33b-instruct-pythagora model, a fine-tuned version of the DeepSeek Coder 33B Instruct model optimized for use with the Pythagora GPT Pilot application.

## Model Details

### Model Description

- **Developed by:** LoupGarou (GitHub: [MoonlightByte](https://github.com/MoonlightByte))
- **Model type:** Causal language model
- **Language(s) (NLP):** English
- **License:** DeepSeek Coder Model License
- **Finetuned from model:** [DeepSeek Coder 33B Instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)

### Model Sources

- **Repository:** [LoupGarou/deepseek-coder-33b-instruct-pythagora-gguf](https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora-gguf)
- **GitHub Repository (Proxy Application):** [MoonlightByte/Pythagora-LLM-Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy)
- **Original Model Repository:** [DeepSeek Coder](https://github.com/deepseek-ai/deepseek-coder)

## Uses

### Direct Use

This model is intended for use with the [Pythagora GPT Pilot](https://github.com/Pythagora-io/gpt-pilot) application, which enables the creation of fully working, production-ready apps with the assistance of a developer. The model has been fine-tuned to work with the GPT Pilot prompt structures and can be used through the [Pythagora LLM Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy).

The model is designed to generate code and assist with programming tasks such as writing features, debugging, and providing code reviews, all within the context of the Pythagora GPT Pilot application.

### Out-of-Scope Use

This model should not be used for tasks outside its intended use case with the Pythagora GPT Pilot application. It is not designed for standalone use or for integration with other applications without proper testing and adaptation. Additionally, the model should not be used to generate content on sensitive topics such as politics, security, or privacy, as it is trained specifically for computer science and programming tasks.

## Bias, Risks, and Limitations

As with any language model, biases present in the training data may be reflected in the model's outputs, and users should be aware of these potential limitations. Performance may also vary with the quality and relevance of the input prompts, as well as with the specific programming languages and frameworks used within the Pythagora GPT Pilot application.

### Recommendations

Users should familiarize themselves with the [Pythagora GPT Pilot](https://github.com/Pythagora-io/gpt-pilot) application and its intended use cases before using this model. For best performance and compatibility, use the model together with the [Pythagora LLM Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy). Always review and test the generated code to ensure its correctness, efficiency, and adherence to best practices and project requirements.

## How to Get Started with the Model

To use this model with the Pythagora GPT Pilot application:

1. Set up the Pythagora LLM Proxy by following the instructions in the [GitHub repository](https://github.com/MoonlightByte/Pythagora-LLM-Proxy).
2. Configure GPT Pilot to use the proxy by setting the OpenAI API endpoint to `http://localhost:8080/v1/chat/completions`.
3. Run GPT Pilot as usual; the proxy will handle communication between GPT Pilot and the deepseek-coder-33b-instruct-pythagora model.
4. It is also possible to connect Pythagora directly to LM Studio or another serving backend, with mixed results, since these models were not fine-tuned using a chat format.
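Since the proxy exposes an OpenAI-compatible endpoint (step 2), any OpenAI-style client can talk to it. As a minimal sketch, assuming the proxy is running locally and accepts the standard chat-completions request body (the model name passed here is an assumption; the proxy may ignore it):

```python
import json
from urllib import request

# Endpoint from step 2; the proxy mimics the OpenAI chat completions API.
PROXY_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str,
                       model: str = "deepseek-coder-33b-instruct-pythagora") -> request.Request:
    """Build an OpenAI-style chat completion request aimed at the local proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,  # low temperature keeps code generation more deterministic
    }
    return request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Write a Python function that reverses a string.")
print(req.full_url)
# To actually send the request (requires the proxy to be running):
# with request.urlopen(req) as resp:
#     reply = json.load(resp)
```

In normal use GPT Pilot issues these requests itself; a standalone call like this is mainly useful to verify the proxy is reachable.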

For more detailed instructions and examples, please refer to the [Pythagora LLM Proxy README](https://github.com/MoonlightByte/Pythagora-LLM-Proxy/blob/main/README.md).

## Training Details

### Training Data

The model was fine-tuned using a custom dataset created from sample prompts generated by the Pythagora prompt structures. The prompts are compatible with the version described in the [Pythagora README](https://github.com/Pythagora-io/gpt-pilot/blob/main/README.md). The dataset was carefully curated to ensure high-quality examples and a diverse range of programming tasks relevant to the Pythagora GPT Pilot application.
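The exact schema of this dataset is not published. As an illustration only, instruction-tuning data for DeepSeek Coder-style models is commonly stored as JSON-lines of instruction/response pairs; the field names below are hypothetical:

```python
import json

# Hypothetical field names; the actual dataset schema is not published.
def to_training_record(instruction: str, response: str) -> str:
    """Serialize one Pythagora-style prompt/completion pair as a JSON line."""
    return json.dumps({"instruction": instruction, "output": response})

record = to_training_record(
    "Implement the user story: add a /health endpoint to the Express app.",
    "app.get('/health', (req, res) => res.sendStatus(200));",
)
print(record)
```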

### Training Procedure

The model was fine-tuned using the training scripts and resources provided in the [DeepSeek Coder GitHub repository](https://github.com/deepseek-ai/DeepSeek-Coder.git). Specifically, the [finetune/finetune_deepseekcoder.py](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/finetune/finetune_deepseekcoder.py) script was used to perform the fine-tuning. The model was trained in fp16 precision with a maximum sequence length of 12,500 tokens, using the custom dataset to adapt the base DeepSeek Coder 33B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.

The training process used DeepSpeed integration for efficient distributed training. For detailed information on the training procedure, including the specific hyperparameters and configurations used, please refer to the [DeepSeek Coder Fine-tuning Documentation](https://github.com/deepseek-ai/DeepSeek-Coder#how-to-fine-tune-deepseek-coder).
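The exact hyperparameters were not published. A launch along the following lines would match the setup described above; this is a sketch only, with flag names as in the DeepSeek Coder fine-tuning guide, a hypothetical dataset path, and illustrative batch-size and epoch values:

```shell
# Illustrative configuration only: the actual dataset path, batch sizes,
# and DeepSpeed config used for this model were not published.
deepspeed finetune/finetune_deepseekcoder.py \
    --model_name_or_path deepseek-ai/deepseek-coder-33b-instruct \
    --data_path pythagora_prompts.json \
    --output_dir ./deepseek-coder-33b-instruct-pythagora \
    --model_max_length 12500 \
    --fp16 True \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --deepspeed configs/ds_config_zero3.json
```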
69
+
70
+ ## Model Examination
71
+
72
+ No additional interpretability work has been performed on this model. However, the model's performance has been thoroughly tested and validated within the context of the Pythagora GPT Pilot application to ensure its effectiveness in generating high-quality code and assisting with programming tasks.
73
+
74
+ ## Environmental Impact
75
+
76
+ The environmental impact of this model has not been assessed. More information is needed to estimate the carbon emissions and electricity usage associated with the model's training and deployment. As a general recommendation, users should strive to utilize the model efficiently and responsibly to minimize any potential environmental impact.
77
+
78
+ ## Technical Specifications
79
+
80
+ - **Model Architecture:** The model architecture is based on the DeepSeek Coder 33B Instruct model, which is a transformer-based causal language model optimized for code generation and understanding.
81
+ - **Compute Infrastructure:** The model was fine-tuned using high-performance computing resources, including GPUs, to ensure efficient and timely training. The exact specifications of the compute infrastructure used for training are not publicly disclosed.
82
+
83
+ ## Citation
84
+
85
+ **APA:**
86
+ LoupGarou. (2024). deepseek-coder-33b-instruct-pythagora (Model). https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora
87
+
88
+ ## Model Card Contact
89
+
90
+ For questions, feedback, or concerns regarding this model, please contact LoupGarou through the GitHub repository: [MoonlightByte/Pythagora-LLM-Proxy](https://github.com/MoonlightByte/Pythagora-LLM-Proxy). You can open an issue or submit a pull request to discuss any aspects of the model or its usage within the Pythagora GPT Pilot application.