LoupGarou committed
Commit
820fb4d
1 Parent(s): bcb391f

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -63,7 +63,7 @@ The model was fine-tuned using a custom dataset created from sample prompts gene
 
 ### Training Procedure
 
-The model was fine-tuned using the training scripts and resources provided in the [DeepSeek Coder GitHub repository](https://github.com/deepseek-ai/DeepSeek-Coder.git). Specifically, the [finetune/finetune_deepseekcoder.py](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/finetune/finetune_deepseekcoder.py) script was used to perform the fine-tuning process. The model was trained in fp16 precision with a maximum sequence length of 12,500 tokens, utilizing the custom dataset to adapt the base DeepSeek Coder 6.7B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.
+The model was fine-tuned using the training scripts and resources provided in the [DeepSeek Coder GitHub repository](https://github.com/deepseek-ai/DeepSeek-Coder.git). Specifically, the [finetune/finetune_deepseekcoder.py](https://github.com/deepseek-ai/DeepSeek-Coder/blob/main/finetune/finetune_deepseekcoder.py) script was used to perform the fine-tuning process. The model was trained using PEFT with a maximum sequence length of 9,000 tokens, utilizing the custom dataset to adapt the base DeepSeek Coder 33B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.
 
 The training process leveraged state-of-the-art techniques and hardware, including DeepSpeed integration for efficient distributed training, to ensure optimal performance and compatibility with the target application. For detailed information on the training procedure, including the specific hyperparameters and configurations used, please refer to the [DeepSeek Coder Fine-tuning Documentation](https://github.com/deepseek-ai/DeepSeek-Coder#how-to-fine-tune-deepseek-coder).
 
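For a concrete picture of what PEFT-style fine-tuning of the 33B base model involves, the sketch below sets up a LoRA adapter with the Hugging Face `transformers` and `peft` libraries. This is an illustrative sketch only: the actual training run used the `finetune/finetune_deepseekcoder.py` script linked in the diff above, and the LoRA rank, alpha, target modules, and dropout shown here are assumed values, not the ones used for this checkpoint.

```python
# Illustrative sketch: the released model was trained with the DeepSeek-Coder
# finetune/finetune_deepseekcoder.py script; the LoRA hyperparameters below are
# assumed values for demonstration, not the ones used for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "deepseek-ai/deepseek-coder-33b-instruct"  # base model named in the card

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# PEFT via LoRA: freeze the 33B base weights and train small low-rank adapters.
lora_config = LoraConfig(
    r=16,                                 # assumed adapter rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed attention projections to adapt
    lora_dropout=0.05,                    # assumed dropout on adapter inputs
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```

The wrapped model would then be trained on the custom dataset (for example with the `transformers` Trainer and a DeepSpeed config) following the fine-tuning documentation linked above.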