license: apache-2.0
datasets:
- m-a-p/Code-Feedback
Model Overview
The base model used for training is CallComply/openchat-3.5-0106-128k, which has a context length of 128k tokens. This model was trained on 31,000 examples from the m-a-p/Code-Feedback dataset. This dataset trains the model for interactive coding, enabling it to improve its code using interpreter and human feedback. That makes it well suited to applications like TaskWeaver, which helps build code automatically, or OpenInterpreter, which assists you in writing code or serves as a general agent.
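The interpreter-feedback pattern these tools rely on can be sketched as a loop: the model proposes code, the code is run, and any error is passed back as the next turn. This is a minimal illustration, not how TaskWeaver or OpenInterpreter are implemented. `run_snippet`, `feedback_loop` and `fake_model` are hypothetical names, and `fake_model` is a stub standing in for calls to this model.

```python
import traceback
from typing import Callable, List, Tuple


def run_snippet(code: str) -> Tuple[bool, str]:
    """Execute a code string in a fresh namespace and report (ok, feedback)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # NOTE: real agents should sandbox this; exec is not safe.
    except Exception:
        # The traceback becomes the "interpreter feedback" for the next attempt.
        return False, traceback.format_exc(limit=1)
    return True, "ok"


def feedback_loop(
    generate: Callable[[List[str]], str], task: str, max_rounds: int = 3
) -> Tuple[bool, str, int]:
    """Ask for code, run it, and feed errors back until it runs or rounds run out."""
    history = [task]
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(history)
        ok, feedback = run_snippet(code)
        if ok:
            return True, code, round_no
        history.append(feedback)  # the model sees the error on its next attempt
    return False, code, max_rounds


def fake_model(history: List[str]) -> str:
    # Hypothetical stand-in for the LLM: buggy first, fixed once it has seen an error.
    if len(history) == 1:
        return "result = undefined_name + 1"
    return "result = 41 + 1"


ok, code, rounds = feedback_loop(fake_model, "Compute 41 + 1")
print(ok, rounds)  # True 2
```

In a real setup, `generate` would format `history` with the prompt template and call the model; human feedback could be appended to `history` the same way as interpreter errors.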
We chose this base model for its long context length and strong performance across categories, especially coding. This model was trained to output up to 8192 tokens and still inherits the 128k context window of its base model. In our view, this makes it the best open-source generalized agent model.
Additional Information
During training, we filtered the dataset to examples longer than 4096 tokens and shorter than 8192 tokens, which left 31k examples. The model performed exceptionally well in our testing, so we are releasing it now, although we still plan to benchmark it.
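A length filter like the one described might look as follows. This sketch assumes the filter kept examples whose token count lies strictly between 4096 and 8192; whether the card's bounds were strict or inclusive is not stated. `filter_by_length` and `whitespace_tokens` are hypothetical, and whitespace splitting is only a stand-in for the base model's real tokenizer.

```python
from typing import Callable, Iterable, List


def filter_by_length(
    examples: Iterable[str],
    count_tokens: Callable[[str], int],
    min_tokens: int = 4096,
    max_tokens: int = 8192,
) -> List[str]:
    """Keep examples whose token count is strictly between min_tokens and max_tokens."""
    return [ex for ex in examples if min_tokens < count_tokens(ex) < max_tokens]


def whitespace_tokens(text: str) -> int:
    # Stand-in tokenizer for illustration only; real counts come from the model's tokenizer.
    return len(text.split())


examples = ["word " * 10, "word " * 5000, "word " * 9000]
kept = filter_by_length(examples, whitespace_tokens)
print(len(kept))  # 1
```

With a Hugging Face tokenizer, `count_tokens` could be something like `lambda s: len(tokenizer(s)["input_ids"])`.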
The model was modified and retrained to use the standard Mistral EOS token, </s>.
Training consisted of:
- Epochs = 3
- learning_rate = 0.0002
- per_device_train_batch_size = 4
- gradient_accumulation_steps = 8
- warmup_steps = 10
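The settings above imply an effective batch size and an approximate number of optimizer steps, which the arithmetic below works out. It assumes a single GPU, the rounded figure of 31,000 examples, and that a final partial batch still counts as a step; the exact count depends on the trainer. The card does not say what the learning rate does after warmup, so the linear decay in `lr_at` is an assumption, shaped like the linear-warmup schedule in Hugging Face `transformers`.

```python
import math

EXAMPLES = 31_000      # rounded dataset size from this card
EPOCHS = 3
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 8
NUM_GPUS = 1           # single RTX 3090
LEARNING_RATE = 2e-4
WARMUP_STEPS = 10

# Gradients from GRAD_ACCUM micro-batches are summed before each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS  # 32
steps_per_epoch = math.ceil(EXAMPLES / effective_batch)     # 969
total_steps = steps_per_epoch * EPOCHS                      # 2907


def lr_at(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then (assumed) linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = (total_steps - step) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, remaining)


print(effective_batch, steps_per_epoch, total_steps)  # 32 969 2907
```

Under these assumptions the learning rate reaches its peak of 0.0002 after only 10 of roughly 2,900 steps, so warmup covers well under 1% of training.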
This model was trained with QLoRA using Unsloth. A big thank you to Unsloth for releasing their free-tier training library, which allowed me to train on about 30k examples (roughly 90k across 3 epochs) with up to 8192 tokens of output on a single RTX 3090, completing training in just 37 hours.
Thank you to the creators of the Code-Feedback dataset for open-sourcing it. <3
Benchmarks are coming soon, but in real-world use this model has performed very well!
Prompt Template
###Human: Write a python script....
###Assistant: python.....
EOS = </s>
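The template can be assembled in code as sketched below. The exact whitespace (a newline between turns, no trailing space after the final `###Assistant:`) and ending earlier assistant replies with `</s>` are assumptions, not confirmed by this card. `build_prompt` and `trim_at_eos` are hypothetical helper names.

```python
from typing import List, Tuple

EOS = "</s>"


def build_prompt(turns: List[Tuple[str, str]], user_message: str) -> str:
    """Build a ###Human:/###Assistant: prompt; earlier assistant replies end with EOS (assumed)."""
    parts = []
    for human, assistant in turns:
        parts.append(f"###Human: {human}\n###Assistant: {assistant}{EOS}\n")
    # Leave the final assistant turn open so the model writes the reply.
    parts.append(f"###Human: {user_message}\n###Assistant:")
    return "".join(parts)


def trim_at_eos(generated: str) -> str:
    """Cut a raw completion at the first EOS marker, if one is present."""
    return generated.split(EOS, 1)[0].strip()


print(build_prompt([], "Write a python script that prints hello."))
```

When generating with Hugging Face `transformers`, stopping on the tokenizer's `eos_token_id` has the same effect as `trim_at_eos`.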