---
pipeline_tag: text-generation
inference: true
widget:
- text: 'Question: Please write a function in Python that performs bubble sort.\n\nAnswer:'
  example_title: Bubble sort
  group: Python
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
language:
- zh
- en
tags:
- codegeex
- glm
- chatglm
model-index:
- name: OctoGeeX
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 44.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 33.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Java
    metrics:
    - name: pass@1
      type: pass@1
      value: 36.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Go
    metrics:
    - name: pass@1
      type: pass@1
      value: 21.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize C++
    metrics:
    - name: pass@1
      type: pass@1
      value: 32.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Rust
    metrics:
    - name: pass@1
      type: pass@1
      value: 25.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Average
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 28.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 27.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Java
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Go
    metrics:
    - name: pass@1
      type: pass@1
      value: 27.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix C++
    metrics:
    - name: pass@1
      type: pass@1
      value: 22.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Rust
    metrics:
    - name: pass@1
      type: pass@1
      value: 9.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Average
    metrics:
    - name: pass@1
      type: pass@1
      value: 24.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 30.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 24.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Java
    metrics:
    - name: pass@1
      type: pass@1
      value: 24.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Go
    metrics:
    - name: pass@1
      type: pass@1
      value: 21.7
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain C++
    metrics:
    - name: pass@1
      type: pass@1
      value: 21.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Rust
    metrics:
    - name: pass@1
      type: pass@1
      value: 15.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalExplain Average
    metrics:
    - name: pass@1
      type: pass@1
      value: 22.9
      verified: false
---

![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)
4. [License](#license)
5. [Citation](#citation)

# Model Summary

> OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning [CodeGeeX2](https://huggingface.co/THUDM/codegeex2-6b) on [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) & [OASST](https://huggingface.co/datasets/bigcode/oasst-octopack) as described in the OctoPack paper.

- **Repository:** [bigcode-project/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [OctoPack: Instruction Tuning Code Large Language Models](https://arxiv.org/abs/2308.07124)
- **Languages:** 100+ programming languages
- **OctoPack🐙🎒:**
| | Name | Description |
|---|---|---|
|Data|[CommitPack](https://huggingface.co/datasets/bigcode/commitpack)|4TB of GitHub commits across 350 programming languages|
||[CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft)|Filtered version of CommitPack for high-quality commit messages that resemble instructions (see the loading sketch below)|
|Model|[OctoCoder](https://huggingface.co/bigcode/octocoder)|StarCoder (16B parameters) instruction tuned on CommitPackFT + OASST|
||[OctoGeeX](https://huggingface.co/bigcode/octogeex)|CodeGeeX2 (6B parameters) instruction tuned on CommitPackFT + OASST|
|Evaluation|[HumanEvalPack](https://huggingface.co/datasets/bigcode/humanevalpack)|Extension of OpenAI's HumanEval to cover 3 scenarios across 6 languages|
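To make the fine-tuning data concrete, here is a minimal sketch of what a CommitPackFT example looks like. It assumes the Hugging Face `datasets` library and follows the configuration and field names documented on the [CommitPackFT dataset card](https://huggingface.co/datasets/bigcode/commitpackft): the commit message plays the role of the instruction, and the before/after file contents are the input and target code.

```python
# A minimal sketch: load one CommitPackFT example and inspect how a commit
# message doubles as an instruction over a code change.
# Assumes `pip install datasets`; the "python" config and the field names
# below follow the bigcode/commitpackft dataset card.
from datasets import load_dataset

ds = load_dataset("bigcode/commitpackft", "python", split="train")
example = ds[0]

print(example["message"])       # commit message, i.e. the "instruction"
print(example["old_contents"])  # file contents before the commit
print(example["new_contents"])  # file contents after the commit
```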
# Use

## Intended use

The model follows instructions provided in the input. You should always preface your input with "Question: " and finish it with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

**Feel free to share your generations in the Community tab!**

## Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octogeex"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

# trust_remote_code is required because the CodeGeeX2/ChatGLM architecture
# is implemented by custom modeling code shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
# cap the completion length; generate() otherwise stops at the default
# max_length of 20 tokens, which is too short for a full function
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

# Training

## Model

- **Architecture:** ChatGLM2-based transformer decoder (inherited from CodeGeeX2)
- **Steps:** 250k (pretraining) & 30 (instruction tuning)
- **Tokens:** 1 trillion (pretraining) & 2M (instruction tuning)
- **Precision:** bfloat16

## Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100
  - **Training time:** 4 hours

## Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

# License

The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). Use of the model weights must comply with the [Model License](MODEL_LICENSE); please apply for commercial use by filling out the [registration form](https://open.bigmodel.cn/mla/form?mcode=CodeGeeX2-6B).

# Citation

```bibtex
@article{muennighoff2023octopack,
  title={OctoPack: Instruction Tuning Code Large Language Models},
  author={Niklas Muennighoff and Qian Liu and Armel Zebaze and Qinkai Zheng and Binyuan Hui and Terry Yue Zhuo and Swayam Singh and Xiangru Tang and Leandro von Werra and Shayne Longpre},
  journal={arXiv preprint arXiv:2308.07124},
  year={2023}
}
```