--- license: llama2 --- # NewHope: Harnessing 99% of GPT-4's Programming Capabilities We introduce NewHope, a fine-tuned chat model based on llama-2-13b, aiming to provide a strong coding capability. NewHope handle different languages including Python, C++, Java, JavaScript, Go, and more. Preliminary evaluation on HumanEval shows that **NewHope possesses 99% of GPT-4's programming capabilities**. **Contact**: SLAM (SUFE Large AI Model) is a research group at Shanghai University of Finance and Economics. cui.wanyun@sufe.edu.cn **TODO**: We will release more evaluatation results and training details later. # Evaluation Results We evaluated NewHope on [HumanEval](https://github.com/openai/human-eval) using the official evaluation script by OpenAI. We compared the Pass@1 metric of NewHope with other models. The results of other models are from PapersWithCode. | Model | Pass@1 | | ----- | ------ | | **GPT-4** | **67.0** | | **NewHope** | **66.5** | | PanGu-Coder2 15B | 61.6 | | WizardCoder 15B | 57.3 | | phi-1 1.3B | 50.6 | | GPT-3.5 | 48.1 | | phi-1-small | 45.0 | | PaLM-Coder | 36.0 | | CodeGeeX2-6B | 35.9 | # Model Weights We have open-sourced the model weights [NewHope](https://huggingface.co/SLAM-group/NewHope). We are uploading the model weights. The weights will be available in a few hours. # Usage To load the NewHope model using Transformers, use the following code: ``` import torch from transformers import LlamaTokenizer, LlamaForCausalLM base_model = "SLAM-group/NewHope" tokenizer = LlamaTokenizer.from_pretrained(base_model) model = LlamaForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto") # model.config.use_cache is default to `False`. For inference: `model.config.use_cache = True` ``` **Note:** At least Huggingface Transformers **4.31.0** is required to load this model! You can ask NewHope to generate code with instructions. We provide a simple example of how NewHope model generates code with the specific prompt: ``` # Suppose required tokenizer and model have already been loaded instruction = "Write a Python function to tell me what the date is today." prompt = f" ### Instruction:\n{instruction}\n\n### Response:\n" inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to("cuda") output = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=2048)[0] decoded_output = tokenizer.decode(output, skip_special_tokens=True).split("### Response:\n")[-1].strip() print(decoded_output) ``` You can also interact with NewHope in a dialog manner with the following prompt: ``` ### Instruction:\nQ1\n\n### Response:\nA1 ### Instruction:\nQ2\n\n### Response:\nA2 ``` # Evaluation ### Local setup 1. Install HumanEval for evaluation. [Details](https://github.com/openai/human-eval) 2. Install dependencies ```bash pip install -r requirements.txt ``` --- For HumanEval, we use the following prompt: ``` example_input = 'def is_odd(number: int) -> bool:\n """ Check whether the given number is odd\n >>> is_odd(3)\n True\n >>> is_odd(6)\n False\n """\n' example_output = 'def is_odd(number: int) -> bool:\n """ Check whether the given number is odd\n >>> is_odd(3)\n True\n >>> is_odd(6)\n False\n """\n return number % 2 == 1' task_in_humaneval = "REPLACE `task_in_humaneval` WITH THE SPECIFIC TASK IN HUMANEVAL DATA" prompt = f" ### Instruction:\nComplete the given function below:\n\n{example_input}\n\n### Response:\n{example_output} ### Instruction:\nComplete the given function below:\n\n{task_in_human_eval}\n\n### Response:\n" ``` To reproduce the results on HumanEval, use the following script: ``` python complete.py --base_model SLAM-group/NewHope --output_dir output --n_gpu 8 ``` The above script will generate `samples.jsonl` in `output_dir`, which can be directly evaluated by HumanEval. [Evaluation procedure](https://github.com/openai/human-eval). We conducted the experiment with `fp16` on 8xA800, 80GB GPUs, reaching `66.5%` on Pass@1 (v.s. GPT4 `67.0%`). # Citation ``` @misc{2023newhope, title={NewHope: Harnessing 99% of GPT-4's Programming Capabilities}, author={Wanyun Cui and Qianle Wang}, howpublished = https://github.com/SLAM-group/newhope, year={2023} } ```