---
language:
- en
license: other
tags:
- chat
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
model-index:
- name: Dracarys-72B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 78.56
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Dracarys-72B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 56.94
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Dracarys-72B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 33.61
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Dracarys-72B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 18.79
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Dracarys-72B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.81
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Dracarys-72B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 49.51
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=abacusai/Dracarys-72B-Instruct
      name: Open LLM Leaderboard
---

# Dracarys-72B-Instruct

# Introduction

We introduce the latest in the Smaug series, the Dracarys family of finetunes targeting coding performance improvements across a variety of base models. This variant is a finetune of [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct).

Compared to Qwen2-72B-Instruct, Dracarys has better LiveCodeBench scores (see evaluation results below).

### Model Description

- **Developed by:** [Abacus.AI](https://abacus.ai)
- **License:** https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
- **Finetuned from model:** [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)
## How to use

The prompt format is unchanged from Qwen2-72B-Instruct (see the evaluation results below for the prompt details used for LiveCodeBench).

### Use with transformers

See the snippet below for usage with Transformers:

```python
import transformers
import torch

model_id = "abacusai/Dracarys-72B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a data science coding assistant that generates Python code using Pandas and Numpy."},
    {"role": "user", "content": "Write code to select rows from the dataframe `df` having the maximum `temp` for each `city`"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Qwen2 uses the ChatML prompt format, so the end-of-turn token is
# <|im_end|> (not the Llama-3 <|eot_id|> token).
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|im_end|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

(A hand-written reference answer for this example prompt is included at the end of this card.)

# Evaluation Results

## LiveCodeBench

| Model                     | Code Generation | Code Execution | Test Output Prediction |
|---------------------------|-----------------|----------------|------------------------|
| **Dracarys-72B-Instruct** | 33.86           | 54.30          | 53.26                  |
| Qwen2-72B-Instruct        | 30.10           | TBD            | TBD                    |

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_abacusai__Dracarys-72B-Instruct).

| Metric              |Value|
|---------------------|----:|
| Avg.                |42.37|
| IFEval (0-Shot)     |78.56|
| BBH (3-Shot)        |56.94|
| MATH Lvl 5 (4-Shot) |33.61|
| GPQA (0-shot)       |18.79|
| MuSR (0-shot)       |16.81|
| MMLU-PRO (5-shot)   |49.51|
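As a quick sanity check on the table above, the reported average is the unweighted arithmetic mean of the six benchmark scores. A minimal sketch (the values below are transcribed from the table, not fetched from the leaderboard):

```python
# Open LLM Leaderboard scores transcribed from the table above.
scores = {
    "IFEval (0-Shot)": 78.56,
    "BBH (3-Shot)": 56.94,
    "MATH Lvl 5 (4-Shot)": 33.61,
    "GPQA (0-shot)": 18.79,
    "MuSR (0-shot)": 16.81,
    "MMLU-PRO (5-shot)": 49.51,
}

# "Avg." is the unweighted mean of the six benchmarks.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # Avg. = 42.37, matching the table
```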
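For reference, here is one hand-written answer to the example prompt from the usage snippet above (selecting the rows of `df` with the maximum `temp` for each `city`). The toy dataframe is illustrative, and this is not a captured model output:

```python
import pandas as pd

# Toy data standing in for the `df` mentioned in the prompt.
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston"],
    "temp": [95, 101, 88, 90],
})

# idxmax() returns the index of the hottest row within each city group;
# df.loc then selects exactly those rows.
hottest = df.loc[df.groupby("city")["temp"].idxmax()]
print(hottest)
```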