---
language:
- en
license: other
library_name: transformers
tags:
- orpo
- qwen2
- rlhf
- sft
base_model:
- Qwen/Qwen2-72B-Instruct
datasets:
- mlabonne/orpo-dpo-mix-40k
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
model-index:
- name: Qwen2-72B-Orpo-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 78.8
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/Qwen2-72B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 57.41
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/Qwen2-72B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 35.42
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/Qwen2-72B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 17.9
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/Qwen2-72B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 20.87
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/Qwen2-72B-Orpo-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 49.5
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=dfurman/Qwen2-72B-Orpo-v0.1
      name: Open LLM Leaderboard
---

# dfurman/Qwen2-72B-Orpo-v0.1

## Model

This model is a fine-tune of `Qwen/Qwen2-72B-Instruct` on 1.5k rows of `mlabonne/orpo-dpo-mix-40k`. It was trained as a generalist language model for a variety of text generation use cases, including agentic capabilities, roleplaying, reasoning, multi-turn conversations, and long-context coherence.

Thanks go out to [mlabonne](https://huggingface.co/mlabonne), [Qwen](https://huggingface.co/Qwen), et al. for the source dataset and base model.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/CdV47RW1zjr7qvD073NkZ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62afc20ca5bd7cef3e1ab3f4/PB-25NSKcbFMZuZ3vYptR.png)

The training run is logged on W&B at [this address](https://wandb.ai/dryanfurman/huggingface/runs/fw7mtub1?nw=nwuserdryanfurman).

## 💻 Usage

### Setup

The cells below use notebook syntax (`!` runs a shell command), so run them in Jupyter or Colab.

```python
!pip install -qU transformers accelerate bitsandbytes
!huggingface-cli download dfurman/Qwen2-72B-Orpo-v0.1
```

```python
from transformers import AutoTokenizer, BitsAndBytesConfig
import transformers
import torch

# Ampere or newer GPUs (compute capability >= 8) support bfloat16 and FlashAttention-2
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16

# quantize if necessary
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch_dtype,
#     bnb_4bit_use_double_quant=True,
# )

model = "dfurman/Qwen2-72B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        # "quantization_config": bnb_config,
        "device_map": "auto",  # spread the weights across the available devices
        "attn_implementation": attn_implementation,
    }
)
```

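Whether the commented-out `bnb_config` is needed depends mostly on how much GPU memory the weights occupy. As a rough back-of-the-envelope sketch (not part of the original setup), the snippet below estimates weights-only memory. It assumes about 72 billion parameters, a figure rounded from the model name rather than an exact count, and it ignores activations, the KV cache, and quantization metadata. `weight_memory_gb` is a hypothetical helper introduced here for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: parameter count rounded from the "72B" in the model name.
NUM_PARAMS = 72e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # bfloat16 / float16: 16 bits per weight
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantization: 4 bits per weight

print(f"16-bit weights: ~{bf16_gb:.0f} GB")
print(f"4-bit weights:  ~{nf4_gb:.0f} GB")
```

Under these assumptions the 16-bit weights need roughly 144 GB and the 4-bit weights roughly 36 GB, so on a single GPU the 4-bit configuration is likely the practical choice.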
### Run

```python
question = """The bakers at the Beverly Hills Bakery baked 200 loaves of bread on Monday morning. They sold 93 loaves in the morning and 39 loaves in the afternoon. A grocery store then returned 6 unsold loaves back to the bakery. How many loaves of bread did the bakery have left?
Respond as succinctly as possible. Format the response as a completion of this table:
|step|subquestion|procedure|result|
|:---|:----------|:--------|:-----:|"""

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": question},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# print("***Prompt:\n", prompt)

outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print("***Generation:")
print(outputs[0]["generated_text"][len(prompt):])
```

```
***Generation:
|1|Initial loaves|Start with total loaves|200|
|2|Sold in morning|Subtract morning sales|200 - 93 = 107|
|3|Sold in afternoon|Subtract afternoon sales|107 - 39 = 68|
|4|Returned loaves|Add returned loaves|68 + 6 = 74|
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__Qwen2-72B-Orpo-v0.1)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |43.32|
|IFEval (0-Shot)    |78.80|
|BBH (3-Shot)       |57.41|
|MATH Lvl 5 (4-Shot)|35.42|
|GPQA (0-shot)      |17.90|
|MuSR (0-shot)      |20.87|
|MMLU-PRO (5-shot)  |49.50|
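
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. The card does not state how the average is computed, so treat that as an assumption; the short check below simply recomputes it from the values in the table.

```python
# Benchmark scores copied from the results table above.
scores = {
    "IFEval (0-Shot)": 78.80,
    "BBH (3-Shot)": 57.41,
    "MATH Lvl 5 (4-Shot)": 35.42,
    "GPQA (0-shot)": 17.90,
    "MuSR (0-shot)": 20.87,
    "MMLU-PRO (5-shot)": 49.50,
}

# Assumption: the reported average is a plain, unweighted mean.
average = sum(scores.values()) / len(scores)
print(f"Avg.: {average:.2f}")  # prints "Avg.: 43.32"
```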