---
language:
- en
license: llama2
library_name: peft
tags:
- Mistral
pipeline_tag: text-generation
model-index:
- name: SpeechlessCoder
  results:
  - task:
      type: text-generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - type: pass@1
      value: 0.0
      name: pass@1
      verified: false
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.77
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/Mistral-7B-OpenOrca-lora-merged
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.61
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/Mistral-7B-OpenOrca-lora-merged
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/Mistral-7B-OpenOrca-lora-merged
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 42.7
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/Mistral-7B-OpenOrca-lora-merged
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 78.53
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/Mistral-7B-OpenOrca-lora-merged
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 38.13
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/Mistral-7B-OpenOrca-lora-merged
      name: Open LLM Leaderboard
---

# Mistral-7B-OpenOrca-lora-merged

**This is a test.**

This is a regenerated model that merges the base model Mistral-7B-v0.1 with the LoRA model [Mistral-7B-OpenOrca-lora](https://huggingface.co/uukuguy/Mistral-7B-OpenOrca-lora). The LoRA model was extracted from the parameter-efficient fine-tuned model [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca), and it still needs to be verified whether the extracted LoRA can achieve performance comparable to the original model.

The final goal is to build a toolkit that can load multiple LoRA modules simultaneously and automatically switch to the appropriate combination of LoRA modules for each user query to generate the best answer.

The source code is [here](https://github.com/uukuguy/multi_loras).
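As a rough sketch of what such a toolkit could look like, the snippet below loads two LoRA adapters onto one base model and switches between them per query, using PEFT's standard multi-adapter API (`load_adapter` / `set_adapter`). The second adapter repo name is a hypothetical placeholder, and the query-to-adapter routing is hard-coded here rather than learned:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach one LoRA under a name, then load further adapters onto the same model.
model = PeftModel.from_pretrained(
    base, "uukuguy/Mistral-7B-OpenOrca-lora", adapter_name="openorca"
)
# Hypothetical second adapter, e.g. a code-oriented LoRA.
model.load_adapter("uukuguy/speechless-code-lora", adapter_name="coder")

def answer(query: str, adapter: str) -> str:
    """Generate a reply with the chosen LoRA active."""
    model.set_adapter(adapter)
    inputs = tokenizer(query, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# A real router would pick the adapter (or combination of adapters)
# from the query itself; here the choice is fixed for illustration.
print(answer("Write a Python quicksort.", adapter="coder"))
```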
## Mistral-7B-OpenOrca

- Extract the LoRA model [Mistral-7B-OpenOrca-lora](https://huggingface.co/uukuguy/Mistral-7B-OpenOrca-lora) from [Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca);
- Merge the base model [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with the LoRA model into [Mistral-7B-OpenOrca-lora-merged](https://huggingface.co/uukuguy/Mistral-7B-OpenOrca-lora-merged) (a merge sketch is given at the end of this card);
- LLM Evaluation ...

### Local Test

| | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | GSM8K_acc (8-shot) | Open LLM Score |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| Mistral-7B-OpenOrca | **71** | 83 | 61.42 | 45 | 40 | 65.11 |
| **r=256** | 68 | **84** | **64.28** | 46.953 | **41** | **65.81** |
| r=64 | 67 | 84 | 64.26 | **47.32** | **41** | 65.65 |
| *r=16* | *65* | *83* | *62.84* | *46.95* | *38* | *64.45* |

### Open LLM Leaderboard

| | ARC_acc_norm (25-shot) | HellaSwag_acc_norm (10-shot) | MMLU_acc (5-shot) | TruthfulQA_mc2 (0-shot) | Open LLM Score |
| ------ | ------ | ------ | ------ | ------ | ------ |
| Mistral-7B-SlimOrca | 62.54 | 83.86 | **62.77** | **54.23** | **65.85** |
| Mistral-7B-OpenOrca | **64.08** | **83.99** | 62.24 | 53.05 | 65.84 |

## lm-evaluation-harness

[Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
| --- | --- | --- | --- |
| ARC | 64.08 | | |
| HellaSwag | 83.99 | | |
| MMLU | 62.24 | | |
| TruthfulQA | 53.05 | | |
| Average | 65.84 | | |

## HumanEval

| Metric | Mistral-7B-OpenOrca | Mistral-7B-OpenOrca-lora | Mistral-7B-OpenOrca-lora-merged |
| --- | --- | --- | --- |
| humaneval-python | 35.976 | | |

## Training procedure

The following `bitsandbytes` quantization config was used during training (a `BitsAndBytesConfig` sketch that mirrors these settings is given at the end of this card):

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Framework versions

- PEFT 0.5.0

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_uukuguy__Mistral-7B-OpenOrca-lora-merged).

| Metric | Value |
|---------------------------------|----:|
| Avg. | 61.52 |
| AI2 Reasoning Challenge (25-Shot) | 61.77 |
| HellaSwag (10-Shot) | 83.61 |
| MMLU (5-Shot) | 64.34 |
| TruthfulQA (0-shot) | 42.70 |
| Winogrande (5-shot) | 78.53 |
| GSM8k (5-shot) | 38.13 |
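As referenced in the steps under "Mistral-7B-OpenOrca" above, merging the extracted LoRA back into the base model can be done with the standard PEFT API. The following is a minimal sketch, not the exact script from the multi_loras repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the extracted LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "uukuguy/Mistral-7B-OpenOrca-lora")

# Fold the LoRA deltas (B @ A, scaled by alpha/r) into the base weights
# and drop the adapter wrappers, yielding a plain transformers model.
merged = model.merge_and_unload()

merged.save_pretrained("Mistral-7B-OpenOrca-lora-merged")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained(
    "Mistral-7B-OpenOrca-lora-merged"
)
```

After `merge_and_unload()`, the result has no runtime PEFT dependency, which is what allows the merged checkpoint to be evaluated like any full model.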
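Likewise, the quantization settings listed under "Training procedure" map directly onto a `transformers` `BitsAndBytesConfig`. The snippet below is a sketch of loading the base model with that config, assuming QLoRA-style 4-bit loading; the actual training setup is not shown here:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the bitsandbytes settings listed under "Training procedure":
# 4-bit NF4 quantization with double quantization and bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```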