---
license: mit
train: false
inference: false
pipeline_tag: text-generation
model-index:
- name: aanaphi2-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 63.91
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mobiuslabsgmbh/aanaphi2-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 77.97
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mobiuslabsgmbh/aanaphi2-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 57.73
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mobiuslabsgmbh/aanaphi2-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 51.56
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mobiuslabsgmbh/aanaphi2-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 73.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mobiuslabsgmbh/aanaphi2-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 54.89
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mobiuslabsgmbh/aanaphi2-v0.1
      name: Open LLM Leaderboard
---

*aanaphi2-v0.1* is a finetuned (SFT + DPO) chat model based on <a href="https://huggingface.co/microsoft/phi-2">Microsoft's Phi-2 base model</a> (2.8B parameters).

![image/gif](https://cdn-uploads.huggingface.co/production/uploads/636b945ef575d3705149e982/pIeboaaroFY5fpomUADrS.gif)

## Performance
| Models             | phi-2            | aanaphi2-v0.1    |
|--------------------|------------------|------------------|
| ARC (25-shot)      | 61.09            | <b>63.74</b>     |
| HellaSwag (10-shot)| 75.11            | <b>78.30</b>     |
| MMLU (5-shot)      | <b>58.11</b>     | 57.70            |
| TruthfulQA-MC2     | 44.47            | <b>51.56</b>     |
| Winogrande (5-shot)| <b>74.35</b>     | 73.40            |
| GSM8K (5-shot)     | 54.81            | <b>58.61</b>     |
| Average            | 61.33            | <b>63.89</b>     |
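
To make the per-benchmark differences in the table explicit, here is a small sketch that computes the gap between the two models. The scores are copied from the table above; the `scores` dictionary and the `deltas` helper are illustrative names, not part of any released tooling.

```python
# Scores copied from the Performance table as (phi-2, aanaphi2-v0.1).
scores = {
    "ARC (25-shot)":       (61.09, 63.74),
    "HellaSwag (10-shot)": (75.11, 78.30),
    "MMLU (5-shot)":       (58.11, 57.70),
    "TruthfulQA-MC2":      (44.47, 51.56),
    "Winogrande (5-shot)": (74.35, 73.40),
    "GSM8K (5-shot)":      (54.81, 58.61),
}

def deltas(table):
    """Return aanaphi2-v0.1 minus phi-2 for each benchmark, rounded to 2 decimals."""
    return {name: round(tuned - base, 2) for name, (base, tuned) in table.items()}

# Print one signed difference per benchmark (positive means aanaphi2-v0.1 is higher).
for name, delta in deltas(scores).items():
    print(f"{name:<20} {delta:+.2f}")
```

A positive value means the fine-tuned model scores higher. By this count it leads on four of the six benchmarks, with the largest gap on TruthfulQA-MC2.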

## Installation
Make sure you have the latest version of the transformers library:
```
pip install pip --upgrade && pip install transformers --upgrade
```

## Basic Usage
``` Python
# Load the model and tokenizer
import transformers, torch

compute_dtype = torch.float16
cache_path    = ''      # cache directory for the downloaded files
device        = 'cuda'
model_id      = "mobiuslabsgmbh/aanaphi2-v0.1"
model         = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype,
                                                                  cache_dir=cache_path,
                                                                  device_map=device)
tokenizer     = transformers.AutoTokenizer.from_pretrained(model_id, cache_dir=cache_path)

# Set the prompt format
instruction_template = "### Human: "
response_template    = "### Assistant: "

def prompt_format(prompt):
    out = instruction_template + prompt + '\n' + response_template
    return out

model.eval()

@torch.no_grad()
def generate(prompt, max_length=1024):
    prompt_chat = prompt_format(prompt)
    # Move the inputs to the same device as the model instead of hard-coding 'cuda'
    inputs      = tokenizer(prompt_chat, return_tensors="pt", return_attention_mask=True).to(device)
    outputs     = model.generate(**inputs, max_length=max_length, eos_token_id=tokenizer.eos_token_id)
    # Drop the final token (the EOS token when generation ends normally) before decoding
    text        = tokenizer.batch_decode(outputs[:, :-1])[0]
    return text

# Generate
print(generate('If A+B=C and B=C, what would be the value of A?'))
```
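
The string returned by `generate` contains the formatted prompt followed by the completion, because `model.generate` returns the input tokens together with the newly generated ones. The sketch below shows one way to keep only the reply. `extract_response` is a hypothetical helper, not part of the model or of transformers. It assumes the `### Assistant:` marker from the prompt format appears in the decoded text. It also assumes that any later `### Human:` marker starts an unwanted extra turn.

```python
def extract_response(text, response_marker="### Assistant:", instruction_marker="### Human:"):
    """Return only the assistant reply from a decoded prompt + completion string."""
    # Keep everything after the first response marker (assumed to end the prompt).
    _, found, reply = text.partition(response_marker)
    if not found:
        # Marker missing: fall back to the full text rather than guessing.
        return text.strip()
    # Cut off anything from a follow-up instruction marker onward.
    reply, _, _ = reply.partition(instruction_marker)
    return reply.strip()

# Example with a hand-written string in the same format as the prompt template.
decoded = "### Human: What is 2+2?\n### Assistant: 2+2 equals 4."
print(extract_response(decoded))
```

In practice this would wrap the call as `extract_response(generate(...))`. If a user prompt could itself contain the marker text, slicing the output tokens by the prompt length before decoding would be a more robust alternative.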

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mobiuslabsgmbh__aanaphi2-v0.1)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |63.28|
|AI2 Reasoning Challenge (25-Shot)|63.91|
|HellaSwag (10-Shot)              |77.97|
|MMLU (5-Shot)                    |57.73|
|TruthfulQA (0-shot)              |51.56|
|Winogrande (5-shot)              |73.64|
|GSM8k (5-shot)                   |54.89|
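
The `Avg.` row appears to be the unweighted mean of the six benchmark scores. A quick check, with the values copied from the table above (the `leaderboard` dictionary is only an illustrative container):

```python
# Leaderboard scores copied from the table above.
leaderboard = {
    "AI2 Reasoning Challenge (25-Shot)": 63.91,
    "HellaSwag (10-Shot)": 77.97,
    "MMLU (5-Shot)": 57.73,
    "TruthfulQA (0-shot)": 51.56,
    "Winogrande (5-shot)": 73.64,
    "GSM8k (5-shot)": 54.89,
}

# Unweighted mean over the six benchmarks, rounded to two decimals like the table.
average = round(sum(leaderboard.values()) / len(leaderboard), 2)
print(average)
```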