---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- roleplay
- text-generation-inference
model-index:
- name: EstopianMaid-13B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 60.49
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.49
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 56.18
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 52.35
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.53
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 9.17
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/653a2392341143f7774424d8/fyK_RtEjb9sLF_Mq0nZm2.png)

Based on feedback, EstopianMaid:

- Sticks closely to the character card.
- Maintains coherency in settings with multiple characters.
- Can create new scenarios.

- Feature from Thespis:

![image/webp](https://cdn-uploads.huggingface.co/production/uploads/653a2392341143f7774424d8/1Z4P7XshVOW8fLg9pey4H.webp)

- Prompt Template: Alpaca

### Instruction:
{prompt}

### Response:
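
As a minimal sketch of how the Alpaca template above might be filled in from code: the helper name `build_alpaca_prompt` is hypothetical, and the trailing newline after `### Response:` is an assumption, not something this card specifies.

```python
# Alpaca-style template as shown on this card.
# Assumption: a single newline follows "### Response:" so the model
# starts generating on a fresh line.
ALPACA_TEMPLATE = "### Instruction:\n{prompt}\n\n### Response:\n"


def build_alpaca_prompt(prompt: str) -> str:
    """Insert the user's instruction into the Alpaca template."""
    # Strip surrounding whitespace so the blank line before
    # "### Response:" stays exactly one line.
    return ALPACA_TEMPLATE.format(prompt=prompt.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Continue the scene as the innkeeper."))
```

Frontends such as SillyTavern apply this template for you when the Alpaca instruct preset is selected, so a helper like this is only useful when calling the model directly.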

Recommended settings:

- SillyTavern Default Preset.
- Temperature: 0.7
- Min-P: 0.3
- Amount to Gen: 256
- Top P: 1
- Repetition penalty: 1.10
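
Outside SillyTavern, the settings above could be expressed as sampling keyword arguments. The sketch below assumes the usual `transformers` naming (`max_new_tokens` for "Amount to Gen", `min_p` for Min-P, which is only accepted by recent `transformers` releases); this mapping is an assumption and is not taken from the card, and the helper `generation_kwargs` is hypothetical.

```python
# Recommended values from this card, keyed by the assumed
# transformers-style parameter names.
RECOMMENDED_SETTINGS = {
    "temperature": 0.7,
    "min_p": 0.3,               # assumes a version that supports min_p
    "max_new_tokens": 256,      # "Amount to Gen" in SillyTavern
    "top_p": 1.0,
    "repetition_penalty": 1.10,
}


def generation_kwargs(**overrides):
    """Return the recommended settings with optional per-call overrides."""
    # do_sample=True is needed for temperature/top_p/min_p to take effect.
    kwargs = {"do_sample": True, **RECOMMENDED_SETTINGS}
    kwargs.update(overrides)
    return kwargs


if __name__ == "__main__":
    # These would typically be passed as model.generate(**inputs, **kwargs).
    print(generation_kwargs(max_new_tokens=128))
```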

Models used:

- BlueNipples/TimeCrystal-l2-13B
- cgato/Thespis-13b-DPO-v0.7
- KoboldAI/LLaMA2-13B-Estopia
- NeverSleep/Noromaid-13B-0.4-DPO
- Doctor-Shotgun/cat-v1.0-13b

Feedback is always appreciated!

Thank you to KoboldAI for the use of their MergeBox, and to Caitlyn G. for their support and feedback.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_KatyTheCutie__EstopianMaid-13B)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |56.20|
|AI2 Reasoning Challenge (25-Shot)|60.49|
|HellaSwag (10-Shot)              |83.49|
|MMLU (5-Shot)                    |56.18|
|TruthfulQA (0-shot)              |52.35|
|Winogrande (5-shot)              |75.53|
|GSM8k (5-shot)                   | 9.17|
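
The "Avg." row appears to be the unweighted mean of the six benchmark scores. A short check, using only the values from the table (the variable names are illustrative):

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 60.49,
    "HellaSwag (10-Shot)": 83.49,
    "MMLU (5-Shot)": 56.18,
    "TruthfulQA (0-shot)": 52.35,
    "Winogrande (5-shot)": 75.53,
    "GSM8k (5-shot)": 9.17,
}

# Unweighted mean across all six benchmarks.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{average:.2f}")  # prints 56.20
```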