---
language:
  - en
---

Description:

This is a LLaMA 13B model merged with the LoRA of the same name.

Objective for this project:

To create a model that maintains a logical thread regardless of whether its output is verbose or concise. Training was performed on a version of the pile of sets, reduced to 40% of its original size, to speed up training iterations. I personally use this model as an aid for storytelling and writing; while it serves this purpose adequately, I still consider this version a prototype.

Prompt format:

Stanford Alpaca

The prompt should start on a new line after "### Response:"

  • For examples with a non-empty input field:

    Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

    ### Instruction:
    {instruction}

    ### Input:
    {input}

    ### Response:

  • For examples with an empty input field:

    Below is an instruction that describes a task. Write a response that appropriately completes the request.

    ### Instruction:
    {instruction}

    ### Response:
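As a convenience, here is a minimal Python sketch (not part of the original card) showing how these two templates can be assembled programmatically; the `build_prompt` helper name is purely illustrative.

```python
# Hypothetical helper, not from the original card: assembles a prompt in the
# Stanford Alpaca format shown above.
def build_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


# Example usage: the model's reply is then generated after "### Response:".
prompt = build_prompt("Write the opening paragraph of a story about a lighthouse keeper.")
```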

Perplexity Benchmarks:

  • wikitext: 4.66796875
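For context, a wikitext perplexity figure like the one above is typically measured along the following lines. This is a generic sketch, not the author's evaluation script; the model id, dataset split, and window size are all assumptions.

```python
# Generic perplexity sketch: non-overlapping 1024-token windows over wikitext test text.
# The repository id below is a placeholder, not the real repo for this merge.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PocketDoc/llama-13b-merged"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

window = 1024  # assumed evaluation window; matches the training cutoff
nll_sum, token_count = 0.0, 0
for start in range(0, ids.size(1) - 1, window):
    chunk = ids[:, start : start + window].to(model.device)
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss  # mean NLL over the chunk
    nll_sum += loss.item() * chunk.size(1)      # approximate token-weighted NLL
    token_count += chunk.size(1)

print("perplexity:", math.exp(nll_sum / token_count))
```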

Training information:

  • 2 epochs
  • LoRA rank / alpha (R / A): 64 / 32
  • Cutoff length: 1024 tokens
  • 19 hours on an A6000
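As a rough illustration of how the hyperparameters above might map onto a peft LoraConfig; anything not listed in the card (dropout, target modules) is an assumption, not the author's actual setup.

```python
# Illustrative sketch only: a peft LoraConfig matching the rank/alpha and cutoff above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                                 # the "64" in rank / alpha: 64 / 32
    lora_alpha=32,                        # the "32"
    lora_dropout=0.05,                    # assumed; not stated in the card
    target_modules=["q_proj", "v_proj"],  # assumed; a common choice for LLaMA
    bias="none",
    task_type="CAUSAL_LM",
)

cutoff_len = 1024  # maximum tokenized length per training example
num_epochs = 2
```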

Data used in training:

All datasets were cleaned and scrubbed in various ways, then culled to varying degrees.

  • CAMEL biology, physics, chemistry, math, and AI society
  • Alpaca Evol-Instruct
  • GPTeacher Instruct
  • Alpaca GPT-4
  • Databricks Dolly
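Purely as an illustration of the culling step described above (the author's actual pipeline is not documented), one of the listed sources could be subsampled with the datasets library like this:

```python
# Hypothetical culling example: shuffle one source and keep a fraction of it.
# The 40% figure mirrors the overall reduction mentioned in the objective; the
# real per-dataset fractions are not documented.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train").shuffle(seed=42)
culled = dolly.select(range(int(0.4 * len(dolly))))
print(len(dolly), "->", len(culled))
```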

Plans for the future, a brief overview:

  • Pivot to a conversational format going forward
  • Mk2: train another 13B LoRA against the entirety of my pile of sets rather than just a portion of it
  • Train a 30B model on the Mk2 pile of sets
  • Mk3: expand the story-generation capabilities, and likely more

Model used for training and other information:

https://huggingface.co/PocketDoc/llama-13b-gptq-4bit-128g

Merge model: https://huggingface.co/huggyllama/llama-13b
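For readers who want to reproduce a merge of this kind, here is a hedged sketch using the peft library; the adapter repository id is a placeholder, not the actual LoRA name.

```python
# Illustrative sketch: fold a LoRA adapter into the huggyllama/llama-13b base model.
# "PocketDoc/example-lora-adapter" is a placeholder id, not the real adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-13b")

merged = PeftModel.from_pretrained(base, "PocketDoc/example-lora-adapter")
merged = merged.merge_and_unload()  # bake the adapter weights into the base model

merged.save_pretrained("./llama-13b-merged")
tokenizer.save_pretrained("./llama-13b-merged")
```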

Disclaimer:

This model has not been aligned, and no warranty is given for the quality or safety of its outputs.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 45.76 |
| ARC (25-shot)       | 58.79 |
| HellaSwag (10-shot) | 81.79 |
| MMLU (5-shot)       | 48.12 |
| TruthfulQA (0-shot) | 41.24 |
| Winogrande (5-shot) | 76.16 |
| GSM8K (5-shot)      |  8.49 |
| DROP (3-shot)       |  5.71 |