---
license: mit
datasets:
- Open-Orca/OpenOrca
- conceptofmind/cot_submix_original
- conceptofmind/t0_submix_original
- conceptofmind/niv2_submix_original
- conceptofmind/flan2021_submix_original
- ehartford/dolphin
language:
- en
tags:
- merge
- slerp
inference: false
metrics:
- accuracy
- bleu
---
<h1 style="text-align: center">Dorflan</h1>
<h2 style="text-align: center">An experimental model</h2>
<hr>


| Model | Average (4 benchmarks) | ARC | HellaSwag | MMLU | TruthfulQA |
|:------------:|:------------:|:-------:|:---------:|:-------:|:----------:|
| formulae/Dorflan | 58.19 | 54.44 | 75.78 | 51.36 | 51.17 |



## Model Details
Dorflan is an experimental merged model created from the following three parent models:

- stabilityai/StableBeluga-7B
- ehartford/dolphin-llama2-7b
- AIDC-ai-business/Marcoroni-7B

Dorflan was created by merging the weights of these three models, which share the same architecture, using a custom merging technique. No further fine-tuning was performed after the merge.
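
The card does not describe the custom merging technique, but the front matter is tagged `merge` and `slerp`. Assuming spherical linear interpolation (SLERP) is part of it, the sketch below shows SLERP between two weight tensors and one hypothetical way to fold three parents together pairwise. The function names (`slerp`, `merge_three`, `merge_state_dicts`), the merge order, and the `t` values are illustrative assumptions, not the author's recipe.

```python
import numpy as np


def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two weight tensors of the same shape.

    t=0 returns v0 and t=1 returns v1. The angle is measured between the
    flattened, normalized tensors; the resulting weights are then applied
    to the original (unnormalized) tensors.
    """
    if v0.shape != v1.shape:
        raise ValueError("SLERP requires tensors of identical shape")
    a = v0.ravel().astype(np.float64)
    b = v1.ravel().astype(np.float64)
    # Cosine of the angle between the two flattened tensors.
    dot = np.dot(a / (np.linalg.norm(a) + eps), b / (np.linalg.norm(b) + eps))
    dot = np.clip(dot, -1.0, 1.0)
    if abs(dot) > 1.0 - 1e-6:
        # Nearly (anti)parallel: sin(theta) is ~0, so fall back to linear interpolation.
        out = (1.0 - t) * a + t * b
    else:
        theta = np.arccos(dot)
        sin_theta = np.sin(theta)
        s0 = np.sin((1.0 - t) * theta) / sin_theta
        s1 = np.sin(t * theta) / sin_theta
        out = s0 * a + s1 * b
    return out.reshape(v0.shape).astype(v0.dtype)


def merge_three(w_a: np.ndarray, w_b: np.ndarray, w_c: np.ndarray) -> np.ndarray:
    """Hypothetical three-way fold: blend A and B evenly, then move 1/3 toward C."""
    ab = slerp(0.5, w_a, w_b)
    return slerp(1.0 / 3.0, ab, w_c)


def merge_state_dicts(sd_a: dict, sd_b: dict, sd_c: dict) -> dict:
    """Merge three parameter dicts that share keys and shapes, tensor by tensor."""
    return {name: merge_three(sd_a[name], sd_b[name], sd_c[name]) for name in sd_a}


# Toy usage with random stand-ins for three parents' parameters.
rng = np.random.default_rng(0)
parents = [{"layer.weight": rng.standard_normal((4, 4)).astype(np.float32)} for _ in range(3)]
merged = merge_state_dicts(*parents)
print(merged["layer.weight"].shape)  # (4, 4)
```

Real checkpoints would be loaded as parameter dicts with matching keys and shapes; that loading step is omitted here. SLERP is often preferred over plain averaging because it follows the arc between two weight vectors instead of cutting across it, which avoids shrinking their norm.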

Whether the merge works can only be judged from its evaluation scores, which are reported in the Open LLM Leaderboard section of this card.

## Intended Use
As an experimental model, Dorflan is intended for testing and research purposes only. It should not be used in production systems or to generate content for public use.

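## Training Data
Dorflan was not trained on any new data; it inherits the fine-tuning data of its three parent models:

- StableBeluga-7B: CoT, NIv2, T0, and FLAN 2021 submixes (`conceptofmind/cot_submix_original`, `conceptofmind/niv2_submix_original`, `conceptofmind/t0_submix_original`, `conceptofmind/flan2021_submix_original`)
- dolphin-llama2-7b: Dolphin (`ehartford/dolphin`)
- Marcoroni-7B: OpenOrca (`Open-Orca/OpenOrca`)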

## Limitations
As a merged model that has not been extensively tested, Dorflan has largely unknown capabilities and limitations. Potential issues include:

- Instability caused by merging the weights of different models
- Biases and other issues compounded from all three parent models
- Decreased performance on some tasks compared to the parent models

The leaderboard results already show one such weakness: a GSM8K (5-shot) score of 0.38, i.e. almost no multi-step math ability. Extensive testing is required to characterize Dorflan's capabilities and limitations.

## Ethical Considerations
- Dorflan may exhibit harmful biases inherited from its parents' training data
- Outputs may be unreliable or easy to manipulate because of instability introduced by the merge
- Its experimental nature increases the potential for misuse

Use this model ethically and do not deploy it in sensitive applications.

## Contact Information
Please report issues or concerns with this model to the creator for further investigation.

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_formulae__Dorflan).

| Metric | Value |
|-----------------------|---------------------------|
| Avg. (7 benchmarks) | 47.44 |
| ARC (25-shot) | 54.44 |
| HellaSwag (10-shot) | 75.78 |
| MMLU (5-shot) | 51.36 |
| TruthfulQA (0-shot) | 51.17 |
| Winogrande (5-shot) | 72.61 |
| GSM8K (5-shot) | 0.38 |
| DROP (3-shot) | 26.37 |
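
The card reports two averages: 58.19 in the summary table at the top and 47.44 here. They are consistent. The first is the mean of the four benchmarks ARC, HellaSwag, MMLU, and TruthfulQA; the second is the mean of all seven, and it is pulled down mainly by GSM8K (0.38) and DROP (26.37). A quick check, using only the scores listed in this table (the variable names are illustrative):

```python
# Scores copied from the Open LLM Leaderboard table above.
scores = {
    "ARC (25-shot)": 54.44,
    "HellaSwag (10-shot)": 75.78,
    "MMLU (5-shot)": 51.36,
    "TruthfulQA (0-shot)": 51.17,
    "Winogrande (5-shot)": 72.61,
    "GSM8K (5-shot)": 0.38,
    "DROP (3-shot)": 26.37,
}

# The four benchmarks averaged in the summary table at the top of the card.
FOUR = ["ARC (25-shot)", "HellaSwag (10-shot)", "MMLU (5-shot)", "TruthfulQA (0-shot)"]


def mean(values) -> float:
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


avg4 = mean(scores[k] for k in FOUR)
avg7 = mean(scores.values())
print(f"{avg4:.2f} {avg7:.2f}")  # 58.19 47.44
```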