	---
	license: apache-2.0
	model-index:
	- name: llamaRAGdrama
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 72.01
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/llamaRAGdrama
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 88.83
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/llamaRAGdrama
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 64.5
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/llamaRAGdrama
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 70.24
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/llamaRAGdrama
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 86.66
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/llamaRAGdrama
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 65.66
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kevin009/llamaRAGdrama
	      name: Open LLM Leaderboard
	---
	llamaRAGdrama is designed to remain factual and reliable even in dramatic situations.

	---

	### Model Card for kevin009/llamaRAGdrama

	#### Model Details
	- Model Name: kevin009/llamaRAGdrama
	- Model Type: Text-generation model fine-tuned for Q&A and retrieval-augmented generation (RAG).
	- Fine-tuning Objective: Synthesize text content in Q&A and RAG scenarios.

	#### Intended Use
	- Applications: RAG, Q&A

	#### Training Data
	- Sources: A diverse dataset of dramatic texts, enriched with factual databases and reliable sources, so that the model learns to generate content that stays true to real-world facts.
	- Preprocessing: Non-content text was removed, and the data was annotated to distinguish purely creative elements from those that require factual accuracy, to keep training balanced between the two.

	#### How to Use
	```python
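	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Load the tokenizer and model weights from the Hugging Face Hub.
	tokenizer = AutoTokenizer.from_pretrained("kevin009/llamaRAGdrama")
	model = AutoModelForCausalLM.from_pretrained("kevin009/llamaRAGdrama")

	input_text = "Enter your prompt here"
	# Tokenize the prompt; this returns input_ids and an attention_mask.
	inputs = tokenizer(input_text, return_tensors="pt")

	# temperature only takes effect when sampling is enabled (do_sample=True).
	output_tokens = model.generate(
	    **inputs,
	    max_length=100,  # total length in tokens, prompt included
	    num_return_sequences=1,
	    do_sample=True,
	    temperature=0.9,
	)
	generated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

	print(generated_text)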
	```
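	Replace `"Enter your prompt here"` with your own prompt. Adjust `temperature` to control how varied the output is: lower values give more deterministic text, higher values more creative text. Note that `generate` only applies `temperature` when sampling is enabled with `do_sample=True`.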
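
Since the card lists RAG and Q&A as the intended applications, a prompt will typically combine retrieved passages with a question. The card does not specify a prompt template, so the layout below and the helper name `build_rag_prompt` are assumptions for illustration, not a documented format for this model.

```python
def build_rag_prompt(question, passages):
    """Join retrieved passages and a question into one prompt string.

    Hypothetical helper: the template is an assumption, not a format
    documented for kevin009/llamaRAGdrama.
    """
    # Number each passage so the answer can refer back to its source.
    context = "\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Example passages standing in for the output of a retriever.
passages = [
    "The Eiffel Tower was completed in 1889.",
    "It was built for the 1889 World's Fair in Paris.",
]
input_text = build_rag_prompt("When was the Eiffel Tower completed?", passages)
print(input_text)
```

The resulting `input_text` can be passed to the tokenizer in place of the placeholder prompt. Whether this particular template suits the model is something to check empirically.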

	#### Limitations and Biases
	- Content Limitation: While designed to be truthful, the model's output may not be safe.
	- Biases: The model may still exhibit biases and produce inaccurate output.

	#### Licensing and Attribution
	- License: Apache-2.0

	#### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_kevin009__llamaRAGdrama).

	| Metric                            | Value |
	|-----------------------------------|------:|
	| Avg.                              | 74.65 |
	| AI2 Reasoning Challenge (25-Shot) | 72.01 |
	| HellaSwag (10-Shot)               | 88.83 |
	| MMLU (5-Shot)                     | 64.50 |
	| TruthfulQA (0-shot)               | 70.24 |
	| Winogrande (5-shot)               | 86.66 |
	| GSM8k (5-shot)                    | 65.66 |
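
The `Avg.` row appears to be the unweighted mean of the six benchmark scores. That reading is an assumption, since the card does not state how the average is computed, but a quick check reproduces the reported value:

```python
# Benchmark scores copied from the results table.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.01,
    "HellaSwag (10-Shot)": 88.83,
    "MMLU (5-Shot)": 64.50,
    "TruthfulQA (0-shot)": 70.24,
    "Winogrande (5-shot)": 86.66,
    "GSM8k (5-shot)": 65.66,
}

# Unweighted mean over the six benchmarks, rounded to two decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # prints 74.65
```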