---
datasets:
- phatvo/hotpotqa-raft-dev-100
library_name: transformers
license: llama3.1
metrics:
- f1
- exact_match
pipeline_tag: text-generation
---
# Meta-Llama3.1-8B-Instruct-RAFT
## Model Details
### Model Description
LoRA adapters for `meta-llama/Meta-Llama-3.1-8B-Instruct`, trained on 100 context samples from the HotpotQA dataset with the RAFT (Retrieval Augmented Fine-Tuning) method. The adapters help the model reason over the provided context and return more accurate answers.
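If the repository ships only the LoRA adapter weights rather than a merged checkpoint, they can also be attached to the base model with PEFT. A minimal sketch, assuming `peft` is installed and the adapters are published under the same repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "phatvo/Meta-Llama3.1-8B-Instruct-RAFT"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the LoRA adapters on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, adapter_id)

# Optionally fold the adapters into the base weights for faster inference
model = model.merge_and_unload()
```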
### Evaluation
Evaluated on the full validation set of HotpotQA.

| type       | exact_match | f1         | precision | recall |
|------------|-------------|------------|-----------|--------|
| pretrained | 0.2980      | 0.3979     | 0.4116    | 0.5263 |
| finetuned  | 0.3606      | **0.4857** | 0.4989    | 0.5318 |

Relative to the pretrained model, the fine-tuned version improves **F1 by ~22% and the metrics by ~15% on average**.
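`exact_match` and `f1` follow the standard token-level answer metrics used for HotpotQA/SQuAD. A minimal sketch of how they are typically computed (function names are illustrative, not part of this repo):

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD/HotpotQA convention)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```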
### Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
model_id = "phatvo/Meta-Llama3.1-8B-Instruct-RAFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # assumption: load in bf16 to reduce memory; adjust for your hardware
    revision="main",
    trust_remote_code=True,
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
inst = "Given the question and context below, thinking in logical reasoning way for your answer.\
Please provide only your answer in this format: CoT Answer: {reason} <ANSWER>: {answer}."
context = ""
question = ""
prompt = f"{context}\n{question}"
chat = [
{"role": "system", "content": inst},
{"role": "user", "content": prompt},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
output = pipe(
    prompt,
    temperature=0.001,
    max_new_tokens=1024,  # recommended to set above 800 so the CoT reasoning is not truncated
    return_full_text=False,
    do_sample=True,
)
print(output[0]["generated_text"])
# CoT Answer: thoughts... <ANSWER>: final_answer...
```
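Because the generated text follows the prompted `CoT Answer: ... <ANSWER>: ...` format, the final answer can be split off with a small helper (hypothetical, not part of the repo):

```python
def extract_answer(generated_text: str) -> str:
    """Return only the final answer, dropping the chain-of-thought reasoning."""
    marker = "<ANSWER>:"
    if marker in generated_text:
        return generated_text.split(marker, 1)[1].strip()
    # Fall back to the raw text if the model did not emit the marker
    return generated_text.strip()

answer = extract_answer(output[0]["generated_text"])
print(answer)
```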