---
datasets:
- phatvo/hotpotqa-raft-dev-100
library_name: transformers
license: llama3.1
metrics:
- f1
- exact_match
pipeline_tag: text-generation
---

# Model Card for phatvo/Meta-Llama3.1-8B-Instruct-RAFT

## Model Details

### Model Description

LoRA adapters for `meta-llama/Meta-Llama-3.1-8B-Instruct`, trained on 100 context samples from the HotpotQA dataset using the RAFT (Retrieval Augmented Fine-Tuning) method. The adapters help the model reason over the provided context and return more accurate answers.

### Evaluation

Evaluated on the full validation set of HotpotQA.

| type       | exact_match | f1         | precision | recall |
|------------|-------------|------------|-----------|--------|
| pretrained | 0.2980      | 0.3979     | 0.4116    | 0.5263 |
| finetuned  | 0.3606      | **0.4857** | 0.4989    | 0.5318 |

The fine-tuned version improves F1 by **22%** and the metrics by **15% on average** relative to the pretrained model.

### Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "phatvo/Meta-Llama3.1-8B-Instruct-RAFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory use low on supported GPUs
    device_map="auto",
    revision="main",
    trust_remote_code=True,
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

inst = (
    "Given the question and context below, thinking in logical reasoning way for your answer. "
    "Please provide only your answer in this format: CoT Answer: {reason} : {answer}."
)

context = ""
question = ""
prompt = f"{context}\n{question}"

chat = [
    {"role": "system", "content": inst},
    {"role": "user", "content": prompt},
]
# append the assistant header so the model starts its reply
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

output = pipe(
    prompt,
    temperature=0.001,
    max_new_tokens=1024,  # recommended to set this above 800
    return_full_text=False,
    do_sample=True,
)
print(output[0]["generated_text"])
# CoT Answer: thoughts... : final_answer...
```
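
The model is expected to answer in the `CoT Answer: {reason} : {answer}` format shown above. Below is a minimal, illustrative sketch of one way to split that string into the reasoning and the final answer; the parsing logic (splitting on the last `:`) is an assumption about the output shape, not part of this model card.

```python
def parse_cot_answer(generated_text: str) -> tuple[str, str]:
    """Split a 'CoT Answer: {reason} : {answer}' string into (reason, answer).

    Assumes the final answer follows the last ':' separator, which may not
    hold if the reasoning itself ends with a colon.
    """
    text = generated_text.strip()
    prefix = "CoT Answer:"
    if text.startswith(prefix):
        text = text[len(prefix):].strip()
    # take the last ':' as the reason/answer separator
    reason, sep, answer = text.rpartition(":")
    if not sep:  # no separator found, treat the whole text as the answer
        return "", text
    return reason.strip(), answer.strip()


# example with a hypothetical generation; in practice pass output[0]["generated_text"]
reason, answer = parse_cot_answer("CoT Answer: Paris is the capital of France. : Paris")
print("Answer:", answer)
```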
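
For context on the evaluation table above: exact match, F1, precision, and recall for HotpotQA-style QA are typically computed token-wise on normalized answer strings. The sketch below shows the standard SQuAD-style definitions as an illustration of how such scores are usually derived; it is not the exact evaluation script used for this model.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> tuple[float, float, float]:
    """Token-level (f1, precision, recall) between prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0, 0.0, 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall
```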