React-FT-8B

A LoRA fine-tune of meta-llama/Llama-3.1-8B-Instruct that performs ReAct-style agentic reasoning over task-oriented dialogue, distilled from a larger ICL-prompted teacher model (meta-llama/Llama-3.3-70B-Instruct).

Model Description

This model implements an adaptation of the ReAct framework tailored to the MultiWOZ dataset. It is designed to actively retrieve information about relevant instances through a REST API, while preventing access to unrelated information. The system receives the instruction prompt, the dialogue history and the user’s query as inputs and generates the final response through an iterative sequence of thoughts, actions, and observations. The actions requires the use of these tools:

• Look[]: returns the metadata of all the instances of the selected travel scene.

• Search[]: retrieves the instances in the scene that satisfy a specific query.

• Finish[answer]: an action used by the model to indicate the final response that the system returns

This specific checkpoint (React FT 8B) is the result of an LLM-distillation process. A series of reasoning trajectories were generated using larger model (React ICL 70B) via In-Context Learning (ICL) inference. These trajectories were filtered using an LLM-based judge (Prometheus) to construct a high-quality training set. This curated dataset was used to fine-tune Llama-3.1-8B-Instruct via LoRA (PEFT), producing a smaller, more efficient model that outperforms the reasoning quality of its larger teacher.

Downloads last month: 61

Model tree for nowhereman14/llama3-8b-react-distilled-70b

Base model

meta-llama/Llama-3.1-8B

Finetuned

meta-llama/Llama-3.1-8B-Instruct

Adapter

(2496)

this model

nowhereman14
/

llama3-8b-react-distilled-70b

React-FT-8B

Model Description

Model tree for nowhereman14/llama3-8b-react-distilled-70b

Dataset used to train nowhereman14/llama3-8b-react-distilled-70b