Spaces:

mozilla-ai
/

surf-spot-finder

Running

surf-spot-finder / app.py

github-actions[bot]

Sync with https://github.com/mozilla-ai/surf-spot-finder

35b631a about 2 months ago

5.61 kB

	from components.sidebar import ssf_sidebar
	from constants import DEFAULT_TOOLS
	import streamlit as st
	import asyncio
	import nest_asyncio
	from services.agent import (
	configure_agent,
	display_evaluation_results,
	display_output,
	evaluate_agent,
	run_agent,
	)

	nest_asyncio.apply()

	# Set page config
	st.set_page_config(page_title="Surf Spot Finder", page_icon="🏄", layout="wide")

	# Allow a user to resize the sidebar to take up most of the screen to make editing eval cases easier
	st.markdown(
	"""
	<style>
	/* When sidebar is expanded, adjust main content */
	section[data-testid="stSidebar"][aria-expanded="true"] {
	max-width: 99% !important;
	}
	</style>
	""",
	unsafe_allow_html=True,
	)

	with st.sidebar:
	user_inputs = ssf_sidebar()
	is_valid = user_inputs is not None
	run_button = st.button("Run Agent 🤖", disabled=not is_valid, type="primary")


	# Main content
	async def main():
	# Handle agent execution button click
	if run_button:
	agent, agent_config = await configure_agent(user_inputs)
	agent_trace = await run_agent(agent, agent_config)

	await display_output(agent_trace)

	evaluation_result = await evaluate_agent(agent_config, agent_trace)

	await display_evaluation_results(evaluation_result)
	else:
	st.title("🏄 Surf Spot Finder")
	st.markdown(
	"Find the best surfing spots based on your location and preferences! [Github Repo](https://github.com/mozilla-ai/surf-spot-finder)"
	)
	st.info(
	"👈 Configure your search parameters in the sidebar and click Run to start!"
	)

	# Display tools in a more organized way
	st.markdown("### 🛠️ Available Tools")

	st.markdown("""
	The AI Agent built for this project has a few tools available for use in order to find the perfect surf spot.
	The agent is given the freedom to use (or not use) these tools in order to accomplish the task.
	""")

	weather_tools = [
	tool
	for tool in DEFAULT_TOOLS
	if "forecast" in tool.__name__ or "weather" in tool.__name__
	]
	for tool in weather_tools:
	with st.expander(f"🌤️ {tool.__name__}"):
	st.markdown(tool.__doc__ or "No description available")
	location_tools = [
	tool
	for tool in DEFAULT_TOOLS
	if "lat" in tool.__name__
	or "lon" in tool.__name__
	or "area" in tool.__name__
	]
	for tool in location_tools:
	with st.expander(f"📍 {tool.__name__}"):
	st.markdown(tool.__doc__ or "No description available")

	web_tools = [
	tool
	for tool in DEFAULT_TOOLS
	if "web" in tool.__name__ or "search" in tool.__name__
	]
	for tool in web_tools:
	with st.expander(f"🌐 {tool.__name__}"):
	st.markdown(tool.__doc__ or "No description available")

	# add a check that all tools were listed
	if len(weather_tools) + len(location_tools) + len(web_tools) != len(
	DEFAULT_TOOLS
	):
	st.warning(
	"Some tools are not listed. Please check the code for more details."
	)

	# Add Custom Evaluation explanation section
	st.markdown("### 📊 Custom Evaluation")
	st.markdown("""
	The Surf Spot Finder includes a powerful evaluation system that allows you to customize how the agent's performance is assessed.
	You can find these settings in the sidebar under the "Custom Evaluation" expander.
	""")

	with st.expander("Learn more about Custom Evaluation"):
	st.markdown("""
	#### What is Custom Evaluation?
	The Custom Evaluation feature uses an LLM-as-a-Judge approach to evaluate how well the agent performs its task.
	An LLM will be given the complete agent trace (not just the final answer), and will assess the agent's performance based on the criteria you set.
	You can customize:

	- Evaluation Model: Choose which LLM should act as the judge
	- Evaluation Criteria: Define specific checkpoints that the agent should meet
	- Scoring System: Assign points to each criterion

	#### How to Use Custom Evaluation

	1. Select an Evaluation Model: Choose which LLM you want to use as the judge
	2. Edit Checkpoints: Use the data editor to:
	- Add new evaluation criteria
	- Modify existing criteria
	- Adjust point values
	- Remove criteria you don't want to evaluate

	#### Example Criteria
	You can evaluate things like:
	- Tool usage and success
	- Order of operations
	- Quality of final recommendations
	- Response completeness
	- Number of steps taken

	#### Tips for Creating Good Evaluation Criteria
	- Be specific about what you want to evaluate
	- Use clear, unambiguous language
	- Consider both process (how the agent works) and outcome (what it produces)
	- Assign appropriate point values based on importance

	The evaluation results will be displayed after each agent run, showing how well the agent met your custom criteria.
	""")


	if __name__ == "__main__":
	loop = asyncio.new_event_loop()
	loop.run_until_complete(main())