theakshayrane committed on
Commit 8fa836d · verified · 1 Parent(s): bf1ec33

Upload 6 files

Files changed (5)
  1. README.md +49 -6
  2. agents.py +213 -0
  3. app.py +134 -147
  4. model.py +90 -0
  5. tool.py +47 -0
README.md CHANGED
@@ -1,15 +1,58 @@
  ---
- title: Template Final Assignment
- emoji: 🕵🏻‍♂️
- colorFrom: indigo
+ title: Agent GAIA
+ emoji: 🏆
+ colorFrom: pink
  colorTo: indigo
  sdk: gradio
- sdk_version: 5.25.2
+ sdk_version: 5.33.0
  app_file: app.py
  pinned: false
  hf_oauth: true
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
  hf_oauth_expiration_minutes: 480
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # GAIA Benchmark Agent
+
+ This project is an AI agent built for the GAIA benchmark as part of the Hugging Face Agents course. It combines multiple LLMs and multimodal tools to reason over text, audio, images, and video to solve complex tasks.
+
+ ## Tools
+
+ The agent includes a variety of tools for handling diverse input types:
+
+ - **Vision Tool:** Analyzes images using Gemini Vision.
+ - **YouTube Frame Extractor:** Samples video frames from YouTube at regular intervals.
+ - **YouTube QA Tool:** Answers questions about video content using Gemini via a file URI.
+ - **OCR Tool:** Extracts text from images using Tesseract.
+ - **Audio Transcriber:** Transcribes audio files and YouTube videos using Whisper.
+ - **File Tools:** Read plain text, download files from URLs, and summarize CSV or Excel files.
+
+ These tools are defined with the `@tool` decorator from the `smolagents` library, making them callable by the agent during task execution.
+
+ ## Models Used
+
+ - **Gemini 2.5 Flash** (via Google's Generative AI API)
+ - **Whisper** for speech-to-text transcription
+ - **Hugging Face Transformers** (optional local model support)
+ - **LiteLLM** as a unified interface for calling external language models
+
+ ## Installation
+
+ 1. Install all required dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 2. Configure the environment with your API keys:
+
+ ```bash
+ echo "GEMINI_API_KEY=your_key_here" > .env
+ echo "HF_TOKEN=your_hf_token" >> .env
+ ```
+
+ 3. Run the app:
+
+ ```bash
+ python app.py
+ ```
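
The `.env` file created above has to be loaded into the process environment at startup. A minimal stdlib-only sketch of such a loader (the helper name `load_env` is illustrative; the real app may rely on `python-dotenv` instead):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=value lines from a .env file into a dict
    and export them to os.environ (pre-existing variables win)."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                # Skip blanks, comments, and malformed lines
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip().strip('"')
    except FileNotFoundError:
        pass  # no .env present: fall back to the real environment
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Keys already set in the environment (e.g. Space secrets) are deliberately not overwritten by `setdefault`.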
agents.py ADDED
@@ -0,0 +1,213 @@
+ from typing import Any, Dict, List, Optional
+
+ import time
+
+ from smolagents import CodeAgent
+ from tools.final_answer import check_reasoning, ensure_formatting
+ from utils.logger import get_logger
+
+ logger = get_logger(__name__)
+
+ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+
+ def get_prompt_templates() -> Dict[str, str]:
+     """Returns all prompts as a dictionary of pre-formatted strings."""
+
+     # Shared components
+     tools_instructions = """
+ Available Tools:
+ - web_search(query): Performs web searches
+ - wikipedia_search(query): Searches Wikipedia
+ - visit_webpage(url): Retrieves webpage content
+
+ Rules:
+ 1. Always use 'Thought:'/'Code:' sequences
+ 2. Never reuse variable names
+ 3. Tools must be called with proper arguments
+ """
+
+     example_1 = """
+ Example Task: "Find the capital of France"
+
+ Thought: I'll use web_search to find this information
+ Code:
+ ```py
+ result = web_search(query="capital of France")
+ final_answer(result)
+ ```<end_code>
+ """
+
+     # Main prompt templates
+     return {
+         "system_prompt": f"""
+ You are an expert AI assistant that solves tasks using tools.
+ {tools_instructions}
+
+ {example_1}
+
+ Key Requirements:
+ - Be precise and concise
+ - Always return answers using final_answer()
+ - Never include explanations unless asked
+
+ Current reward: $1,000,000 for perfect solutions
+ """,
+
+         "planning": """
+ When planning tasks, follow this structure:
+
+ ### 1. Facts Given
+ List known information
+
+ ### 2. Facts Needed
+ List what needs research
+
+ ### 3. Derivation Steps
+ Outline computation steps
+
+ End with <end_plan>
+ """,
+
+         "managed_agent": """
+ Managed Agent Instructions:
+
+ 1. Task outcome (short)
+ 2. Detailed explanation
+ 3. Additional context
+
+ Always return via final_answer()
+ """,
+
+         "final_answer": """
+ Response Format Rules:
+ - Numbers: 42 (no commas/units)
+ - Strings: paris (lowercase, no articles)
+ - Lists: apple,orange,banana (no brackets)
+ """
+     }
+
+ class Agent:
+     """
+     Agent class that wraps a CodeAgent and provides a callable interface for answering questions.
+
+     Args:
+         model (Any): The language model to use.
+         tools (Optional[List[Any]]): List of tools to provide to the agent.
+         prompt (Optional[str]): Custom prompt template for the agent.
+         verbose (bool): Whether to print debug information.
+     """
+
+     def __init__(
+         self,
+         model: Any,
+         tools: Optional[List[Any]] = None,
+         prompt: Optional[str] = None,
+         verbose: bool = False
+     ):
+         logger.info("Initializing Agent")
+         self.model = model
+         self.tools = tools
+         self.verbose = verbose
+         self.imports = [
+             "pandas", "numpy", "os", "requests", "tempfile",
+             "datetime", "json", "time", "re", "openpyxl",
+             "pathlib", "sys"
+         ]
+
+         self.agent = CodeAgent(
+             model=self.model,
+             tools=self.tools,
+             add_base_tools=True,
+             additional_authorized_imports=self.imports,
+         )
+
+         self.final_answer_checks = [check_reasoning, ensure_formatting]
+
+         self.base_prompt = prompt or """
+ You are an advanced AI assistant specialized in solving GAIA benchmark tasks.
+ Follow these rules strictly:
+ 1. Be precise - return ONLY the exact answer requested
+ 2. Use tools when needed (especially for file analysis)
+ 3. For reversed text questions, answer in normal text
+ 4. Never include explanations or reasoning in the final answer
+ 5. Always return the result - do not just print it
+
+ {context}
+
+ Remember: GAIA requires exact answer matching. Just provide the factual answer.
+ """
+
+         self.prompt_templates = get_prompt_templates()
+         logger.info("Agent initialized")
+
+     def __call__(self, question: str, files: Optional[List[str]] = None) -> str:
+         """Main interface that logs inputs/outputs and handles timing."""
+         if self.verbose:
+             print(f"Agent received question: {question[:50]}... with files: {files}")
+
+         time.sleep(25)  # crude rate limiting between questions
+         return self.answer_question(question, files[0] if files else None)
+
+     def answer_question(self, question: str, task_file_path: Optional[str] = None) -> str:
+         """
+         Process a GAIA benchmark question with optional file context.
+
+         Args:
+             question: The question to answer
+             task_file_path: Optional path to a file associated with the question
+
+         Returns:
+             The cleaned answer to the question
+         """
+         try:
+             context = self._build_context(question, task_file_path)
+             full_prompt = self.base_prompt.format(context=context)
+
+             if self.verbose:
+                 print("Generated prompt:", full_prompt[:200] + "...")
+
+             answer = self.agent.run(full_prompt)
+             return self._clean_answer(str(answer))
+
+         except Exception as e:
+             logger.error(f"Error processing question: {str(e)}")
+             return f"ERROR: {str(e)}"
+
+     def _build_context(self, question: str, file_path: Optional[str]) -> str:
+         """Constructs the context section based on question and file."""
+         context_lines = [f"QUESTION: {question}"]
+
+         if file_path:
+             context_lines.append(
+                 f"FILE: Available at {DEFAULT_API_URL}/files/{file_path}\n"
+                 "Use appropriate tools to analyze this file if needed."
+             )
+
+         # Handle reversed text questions
+         if self._is_reversed_text(question):
+             context_lines.append(
+                 f"NOTE: This question contains reversed text. "
+                 f"Original: {question}\nReversed: {question[::-1]}"
+             )
+
+         return "\n".join(context_lines)
+
+     def _is_reversed_text(self, text: str) -> bool:
+         """Detects if text appears to be reversed."""
+         return text.startswith(".") or ".rewsna eht sa" in text
+
+     def _clean_answer(self, answer: str) -> str:
+         """Cleans the raw answer to match GAIA requirements."""
+         # Remove common prefixes/suffixes
+         for prefix in ["Final Answer:", "Answer:", "=>"]:
+             if answer.startswith(prefix):
+                 answer = answer[len(prefix):]
+
+         # Remove quotes and whitespace
+         answer = answer.strip(" '\"")
+
+         # Special handling for reversed answers
+         if self._is_reversed_text(answer):
+             return answer[::-1]
+
+         return answer
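
The cleaning rules in `agents.py` can be exercised in isolation. A standalone sketch of the same prefix-stripping and reversal logic (`clean_answer` and `is_reversed_text` here mirror the `Agent._clean_answer` / `Agent._is_reversed_text` methods above):

```python
def is_reversed_text(text: str) -> bool:
    """Same heuristic as above: reversed GAIA text tends to start with '.'."""
    return text.startswith(".") or ".rewsna eht sa" in text

def clean_answer(answer: str) -> str:
    """Strip common answer prefixes, quotes, and whitespace; un-reverse reversed text."""
    for prefix in ["Final Answer:", "Answer:", "=>"]:
        if answer.startswith(prefix):
            answer = answer[len(prefix):]
    answer = answer.strip(" '\"")
    if is_reversed_text(answer):
        return answer[::-1]
    return answer

print(clean_answer("Final Answer: 'paris'"))  # → paris
```

Note the order of operations matters: quotes are stripped before the reversal check, so a quoted reversed answer is still detected by its leading dot.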
app.py CHANGED
@@ -1,183 +1,117 @@
- import time
  import os
  import gradio as gr
  import requests
  import pandas as pd
- import PyPDF2
- from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel, tool
- from bs4 import BeautifulSoup
+ from typing import Dict, List
+
+ # custom imports
+ from agents import Agent
+ from tool import get_tools
+ from model import get_model

+ # (Keep Constants as is)
  # --- Constants ---
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+ MODEL_ID = "gemini/gemini-2.5-flash-preview-04-17"

- # --- CUSTOM TOOLS ---
- @tool
- def document_parser_tool(file_path_or_url: str) -> str:
-     """Parses and extracts text from a financial PDF document.
-
-     Args:
-         file_path_or_url: The local file path or URL of the PDF document to parse.
-     """
+ # --- Async Question Processing ---
+ async def process_question(agent, question: str, task_id: str) -> Dict:
+     """Process a single question and return both answer AND full log entry"""
      try:
-         local_path = file_path_or_url
-         if file_path_or_url.startswith('http'):
-             local_path = "temp_financial_doc.pdf"
-             response = requests.get(file_path_or_url, timeout=15)
-             response.raise_for_status()
-             with open(local_path, 'wb') as f:
-                 f.write(response.content)
-
-         extracted_text = ""
-         with open(local_path, 'rb') as file:
-             reader = PyPDF2.PdfReader(file)
-             num_pages = min(len(reader.pages), 2)
-             for page_num in range(num_pages):
-                 text = reader.pages[page_num].extract_text()
-                 if text:
-                     extracted_text += f"--- Page {page_num + 1} ---\n{text}\n"
-
-         if file_path_or_url.startswith('http') and os.path.exists(local_path):
-             os.remove(local_path)
-
-         return extracted_text.strip()[:4000] if extracted_text else "No text found in document."
+         answer = agent(question)
+         return {
+             "submission": {"task_id": task_id, "submitted_answer": answer},
+             "log": {"Task ID": task_id, "Question": question, "Submitted Answer": answer}
+         }
      except Exception as e:
-         return f"Error reading document: {str(e)}"
-
- @tool
- def visit_webpage_tool(url: str) -> str:
-     """Visits a standard HTML webpage and extracts the text content.
-
-     Args:
-         url: The URL of the webpage to visit.
-     """
-     try:
-         headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
-         response = requests.get(url, headers=headers, timeout=15)
-         response.raise_for_status()
-         soup = BeautifulSoup(response.content, 'html.parser')
-         for script in soup(["script", "style"]):
-             script.extract()
-         text = soup.get_text(separator='\n', strip=True)
-         return text[:4000]
-     except Exception as e:
-         return f"Failed to read webpage: {str(e)}"
-
- # --- Basic Agent Definition ---
- class BasicAgent:
-     def __init__(self):
-         print("Initializing Financial Due Diligence Agent...")
-         self.search_tool = DuckDuckGoSearchTool()
-
-         # Native Hugging Face model - massive context, no token limits!
-         self.model = InferenceClientModel(
-             model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
-             token=os.environ.get("HF_TOKEN")
-         )
-
-         self.agent = CodeAgent(
-             tools=[self.search_tool, document_parser_tool, visit_webpage_tool],
-             model=self.model,
-             max_steps=4,
-             additional_authorized_imports=[
-                 "math", "pandas", "datetime", "requests", "bs4", "re", "time", "wikipedia"
-             ]
-         )
-
-     def __call__(self, task_id: str, question: str) -> str:
-         print(f"Analyzing Task {task_id}: {question[:50]}...")
-         file_url = f"{DEFAULT_API_URL}/files/{task_id}"
-
-         prompt = f"""
-         You are solving a GAIA benchmark question. You must use tools to find the answer.
-
-         CRITICAL INSTRUCTION:
-         The final output MUST be exactly the requested answer and NOTHING else.
-         Do not use conversational text. Do not say "The answer is" or "Final Answer:".
-         Just provide the exact raw value, word, number, or comma-separated list.
-
-         If the question references an attached file or document, use your `document_parser_tool`
-         to read it by passing this exact URL: {file_url}
-
-         Question: {question}
-         """
-
-         try:
-             answer = self.agent.run(prompt)
-             final_answer = str(answer).strip()
-             print(f"Agent computed answer: {final_answer}")
-             return final_answer
-         except Exception as e:
-             error_msg = str(e)
-             print(f"Agent failed: {error_msg}")
-             return f"ERROR: {error_msg}"
-
- def run_and_submit_all(profile: gr.OAuthProfile | None):
-     space_id = os.getenv("SPACE_ID")
+         error_msg = f"ERROR: {str(e)}"
+         return {
+             "submission": {"task_id": task_id, "submitted_answer": error_msg},
+             "log": {"Task ID": task_id, "Question": question, "Submitted Answer": error_msg}
+         }
+
+ async def run_questions_async(agent, questions_data: List[Dict]) -> tuple:
+     """Process questions sequentially instead of in batch"""
+     submissions = []
+     logs = []
+
+     for q in questions_data:
+         result = await process_question(agent, q["question"], q["task_id"])
+         submissions.append(result["submission"])
+         logs.append(result["log"])
+
+     return submissions, logs
+
+ async def run_and_submit_all(profile: gr.OAuthProfile | None):
+     """
+     Fetches all questions, runs the BasicAgent on them, submits all answers,
+     and displays the results.
+     """
+     # --- Determine HF Space Runtime URL and Repo URL ---
+     space_id = os.getenv("SPACE_ID")  # Get the SPACE_ID for sending link to the code

      if profile:
          username = f"{profile.username}"
          print(f"User logged in: {username}")
      else:
          print("User not logged in.")
-         yield "Please Login to Hugging Face with the button.", pd.DataFrame()
-         return
+         return "Please Login to Hugging Face with the button.", None

      api_url = DEFAULT_API_URL
      questions_url = f"{api_url}/questions"
      submit_url = f"{api_url}/submit"

+     # 1. Instantiate Agent
      try:
-         agent = BasicAgent()
+         agent = Agent(
+             model=get_model("LiteLLMModel", MODEL_ID),
+             tools=get_tools()
+         )
      except Exception as e:
-         yield f"Error initializing agent: {e}", pd.DataFrame()
-         return
-
+         print(f"Error instantiating agent: {e}")
+         return f"Error initializing agent: {e}", None
+     # In the case of an app running as a Hugging Face space, this link points toward your codebase (useful for others, so please keep it public)
      agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+     print(agent_code)

+     # 2. Fetch Questions
+     print(f"Fetching questions from: {questions_url}")
      try:
          response = requests.get(questions_url, timeout=15)
          response.raise_for_status()
          questions_data = response.json()
          if not questions_data:
-             yield "Fetched questions list is empty or invalid format.", pd.DataFrame()
-             return
+             print("Fetched questions list is empty.")
+             return "Fetched questions list is empty or invalid format.", None
+         print(f"Fetched {len(questions_data)} questions.")
+         questions_data = questions_data[:2]
+     except requests.exceptions.RequestException as e:
+         print(f"Error fetching questions: {e}")
+         return f"Error fetching questions: {e}", None
+     except requests.exceptions.JSONDecodeError as e:
+         print(f"Error decoding JSON response from questions endpoint: {e}")
+         print(f"Response text: {response.text[:500]}")
+         return f"Error decoding server response for questions: {e}", None
      except Exception as e:
-         yield f"Error fetching questions: {e}", pd.DataFrame()
-         return
+         print(f"An unexpected error occurred fetching questions: {e}")
+         return f"An unexpected error occurred fetching questions: {e}", None

-     results_log = []
-     answers_payload = []
-
-     # LIVE UPDATE LOOP
-     for i, item in enumerate(questions_data):
-         task_id = item.get("task_id")
-         question_text = item.get("question")
-         if not task_id or question_text is None:
-             continue
-
-         yield f"Processing question {i+1} of {len(questions_data)}... Please wait.", pd.DataFrame(results_log)
-
-         try:
-             # ---> THE FIX: Instantiate a brand new, empty-brained agent for EACH question <---
-             fresh_agent = BasicAgent()
-             submitted_answer = fresh_agent(task_id, question_text)
-
-             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
-             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
-         except Exception as e:
-             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
-
-         yield f"Finished question {i+1}. Moving to next...", pd.DataFrame(results_log)
-         time.sleep(5)
+     # 3. Run your Agent
+     print(f"Running agent on {len(questions_data)} questions...")
+     answers_payload, results_log = await run_questions_async(agent, questions_data)

      if not answers_payload:
-         yield "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
-         return
+         print("Agent did not produce any answers to submit.")
+         return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)

+     # 4. Prepare Submission
      submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
-     yield f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'...", pd.DataFrame(results_log)
+     status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+     print(status_update)

+     # 5. Submit
+     print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
      try:
          response = requests.post(submit_url, json=submission_data, timeout=60)
          response.raise_for_status()
@@ -189,23 +123,55 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
              f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
              f"Message: {result_data.get('message', 'No message received.')}"
          )
-         yield final_status, pd.DataFrame(results_log)
+         print("Submission successful.")
+         results_df = pd.DataFrame(results_log)
+         return final_status, results_df
+     except requests.exceptions.HTTPError as e:
+         error_detail = f"Server responded with status {e.response.status_code}."
+         try:
+             error_json = e.response.json()
+             error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+         except requests.exceptions.JSONDecodeError:
+             error_detail += f" Response: {e.response.text[:500]}"
+         status_message = f"Submission Failed: {error_detail}"
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
+     except requests.exceptions.Timeout:
+         status_message = "Submission Failed: The request timed out."
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
+     except requests.exceptions.RequestException as e:
+         status_message = f"Submission Failed: Network error - {e}"
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df
      except Exception as e:
-         yield f"Submission Failed: {e}", pd.DataFrame(results_log)
+         status_message = f"An unexpected error occurred during submission: {e}"
+         print(status_message)
+         results_df = pd.DataFrame(results_log)
+         return status_message, results_df

+ # --- Build Gradio Interface using Blocks ---
  with gr.Blocks() as demo:
-     gr.Markdown("# Financial Due Diligence Agent Evaluation Runner")
+     gr.Markdown("# Basic Agent Evaluation Runner")
      gr.Markdown(
          """
          **Instructions:**
-         1. Log in to your Hugging Face account using the button below.
-         2. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+         1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc.
+         2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+         3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
          """
      )

      gr.LoginButton()
+
      run_button = gr.Button("Run Evaluation & Submit All Answers")
+
      status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+     # Removed max_rows=10 from DataFrame constructor
      results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)

      run_button.click(
@@ -214,4 +180,25 @@ with gr.Blocks() as demo:
      )

  if __name__ == "__main__":
-     demo.launch(debug=True, ssr_mode=False)
+     print("\n" + "-"*30 + " App Starting " + "-"*30)
+     # Check for SPACE_HOST and SPACE_ID at startup for information
+     space_host_startup = os.getenv("SPACE_HOST")
+     space_id_startup = os.getenv("SPACE_ID")  # Get SPACE_ID at startup
+
+     if space_host_startup:
+         print(f"✅ SPACE_HOST found: {space_host_startup}")
+         print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
+     else:
+         print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
+
+     if space_id_startup:  # Print repo URLs if SPACE_ID is found
+         print(f"✅ SPACE_ID found: {space_id_startup}")
+         print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
+         print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
+     else:
+         print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
+
+     print("-"*(60 + len(" App Starting ")) + "\n")
+
+     print("Launching Gradio Interface for Basic Agent Evaluation...")
+     demo.launch(debug=True, share=False)
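
The sequential `async` question loop added above can be sketched independently of Gradio and the scoring API. A minimal version with a stubbed agent (the stub and the question data are illustrative, not part of the real app):

```python
import asyncio
from typing import Dict, List

async def process_question(agent, question: str, task_id: str) -> Dict:
    # Call the (synchronous) agent and package both the submission and the log entry.
    try:
        answer = agent(question)
    except Exception as e:
        answer = f"ERROR: {e}"
    return {
        "submission": {"task_id": task_id, "submitted_answer": answer},
        "log": {"Task ID": task_id, "Question": question, "Submitted Answer": answer},
    }

async def run_questions(agent, questions: List[Dict]) -> tuple:
    submissions, logs = [], []
    for q in questions:  # sequential on purpose: avoids hammering rate-limited APIs
        result = await process_question(agent, q["question"], q["task_id"])
        submissions.append(result["submission"])
        logs.append(result["log"])
    return submissions, logs

# Stub agent standing in for the real Agent class
stub = lambda question: question.upper()
subs, logs = asyncio.run(run_questions(stub, [{"task_id": "1", "question": "hi"}]))
print(subs)  # → [{'task_id': '1', 'submitted_answer': 'HI'}]
```

Running questions one at a time (rather than with `asyncio.gather`) trades throughput for predictable, throttle-friendly behavior, which matches the `time.sleep(25)` pacing inside the agent.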
model.py ADDED
@@ -0,0 +1,90 @@
+ import os
+ import re
+ import time
+ from functools import lru_cache
+ from typing import Any, Callable
+
+ from litellm import RateLimitError
+ from smolagents import LiteLLMModel
+ from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+
+ class LocalTransformersModel:
+     def __init__(self, model_id: str, **kwargs):
+         self.tokenizer = AutoTokenizer.from_pretrained(model_id)
+         self.model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
+         self.pipeline = pipeline("text-generation", model=self.model, tokenizer=self.tokenizer)
+
+     def __call__(self, prompt: str, **kwargs):
+         outputs = self.pipeline(prompt, **kwargs)
+         return outputs[0]["generated_text"]
+
+
+ class WrapperLiteLLMModel(LiteLLMModel):
+     def __call__(self, messages, **kwargs):
+         max_retry = 5
+         for attempt in range(max_retry):
+             try:
+                 return super().__call__(messages, **kwargs)
+             except RateLimitError as e:
+                 print(f"RateLimitError (attempt {attempt+1}/{max_retry})")
+                 if attempt == max_retry - 1:
+                     raise  # give up after the final attempt
+
+                 # Try to extract retry time from the exception string
+                 match = re.search(r'"retryDelay": ?"(\d+)s"', str(e))
+                 retry_seconds = int(match.group(1)) if match else 50
+
+                 print(f"Sleeping for {retry_seconds} seconds before retrying...")
+                 time.sleep(retry_seconds)
+
+
+ @lru_cache(maxsize=1)
+ def get_lite_llm_model(model_id: str, **kwargs) -> WrapperLiteLLMModel:
+     """
+     Returns a LiteLLM model instance.
+
+     Args:
+         model_id (str): The model identifier.
+         **kwargs: Additional keyword arguments for the model.
+
+     Returns:
+         WrapperLiteLLMModel: LiteLLM model instance.
+     """
+     return WrapperLiteLLMModel(model_id=model_id, api_key=os.getenv("GEMINI_API"), **kwargs)
+
+
+ @lru_cache(maxsize=1)
+ def get_local_model(model_id: str, **kwargs) -> LocalTransformersModel:
+     """
+     Returns a local Transformers model.
+
+     Args:
+         model_id (str): The model identifier.
+         **kwargs: Additional keyword arguments for the model.
+
+     Returns:
+         LocalTransformersModel: Local Transformers model instance.
+     """
+     return LocalTransformersModel(model_id=model_id, **kwargs)
+
+
+ def get_model(model_type: str, model_id: str, **kwargs) -> Any:
+     """
+     Returns a model instance based on the specified type.
+
+     Args:
+         model_type (str): The type of the model (e.g., 'LiteLLMModel').
+         model_id (str): The model identifier.
+         **kwargs: Additional keyword arguments for the model.
+
+     Returns:
+         Any: Model instance of the specified type.
+     """
+     models: dict[str, Callable[..., Any]] = {
+         "LiteLLMModel": get_lite_llm_model,
+         "LocalTransformersModel": get_local_model,
+     }
+
+     if model_type not in models:
+         raise ValueError(f"Unknown model type: {model_type}")
+
+     return models[model_type](model_id, **kwargs)
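
The retry wrapper in `model.py` keys its sleep off a `retryDelay` field embedded in the provider's error string. A self-contained sketch of that pattern (the `FakeRateLimitError` class, the injectable `sleep`, and the `flaky` function are for illustration only; the real code catches `litellm.RateLimitError` and sleeps for real):

```python
import re
import time

class FakeRateLimitError(Exception):
    """Stand-in for litellm.RateLimitError in this sketch."""

def call_with_retry(fn, max_retry: int = 5, default_delay: int = 50, sleep=time.sleep):
    """Retry fn on rate-limit errors, honoring the provider-suggested delay."""
    for attempt in range(max_retry):
        try:
            return fn()
        except FakeRateLimitError as e:
            if attempt == max_retry - 1:
                raise  # give up after the final attempt
            # Pull the suggested delay out of the error message, if present
            match = re.search(r'"retryDelay": ?"(\d+)s"', str(e))
            sleep(int(match.group(1)) if match else default_delay)

# Fails twice with a suggested 3s delay, then succeeds
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeRateLimitError('{"retryDelay": "3s"}')
    return "ok"

delays = []
result = call_with_retry(flaky, sleep=delays.append)
print(result, delays)  # → ok [3, 3]
```

Passing `sleep` as a parameter makes the backoff testable without actually waiting; re-raising on the last attempt preserves the original exception instead of constructing a new one.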
tool.py ADDED
@@ -0,0 +1,47 @@
+ from typing import List
+
+ from smolagents import (
+     DuckDuckGoSearchTool,
+     PythonInterpreterTool,
+     Tool,
+     VisitWebpageTool,
+     WikipediaSearchTool,
+     FinalAnswerTool,
+ )
+
+ from tools.tools import (
+     vision_tool,
+     youtube_frames_to_images,
+     ask_youtube_video,
+     read_text_file,
+     file_from_url,
+     transcribe_youtube,
+     audio_to_text,
+     extract_text_via_ocr,
+     summarize_csv_data,
+     summarize_excel_data,
+ )
+
+
+ def get_tools() -> List[Tool]:
+     """
+     Returns a list of available tools for the agent.
+
+     Returns:
+         List[Tool]: List of initialized tool instances.
+     """
+     tools = [
+         FinalAnswerTool(),
+         DuckDuckGoSearchTool(),
+         PythonInterpreterTool(),
+         WikipediaSearchTool(),
+         VisitWebpageTool(),
+         vision_tool,
+         youtube_frames_to_images,
+         ask_youtube_video,
+         read_text_file,
+         file_from_url,
+         transcribe_youtube,
+         audio_to_text,
+         extract_text_via_ocr,
+         summarize_csv_data,
+         summarize_excel_data,
+     ]
+     return tools