m-ric HF Staff commited on
Commit
39ad88a
·
1 Parent(s): c84095d

Transfer task details to system prompt

Browse files
Files changed (2) hide show
  1. app.py +1 -17
  2. e2bqwen.py +8 -3
app.py CHANGED
@@ -497,27 +497,11 @@ class EnrichedGradioUI(GradioUI):
497
  else:
498
  session_state["agent"] = create_agent(data_dir=data_dir, desktop=desktop)
499
 
500
-
501
- # Construct the full task with instructions
502
- full_task = task_input + dedent(f"""
503
- The desktop has a resolution of {WIDTH}x{HEIGHT}, take it into account to decide clicking coordinates.
504
- When clicking an element, always make sure to click THE MIDDLE of that element! Else you risk to miss it.
505
-
506
- Always analyze the latest screenshot carefully before performing actions. Make sure to:
507
- 1. Look at elements on the screen to determine what to click or interact with
508
- 2. Use precise coordinates for mouse movements and clicks
509
- 3. Wait for page loads or animations to complete using the wait() tool
510
- 4. Sometimes you may have missed a click, so never assume that you're on the right page, always make sure that your previous action worked. In the screenshot you can see if the mouse is out of the clickable area. Pay special attention to this.
511
-
512
- When you receive a task, break it down into step-by-step actions. On each step, look at the current screenshot to validate if previous steps worked and decide the next action.
513
- We can only execute one action at a time. On each step, answer only a python blob with the action to perform
514
- """)
515
-
516
  try:
517
  stored_messages.append(gr.ChatMessage(role="user", content=task_input))
518
  yield stored_messages
519
 
520
- for msg in stream_to_gradio(session_state["agent"], task=full_task, reset_agent_memory=False):
521
  if hasattr(session_state["agent"], "last_screenshot") and msg.content == "-----": # Append the last screenshot before the end of step
522
  stored_messages.append(gr.ChatMessage(
523
  role="assistant",
 
497
  else:
498
  session_state["agent"] = create_agent(data_dir=data_dir, desktop=desktop)
499
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
500
  try:
501
  stored_messages.append(gr.ChatMessage(role="user", content=task_input))
502
  yield stored_messages
503
 
504
+ for msg in stream_to_gradio(session_state["agent"], task=task_input, reset_agent_memory=False):
505
  if hasattr(session_state["agent"], "last_screenshot") and msg.content == "-----": # Append the last screenshot before the end of step
506
  stored_messages.append(gr.ChatMessage(
507
  role="assistant",
e2bqwen.py CHANGED
@@ -29,7 +29,7 @@ On top of performing computations in the Python code snippets that you create, y
29
  Returns an output of type: {{tool.output_type}}
30
  {%- endfor %}
31
 
32
- The desktop has a resolution of <<resolution_x>>x<<resolution_y>>.
33
 
34
  IMPORTANT:
35
  - Remember the tools that you have as those can save you time, for example open_url to enter a website rather than searching for the browser in the OS.
@@ -84,9 +84,14 @@ Remember to:
84
  Always wait for appropriate loading times
85
  Use precise coordinates based on the current screenshot
86
  Execute one action at a time
87
- Verify the result before proceeding to the next step. If you repeated an action already without effect, it means that this action is useless: don't repeat it and try something else.
88
  Use click to move through menus on the desktop and scroll for web and specific applications.
89
- REMEMBER TO ALWAYS CLICK IN THE MIDDLE OF THE TEXT, NOT ON THE SIDE, NOT UNDER.
 
 
 
 
 
90
  """
91
 
92
  def draw_marker_on_image(image, click_coordinates):
 
29
  Returns an output of type: {{tool.output_type}}
30
  {%- endfor %}
31
 
32
+ The desktop has a resolution of <<resolution_x>>x<<resolution_y>>, take it into account to decide clicking coordinates.
33
 
34
  IMPORTANT:
35
  - Remember the tools that you have as those can save you time, for example open_url to enter a website rather than searching for the browser in the OS.
 
84
  Always wait for appropriate loading times
85
  Use precise coordinates based on the current screenshot
86
  Execute one action at a time
87
+ On each step, look at the last screenshot and action to validate if previous steps worked and decide the next action. If you repeated an action already without effect, it means that this action is useless: don't repeat it and try something else.
88
  Use click to move through menus on the desktop and scroll for web and specific applications.
89
+ When clicking an element, always make sure to click THE MIDDLE of that element! Else you risk to miss it.
90
+ Always analyze the latest screenshot carefully before performing actions. Make sure to:
91
+ 1. Look at elements on the screen to determine what to click or interact with
92
+ 2. Use precise coordinates for mouse movements and clicks
93
+ 3. You can wait for page loads or animations to complete using the wait() tool
94
+ 4. Sometimes you may have missed a click, so never assume that you're on the right page, always make sure that your previous action worked. In the screenshot you can see if the mouse is out of the clickable area. Pay special attention to this.
95
  """
96
 
97
  def draw_marker_on_image(image, click_coordinates):