computer-agent

m-ric HF Staff committed on Mar 28

Commit 39ad88a

1 Parent(s): c84095d

Transfer task details to system prompt

Files changed (2)

app.py +1 -17
e2bqwen.py +8 -3

app.py CHANGED

@@ -497,27 +497,11 @@ class EnrichedGradioUI(GradioUI):
         else:
             session_state["agent"] = create_agent(data_dir=data_dir, desktop=desktop)
-        # Construct the full task with instructions
-        full_task = task_input + dedent(f"""
-            The desktop has a resolution of {WIDTH}x{HEIGHT}, take it into account to decide clicking coordinates.
-            When clicking an element, always make sure to click THE MIDDLE of that element! Else you risk to miss it.
-            Always analyze the latest screenshot carefully before performing actions. Make sure to:
-            1. Look at elements on the screen to determine what to click or interact with
-            2. Use precise coordinates for mouse movements and clicks
-            3. Wait for page loads or animations to complete using the wait() tool
-            4. Sometimes you may have missed a click, so never assume that you're on the right page, always make sure that your previous action worked. In the screenshot you can see if the mouse is out of the clickable area. Pay special attention to this.
-            When you receive a task, break it down into step-by-step actions. On each step, look at the current screenshot to validate if previous steps worked and decide the next action.
-            We can only execute one action at a time. On each step, answer only a python blob with the action to perform
-        """)
         try:
             stored_messages.append(gr.ChatMessage(role="user", content=task_input))
             yield stored_messages
-            for msg in stream_to_gradio(session_state["agent"], task=full_task, reset_agent_memory=False):
+            for msg in stream_to_gradio(session_state["agent"], task=task_input, reset_agent_memory=False):
                 if hasattr(session_state["agent"], "last_screenshot") and msg.content == "-----": # Append the last screenshot before the end of step
                     stored_messages.append(gr.ChatMessage(
                         role="assistant",

e2bqwen.py CHANGED

@@ -29,7 +29,7 @@ On top of performing computations in the Python code snippets that you create, y
     Returns an output of type: {{tool.output_type}}
 {%- endfor %}
-The desktop has a resolution of <<resolution_x>>x<<resolution_y>>.
+The desktop has a resolution of <<resolution_x>>x<<resolution_y>>, take it into account to decide clicking coordinates.
 IMPORTANT:
 - Remember the tools that you have as those can save you time, for example open_url to enter a website rather than searching for the browser in the OS.
@@ -84,9 +84,14 @@ Remember to:
 Always wait for appropriate loading times
 Use precise coordinates based on the current screenshot
 Execute one action at a time
-Verify the result before proceeding to the next step. If you repeated an action already without effect, it means that this action is useless: don't repeat it and try something else.
+On each step, look at the last screenshot and action to validate if previous steps worked and decide the next action. If you repeated an action already without effect, it means that this action is useless: don't repeat it and try something else.
 Use click to move through menus on the desktop and scroll for web and specific applications.
-REMEMBER TO ALWAYS CLICK IN THE MIDDLE OF THE TEXT, NOT ON THE SIDE, NOT UNDER.
+When clicking an element, always make sure to click THE MIDDLE of that element! Else you risk to miss it.
+Always analyze the latest screenshot carefully before performing actions. Make sure to:
+1. Look at elements on the screen to determine what to click or interact with
+2. Use precise coordinates for mouse movements and clicks
+3. You can wait for page loads or animations to complete using the wait() tool
+4. Sometimes you may have missed a click, so never assume that you're on the right page, always make sure that your previous action worked. In the screenshot you can see if the mouse is out of the clickable area. Pay special attention to this.
 """
 def draw_marker_on_image(image, click_coordinates):
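
Note on the change: with the per-task instructions gone from app.py, the resolution now reaches the model only through the `<<resolution_x>>` and `<<resolution_y>>` placeholders in the system prompt. The diff does not show how those placeholders are filled, so the following is a minimal sketch that assumes plain string replacement. `SYSTEM_PROMPT_TEMPLATE`, `render_system_prompt`, and the 1024x768 values are hypothetical names and numbers chosen for illustration. Only the placeholder markers and the quoted prompt lines come from the diff.

```python
# Hypothetical sketch: names and the 1024x768 values are illustrative only.
# The <<resolution_x>>/<<resolution_y>> markers and the prompt text are
# copied from the diff; the substitution mechanism is an assumption.
SYSTEM_PROMPT_TEMPLATE = (
    "    Returns an output of type: {{tool.output_type}}\n"
    "The desktop has a resolution of <<resolution_x>>x<<resolution_y>>, "
    "take it into account to decide clicking coordinates.\n"
)


def render_system_prompt(template: str, width: int, height: int) -> str:
    """Fill the resolution placeholders, leaving Jinja-style {{...}} markers intact."""
    return (
        template
        .replace("<<resolution_x>>", str(width))
        .replace("<<resolution_y>>", str(height))
    )


system_prompt = render_system_prompt(SYSTEM_PROMPT_TEMPLATE, 1024, 768)

# After this commit the task string is passed to the agent unchanged.
task_input = "Open the browser settings"
print(system_prompt)
```

One plausible reason for the `<<...>>` delimiters is that they do not collide with the `{{...}}` markers already present in the prompt, so the resolution can be filled in first and the remaining template rendered later.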