Spaces: DeepJudge / Applicant-Task-Submission

Timothy-Vinzent committed on Feb 20

Commit 5771b1d (verified) · 1 Parent(s): 33a1d1f

Update app.py

Files changed (1)

app.py +26 -11

@@ -224,10 +224,11 @@ def build_interface():
     Constructs the Gradio interface with a submission button and single-submission mechanism.
     """
     with gr.Blocks() as demo:
-        gr.Markdown("# GPT-4o Mini System Prompt Submission")
         # General description
-        gr.Markdown("""Classification Task: Document and Clause Level Identification
-        Participants must create a system prompt for a language model that classifies user queries about legal documents into two specific categories:
         1. **Document Level**: Determines whether the query refers to a single document or multiple documents.
         2. **Clause Level**: Identifies whether the query is focused on:
             - A single clause,
@@ -243,11 +244,11 @@ def build_interface():
         }
         ```
-        The goal is to ensure that the model's output is concise, structured, and accurate. This task is designed to evaluate the robustness of the system prompt in handling classification tasks with short, precise outputs.
         """)
         # Example Inputs and Outputs in an Accordion
-        with gr.Accordion("Example Inputs and Expected Outputs", open=False):
             gr.Markdown("""
             1. **User Message Example 1:**
             - *"Please provide the contract for the lease agreement."*
@@ -300,12 +301,14 @@ def build_interface():
             """)
         # Challenge instructions in another Accordion
-        with gr.Accordion("Challenge Instructions", open=False):
             gr.Markdown("""
-            - Design a system prompt that ensures the AI generates outputs like those above when given similar user messages.
               The system prompt should:
-              1. Specify formatting requirements (e.g., *"Output must be a valid JSON object"*). Note that we are not using constrained decoding or any sort of JSON mode; if not correctly prompted, the LLM will output plain text.
               2. Emphasize strict adherence to classification definitions:
                   - *Single Document:* Refers to one document.
                   - *Multiple Documents:* Refers to more than one document.
@@ -313,12 +316,24 @@ def build_interface():
                   - *Multiple Clauses:* Refers to more than one specific clause.
                   - *General Information:* Refers to general content not tied to specific clauses.
-              You can only submit once, so test your system prompt thoroughly before submission!
               """)
         gr.Markdown(
-            "Please enter your details and submit your system prompt below. "
-            "You can only submit once, I suggest trying to test and build out the system prompt using the same LM being used here elsewhere before submitting."
         )
         email_input = gr.Textbox(label="Email", placeholder="your.email@example.com")

     Constructs the Gradio interface with a submission button and single-submission mechanism.
     """
     with gr.Blocks() as demo:
+        gr.Markdown("# System Prompt Applicant Task")
+        gr.Markdown("## Document and Clause Level Classification")
         # General description
+        gr.Markdown("""
+        Applicants must create a system prompt for a language model that classifies user queries about legal documents into two specific categories:
         1. **Document Level**: Determines whether the query refers to a single document or multiple documents.
         2. **Clause Level**: Identifies whether the query is focused on:
             - A single clause,
         }
         ```
+        The goal is to ensure that the model's output adheres to the prescribed JSON structure and accurately classifies 7 test queries into the two respective categories. This task evaluates your prompting: the model must follow the required structure without any constrained decoding or "JSON mode" while still returning correct classifications.
         """)
         # Example Inputs and Outputs in an Accordion
+        with gr.Accordion("**Example Inputs and Expected Outputs**", open=False):
             gr.Markdown("""
             1. **User Message Example 1:**
             - *"Please provide the contract for the lease agreement."*
             """)
         # Challenge instructions in another Accordion
+        with gr.Accordion("**Challenge Instructions**", open=False):
             gr.Markdown("""
+            - Design a system prompt that ensures gpt-4o-mini generates outputs like those above when given similar user messages.
               The system prompt should:
+              1. Specify formatting requirements (e.g., *"Output must be a valid JSON object"*).
+                  - Note that we are not using constrained decoding or any sort of JSON mode; if not correctly prompted, the LLM will output plain text.
+                  - All LLM responses will be passed to `json.loads(response)`; responses that fail JSON parsing are deemed incorrect (beware of triple backticks, etc.).
               2. Emphasize strict adherence to classification definitions:
                   - *Single Document:* Refers to one document.
                   - *Multiple Documents:* Refers to more than one document.
                   - *Multiple Clauses:* Refers to more than one specific clause.
                   - *General Information:* Refers to general content not tied to specific clauses.
+            **You can only submit once, so test your system prompt thoroughly before submission!**
+            You will be scored on the outputs for 7 test user messages according to the following criteria:
+                - Response is valid JSON
+                - The response contains the keys: "document_level" and "clause_level"
+                - The values for each of the keys are correct
+            Good Luck!
               """)
         gr.Markdown(
+            """Please enter the same name and email as listed in your CV and submit your system prompt below.
+            You can only submit once, so test and refine your system prompt using gpt-4o-mini with temperature 1 before submitting your solution.
+            We look forward to your submission!
+            """
         )
         email_input = gr.Textbox(label="Email", placeholder="your.email@example.com")
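
The scoring criteria added in this commit (valid JSON, both keys present, correct values) can be illustrated with a minimal sketch. This is not the Space's actual scorer: the function name `score_response` and the label strings `"single"` and `"general"` are hypothetical, chosen only for illustration. The one call taken from the instructions is `json.loads(response)`.

```python
import json

# Keys named in the challenge instructions.
REQUIRED_KEYS = ("document_level", "clause_level")


def score_response(response: str, expected: dict) -> bool:
    """Hypothetical scorer: True only if the response parses as JSON,
    is an object containing both required keys, and both values match."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        # Plain text or a Markdown-fenced answer ends up here.
        return False
    if not isinstance(parsed, dict):
        return False
    return all(k in parsed and parsed[k] == expected[k] for k in REQUIRED_KEYS)


# Assumed label values; the real task may use different strings.
expected = {"document_level": "single", "clause_level": "general"}
plain = '{"document_level": "single", "clause_level": "general"}'

# The same answer wrapped in a Markdown code fence, as chat models often do.
fence = "`" * 3
fenced = f"{fence}json\n{plain}\n{fence}"

print(score_response(plain, expected))
print(score_response(fenced, expected))
```

The fenced variant fails at the parsing step even though its payload is correct, which is why the instructions warn about triple backticks: the system prompt has to tell the model to return the bare JSON object and nothing else.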