kaikaidai committed
Commit 58f5f61
1 Parent(s): dcdb545

UI changes 11 Nov

Files changed (1): common.py (+18, -25)
common.py CHANGED
@@ -1,17 +1,19 @@
  # Page Headers
- MAIN_TITLE = "# Judge Arena - Free LLM Evals to test your GenAI application"
+ MAIN_TITLE = "# Judge Arena: Benchmarking LLMs as Evaluators"
 
  # How it works section
  HOW_IT_WORKS = """
- - **Run any form of evaluation:** from simple hallucination detection to qualitative interpretations
- - **Evaluate anything:** coding, analysis, creative writing, math, or general knowledge
+ Vote to help the community find the best LLM-as-a-judge to use!
  """
 
  BATTLE_RULES = """
- ## 🤺 Battle Rules:
- - Both AIs stay anonymous - if either reveals its identity, the duel is void
- - Choose the LLM judge that most aligns with your judgement
- - If both score the same - choose the critique that you prefer more!
+ ## 🤺 Choose the winner
+ 1. Define your scoring criteria in the **Evaluator Prompt**
+ 2. Add a test case to the **Sample to evaluate**
+ 3. Test the evaluators & vote for the model that best aligns with your judgement!
+ \n
+ Variables defined in your prompt with {{double curly braces}} map to input fields under **Sample to evaluate**.
+
  <br>
  """
 
@@ -35,34 +37,25 @@ CSS_STYLES = """
  gap: 8px;
  }
  """
-
+
  # Default Eval Prompt
  EVAL_DESCRIPTION = """
- ## 📝 Instructions
- **Precise evaluation criteria lead to more consistent and reliable judgments.** A good Evaluator Prompt should include the following elements:
+ ## 📝 Tips
+ **Precise evaluation criteria leads to more consistent and reliable judgments.** A good evaluation prompt should include the following elements:
  - Evaluation criteria
  - Scoring rubric
- - (Optional) Examples\n
-
- **Any variables you define in your prompt using {{double curly braces}} will automatically map to the corresponding input fields under the "Sample to evaluate" section on the right.**
-
- <br><br>
+ - Examples (Optional)
  """
 
- DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input based on [INSERT CRITERIA]
+ DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input based on [WRITE CRITERIA HERE]
 
  Score:
  A score of 1 means that the response's answer meets all of the evaluation criteria.
  A score of 0 means that the response's answer does not meet all of the evaluation criteria.
 
- Here is the data:
- [BEGIN DATA]
- ***
  [User Query]: {{input}}
- ***
- [Response]: {{response}}
- ***
- [END DATA]"""
+
+ [Response]: {{response}}"""
 
  # Default Variable Values
  DEFAULT_INPUT = """Which of these animals is least likely to be found in a rainforest?"
@@ -79,7 +72,7 @@ VOTING_HEADER = """
 
  # Acknowledgements
  ACKNOWLEDGEMENTS = """
- <br><br><br>
+ <br><br>
  # Acknowledgements
 
  We thank [LMSYS Org](https://lmsys.org/) for their hard work on the Chatbot Arena and fully credit them for the inspiration to build this.
@@ -152,4 +145,4 @@ Atla currently funds this out of our own pocket. We are looking for API credits
  We are training a general-purpose evaluator that you will soon be able to run in this Judge Arena. Our next step will be to open-source a powerful model that the community can use to run fast and accurate evaluations.
  <br><br>
  # Get in touch
- Feel free to email us at [support@atla-ai.com](mailto:support@atla-ai.com) or leave feedback on our [Github](https://github.com/atla-ai/judge-arena)!"""
+ Feel free to email us at [support@atla-ai.com](mailto:support@atla-ai.com) or leave feedback on our [Github](https://github.com/atla-ai/judge-arena)!"""
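
The new BATTLE_RULES and DEFAULT_EVAL_PROMPT copy both describe {{double curly braces}} variables that map to the fields under "Sample to evaluate". A minimal sketch of how such a substitution could work is below; the helper names (`extract_variables`, `substitute_variables`), the regex-based approach, and the sample response value are illustrative assumptions, not the Judge Arena implementation.

```python
# Sketch only: shows how {{variable}} placeholders in an evaluator prompt
# could be discovered and filled from sample fields. Helper names, the regex
# approach, and the "response" value below are assumptions for illustration.
import re

DEFAULT_EVAL_PROMPT = """You are assessing a chat bot response to a user's input based on [WRITE CRITERIA HERE]

Score:
A score of 1 means that the response's answer meets all of the evaluation criteria.
A score of 0 means that the response's answer does not meet all of the evaluation criteria.

[User Query]: {{input}}

[Response]: {{response}}"""


def extract_variables(prompt: str) -> list[str]:
    """Return the {{variable}} names found in an evaluator prompt."""
    return re.findall(r"\{\{\s*(\w+)\s*\}\}", prompt)


def substitute_variables(prompt: str, sample: dict[str, str]) -> str:
    """Replace each {{variable}} with the matching field from the sample."""
    def _fill(match: re.Match) -> str:
        name = match.group(1)
        return sample.get(name, match.group(0))  # leave unknown variables untouched
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", _fill, prompt)


if __name__ == "__main__":
    sample = {
        "input": "Which of these animals is least likely to be found in a rainforest?",
        "response": "A polar bear.",  # hypothetical sample response
    }
    print(extract_variables(DEFAULT_EVAL_PROMPT))           # ['input', 'response']
    print(substitute_variables(DEFAULT_EVAL_PROMPT, sample))
```

With the default sample above, the rendered prompt would contain the user query and the response in place of the {{input}} and {{response}} placeholders, which is the behaviour the UI text attributes to the "Sample to evaluate" fields.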