Huanzhi Mao committed on
Commit
027abe2
1 Parent(s): 8a12377

add description

Files changed (1)
  1. app.py +25 -0
app.py CHANGED
@@ -1059,6 +1059,31 @@ with gr.Blocks() as demo:
1059
  )
1060
  leaderboard_data = gr.Dataframe(value=get_leaderboard(), wrap=True)
1061
 
1062
+ with gr.TabItem("Evaluation Categories"):
1063
+ gr.Markdown(
1064
+ """
1065
+ # Python Evaluation
1066
+
1067
+ **Simple Function** evaluation covers the simplest and most common format: the user supplies a single JSON function document, and exactly one function call is invoked.
1068
+
1069
+ **Multiple Function** contains a user question that invokes only one function call out of 2 to 4 JSON function documents. The model must select the best function to invoke based on the context the user provides.
1070
+
1071
+ **Parallel Function** is defined as invoking multiple function calls in parallel from a single user query. The model must determine how many function calls are needed; the query can be a single sentence or multiple sentences.
1072
+
1073
+ **Parallel Multiple Function** combines parallel function and multiple function. In other words, the model is provided with multiple function documents, and each corresponding function is invoked zero or more times.
1074
+ """
1075
+
1076
+ )
1077
+ gr.Markdown(
1078
+ """
1079
+ # Non-Python Evaluation
1080
+
1081
+ In **relevance detection**, we design scenarios where none of the provided functions is relevant, so none should be invoked. We expect the model to output no function call. This scenario shows whether a model will hallucinate a function and its parameters, generating function code even though it lacks the relevant function information or any user instruction to do so.
1082
+
1083
+ In **REST**, we include real-world GET requests to test the model's ability to generate executable REST API calls from complex function documentation, using requests.get() along with the API's hardcoded URL and a description of the function's purpose and parameters. Our evaluation includes two variations. The first requires passing parameters inside the URL, called path parameters. The second requires the model to put parameters as key/value pairs into the params and/or headers of requests.get().
1084
+
1085
+ In **Java** and **JavaScript**, the goal is to understand how well the function-calling model extends beyond Python types to language-specific types such as HashMap in Java. We include 100 examples for Java AST evaluation and 70 examples for JavaScript AST evaluation.
1086
+ """)
1087
  with gr.TabItem("Try It Out"):
1088
  with gr.Row():
1089
  with gr.Column(scale=1):
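
Note on the added description: the category definitions in this commit can be read as a rule over two counts, namely how many function documents are supplied and how many calls the expected answer contains. The sketch below is only an illustration of that reading and is not part of `app.py`. The `Example` class, its field names, and `categorize` are hypothetical. It also assumes that Parallel Function uses a single function document, which the description does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical summary of one evaluation entry (not from app.py)."""
    num_function_docs: int   # JSON function documents given to the model
    num_expected_calls: int  # function calls the correct answer contains


def categorize(example: Example) -> str:
    """Map an entry to a category name, following the description text."""
    # Relevance detection: no provided function applies, so no call is expected.
    if example.num_expected_calls == 0:
        return "relevance detection"

    many_docs = example.num_function_docs > 1
    many_calls = example.num_expected_calls > 1

    if many_docs and many_calls:
        return "parallel multiple function"
    if many_calls:
        # Assumption: parallel entries supply a single function document.
        return "parallel function"
    if many_docs:
        return "multiple function"
    return "simple function"


if __name__ == "__main__":
    for docs, calls in [(1, 1), (3, 1), (1, 2), (3, 4), (2, 0)]:
        print(docs, calls, categorize(Example(docs, calls)))
```

This covers only the call-count distinction; the REST, Java, and JavaScript categories differ by target language and call format, so counts alone would not separate them.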