Huanzhi Mao committed on
Commit 0b85412
1 Parent(s): bdd55a2

update data.csv, in sync with BFCL April 16th Release

Files changed (2)
  1. app.py +3 -3
  2. data.csv +32 -30
app.py CHANGED
@@ -626,10 +626,10 @@ COLUMNS = [
  "Parallel Functions Exec",
  "Parallel Multiple Exec",
  "Relevance Detection",
- "Cost ($ Per 1k Function Calls)",
+ "Cost ($)",
  "Latency Mean (s)",
- "Latency Standard Deviation (s)",
- "Latency 95th Percentile (s)",
+ "Latency SD (s)",
+ "Latency P95 (s)",
  "Organization",
  "License",
  ]
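In this commit, app.py shortens three display labels in COLUMNS, while the header row of data.csv keeps the long names. How app.py reconciles the two is not visible in this diff. The following is one hypothetical way to do it with the standard-library csv module, using a rename map built from the old and new strings in the app.py hunk. The names DISPLAY_NAMES, rename_columns, and load_rows are illustrative and do not come from app.py.

```python
import csv
import io

# Hypothetical rename map: keys are the long names still used in the data.csv
# header, values are the shorter labels introduced in app.py's COLUMNS list.
DISPLAY_NAMES = {
    "Cost ($ Per 1k Function Calls)": "Cost ($)",
    "Latency Standard Deviation (s)": "Latency SD (s)",
    "Latency 95th Percentile (s)": "Latency P95 (s)",
}


def rename_columns(header):
    """Swap long CSV column names for display labels; leave other names unchanged."""
    return [DISPLAY_NAMES.get(name, name) for name in header]


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the display labels."""
    reader = csv.reader(io.StringIO(csv_text))
    header = rename_columns(next(reader))
    return [dict(zip(header, row)) for row in reader]


if __name__ == "__main__":
    # Trimmed excerpt: a subset of the data.csv columns for one row of this commit.
    sample = (
        "Model,Cost ($ Per 1k Function Calls),Latency Mean (s),"
        "Latency Standard Deviation (s),Latency 95th Percentile (s)\n"
        "GPT-4-0125-Preview (Prompt),5.21,1.99,1.27,4.47\n"
    )
    row = load_rows(sample)[0]
    print(row["Cost ($)"], row["Latency SD (s)"], row["Latency P95 (s)"])
```

Mapping by name rather than by position keeps the rename correct if columns are reordered; the trade-off is that a typo in a key silently leaves the long name in place.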
data.csv CHANGED
@@ -1,31 +1,33 @@
  Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,Simple Function AST,Python Simple Function AST,Java Simple Function AST,JavaScript Simple Function AST,Multiple Functions AST,Parallel Functions AST,Parallel Multiple AST,Simple Function Exec,Python Simple Function Exec,REST Simple Function Exec,Multiple Functions Exec,Parallel Functions Exec,Parallel Multiple Exec,Relevance Detection,Cost ($ Per 1k Function Calls),Latency Mean (s),Latency Standard Deviation (s),Latency 95th Percentile (s)
- 1,83.82%,GPT-4-0125-Preview (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,88.75%,70.07%,88.00%,94.25%,67.00%,80.00%,94.50%,90.50%,82.00%,85.29%,86.00%,84.29%,78.00%,72.00%,45.00%,70.42%,5.21,1.99,1.27,4.47
- 2,83.53%,Claude-3-Opus-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,86.09%,69.43%,86.36%,93.25%,66.00%,72.00%,93.50%,86.00%,78.50%,84.71%,87.00%,81.43%,76.00%,72.00%,45.00%,80.42%,10.8,5.05,1.78,7.54
- 3,81.24%,GPT-4-1106-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,84.75%,67.29%,82.00%,90.25%,61.00%,58.00%,91.00%,91.50%,74.50%,77.65%,85.00%,67.14%,72.00%,72.00%,47.50%,80.42%,5.03,6.34,6.04,18.29
- 4,81.12%,Gorilla-OpenFunctions-v2 (FC),https://gorilla.cs.berkeley.edu/blogs/7_open_functions_v2.html,Gorilla LLM,Apache 2.0,86.16%,70.05%,87.64%,94.25%,67.00%,76.00%,94.50%,87.50%,75.00%,84.71%,87.00%,81.43%,78.00%,70.00%,47.50%,60.83%,1.7,2.65,2.33,6.64
- 5,79.94%,GPT-4-0125-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,83.75%,65.25%,80.00%,90.25%,53.00%,52.00%,92.50%,90.00%,72.50%,70.00%,84.00%,50.00%,74.00%,72.00%,45.00%,82.92%,4.82,5.03,5.88,19.29
- 6,78.94%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,81.44%,61.16%,79.27%,88.50%,54.00%,56.00%,92.50%,84.00%,70.00%,67.65%,86.00%,41.43%,72.00%,70.00%,35.00%,88.75%,1.75,2.77,2.27,6.31
- 7,77.29%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,85.20%,69.35%,81.82%,90.25%,53.00%,72.00%,88.50%,88.00%,82.50%,75.88%,75.00%,77.14%,78.00%,76.00%,47.50%,50.42%,2.12,2.11,1.15,3.16
- 8,76.88%,Functionary-Medium-v2.4 (FC),https://huggingface.co/meetkai/functionary-medium-v2.4,MeetKai,MIT,82.36%,62.38%,79.45%,88.00%,55.00%,60.00%,90.00%,87.50%,72.50%,60.00%,80.00%,31.43%,66.00%,76.00%,47.50%,74.17%,1.64,2.55,2.6,7.51
- 9,75.94%,Functionary-Small-v2.4 (FC),https://huggingface.co/meetkai/functionary-small-v2.4,MeetKai,MIT,80.00%,64.73%,80.00%,88.75%,56.00%,58.00%,89.00%,82.00%,69.00%,69.41%,81.00%,52.86%,68.00%,74.00%,47.50%,67.92%,1.76,2.74,2.47,7.29
- 10,72.65%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,76.63%,63.20%,80.00%,87.25%,56.00%,70.00%,86.00%,83.00%,57.50%,75.29%,84.00%,62.86%,72.00%,58.00%,47.50%,54.17%,0.95,1.35,0.62,2.22
- 11,70.76%,Claude-3-Opus-20240229 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,73.43%,62.02%,72.73%,84.25%,39.00%,48.00%,80.00%,75.00%,66.00%,70.59%,80.00%,57.14%,70.00%,70.00%,37.50%,65.00%,26.31,13.56,5.21,22.3
- 12,66.53%,Claude-3-Sonnet-20240229 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,71.38%,62.10%,70.00%,79.50%,39.00%,56.00%,73.50%,76.50%,65.50%,59.41%,85.00%,22.86%,72.00%,72.00%,45.00%,51.67%,5.08,6.44,2.41,10.11
- 13,65.35%,Claude-3-Haiku-20240307 (FC),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,76.32%,59.07%,77.27%,86.75%,51.00%,54.00%,79.50%,78.50%,70.00%,51.76%,82.00%,8.57%,72.00%,70.00%,42.50%,22.50%,0.45,3.45,1.58,6.42
- 14,65.12%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,62.59%,46.39%,74.36%,80.75%,54.00%,64.00%,75.50%,55.50%,45.00%,47.06%,63.00%,24.29%,66.00%,40.00%,32.50%,83.33%,6.64,3.72,2.02,7.6
- 15,64.59%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,62.09%,46.32%,66.36%,88.75%,5.00%,10.00%,94.00%,25.50%,62.50%,65.29%,76.00%,50.00%,76.00%,4.00%,40.00%,84.17%,4.94,2.84,2.57,8.4
- 16,64.24%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,65.31%,63.22%,66.73%,79.50%,30.00%,38.00%,72.00%,72.00%,50.50%,65.88%,76.00%,51.43%,70.00%,72.00%,45.00%,56.25%,1.25,0.63,0.45,1.37
- 17,61.12%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,68.98%,50.99%,82.91%,91.50%,62.00%,56.00%,93.00%,31.50%,68.50%,82.94%,84.00%,81.43%,74.00%,2.00%,45.00%,0.00%,3.9,1.86,1.09,4.28
- 18,58.35%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,70.52%,66.33%,57.09%,57.50%,53.00%,62.00%,65.50%,90.00%,69.50%,78.82%,81.00%,75.71%,70.00%,74.00%,42.50%,2.08%,0.42,1.26,0.68,2.45
- 19,58.12%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.27%,46.94%,81.09%,90.25%,56.00%,58.00%,95.50%,39.00%,41.50%,81.76%,85.00%,77.14%,74.00%,12.00%,20.00%,0.00%,0.96,1.05,0.95,2.31
- 20,57.88%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,67.99%,52.94%,71.45%,81.00%,42.00%,54.00%,81.00%,66.50%,53.00%,51.76%,79.00%,12.86%,70.00%,50.00%,40.00%,10.83%,0.15,0.39,N/A,N/A
- 21,56.41%,Gemini-1.0-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,41.94%,38.57%,77.27%,89.75%,37.00%,58.00%,90.50%,0.00%,0.00%,75.29%,82.00%,65.71%,74.00%,0.00%,5.00%,77.50%,0.19,1.06,0.55,1.67
- 22,51.18%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,39.94%,33.40%,67.27%,86.50%,13.00%,22.00%,92.50%,0.00%,0.00%,60.59%,79.00%,34.29%,68.00%,0.00%,5.00%,73.33%,N/A,1.24,1.2,3.26
- 23,51.06%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,55.05%,57.22%,70.18%,76.00%,52.00%,60.00%,75.50%,30.50%,44.00%,55.88%,81.00%,20.00%,76.00%,52.00%,45.00%,2.08%,N/A,1.86,1.39,4.45
- 24,49.65%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,38.53%,25.90%,61.64%,83.50%,4.00%,2.00%,92.50%,0.00%,0.00%,40.59%,58.00%,15.71%,58.00%,0.00%,5.00%,91.67%,10.48,3.54,3.37,11.15
- 25,48.71%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,46.91%,28.71%,49.64%,61.75%,26.00%,0.00%,56.50%,47.50%,34.00%,22.35%,36.00%,2.86%,20.00%,50.00%,22.50%,82.08%,0.13,1.79,1.63,5.12
- 26,41.47%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,39.05%,33.15%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,36.00%,30.59%,45.00%,10.00%,32.00%,40.00%,30.00%,60.42%,0.03,0.09,N/A,N/A
- 27,39.00%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,36.98%,28.24%,38.91%,49.50%,4.00%,24.00%,48.50%,37.00%,23.50%,32.94%,38.00%,25.71%,36.00%,34.00%,10.00%,56.67%,0.45,1.2,N/A,N/A
- 28,37.65%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,37.66%,27.93%,5.64%,5.75%,6.00%,4.00%,8.00%,78.50%,58.50%,24.71%,4.00%,54.29%,0.00%,62.00%,25.00%,98.33%,0.7,1.09,0.95,2.94
- 29,32.47%,Mistral-Large-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,28.92%,14.88%,14.18%,16.50%,3.00%,18.00%,13.50%,33.50%,54.50%,20.00%,8.00%,37.14%,8.00%,14.00%,17.50%,91.25%,1.11,1.72,1.04,3.66
- 30,17.53%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.53%,6.97%,1.64%,2.25%,0.00%,0.00%,2.50%,3.00%,3.00%,15.88%,16.00%,15.71%,12.00%,0.00%,0.00%,99.58%,2.02,2.93,1.61,5.79
+ 1,84.41%,GPT-4-0125-Preview (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,88.75%,71.54%,88.00%,94.25%,67.00%,80.00%,94.50%,90.50%,82.00%,91.18%,86.00%,98.57%,78.00%,72.00%,45.00%,70.42%,5.21,1.99,1.27,4.47
+ 2,84.12%,Claude-3-Opus-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,86.09%,70.90%,86.36%,93.25%,66.00%,72.00%,93.50%,86.00%,78.50%,90.59%,87.00%,95.71%,76.00%,72.00%,45.00%,80.42%,10.8,5.05,1.78,7.54
+ 3,81.88%,GPT-4-turbo-2024-04-09 (Prompt),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,86.83%,71.04%,85.82%,93.00%,60.00%,80.00%,94.50%,90.00%,77.00%,91.18%,86.00%,98.57%,80.00%,68.00%,45.00%,62.50%,5.22,2.68,2.41,6.43
+ 4,81.76%,GPT-4-1106-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,84.75%,68.26%,82.00%,90.25%,61.00%,58.00%,91.00%,91.50%,74.50%,83.53%,85.00%,81.43%,70.00%,72.00%,47.50%,80.42%,5.03,6.34,6.04,18.29
+ 5,81.71%,Gorilla-OpenFunctions-v2 (FC),https://gorilla.cs.berkeley.edu/blogs/7_open_functions_v2.html,Gorilla LLM,Apache 2.0,86.16%,71.52%,87.64%,94.25%,67.00%,76.00%,94.50%,87.50%,75.00%,90.59%,87.00%,95.71%,78.00%,70.00%,47.50%,60.83%,1.7,2.65,2.33,6.64
+ 6,80.29%,GPT-4-0125-Preview (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,83.75%,66.13%,80.00%,90.25%,53.00%,52.00%,92.50%,90.00%,72.50%,73.53%,84.00%,58.57%,74.00%,72.00%,45.00%,82.92%,4.82,5.03,5.88,19.29
+ 7,79.47%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,81.44%,62.13%,79.27%,88.50%,54.00%,56.00%,92.50%,84.00%,70.00%,73.53%,86.00%,55.71%,70.00%,70.00%,35.00%,88.75%,1.75,2.77,2.27,6.31
+ 8,78.76%,GPT-4-turbo-2024-04-09 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,81.70%,65.13%,73.82%,90.00%,33.00%,26.00%,89.50%,89.00%,74.50%,73.53%,83.00%,60.00%,70.00%,72.00%,45.00%,88.75%,4.79,5.68,6.67,20.07
+ 9,77.88%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,85.20%,70.82%,81.82%,90.25%,53.00%,72.00%,88.50%,88.00%,82.50%,81.76%,75.00%,91.43%,78.00%,76.00%,47.50%,50.42%,2.12,2.11,1.15,3.16
+ 10,77.12%,Functionary-Medium-v2.4 (FC),https://huggingface.co/meetkai/functionary-medium-v2.4,MeetKai,MIT,82.36%,62.61%,79.45%,88.00%,55.00%,60.00%,90.00%,87.50%,72.50%,62.94%,80.00%,38.57%,66.00%,74.00%,47.50%,74.17%,1.64,2.55,2.6,7.51
+ 11,76.18%,Functionary-Small-v2.4 (FC),https://huggingface.co/meetkai/functionary-small-v2.4,MeetKai,MIT,80.00%,65.32%,80.00%,88.75%,56.00%,58.00%,89.00%,82.00%,69.00%,71.76%,81.00%,58.57%,68.00%,74.00%,47.50%,67.92%,1.76,2.74,2.47,7.29
+ 12,73.71%,Claude-3-Opus-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,70.35%,55.20%,80.91%,87.00%,61.00%,72.00%,91.00%,58.00%,51.50%,85.29%,85.00%,85.71%,74.00%,24.00%,37.50%,82.50%,30.65,12.63,3.64,19.72
+ 13,73.00%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,76.63%,64.08%,80.00%,87.25%,56.00%,70.00%,86.00%,83.00%,57.50%,78.82%,84.00%,71.43%,72.00%,58.00%,47.50%,54.17%,0.95,1.35,0.62,2.22
+ 14,71.65%,Claude-3-Haiku-20240307 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,77.36%,64.26%,85.45%,94.25%,55.00%,76.00%,92.00%,84.00%,48.00%,87.06%,88.00%,85.71%,84.00%,46.00%,40.00%,29.58%,0.18,0.99,0.43,1.76
+ 15,65.12%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,62.59%,46.39%,74.36%,80.75%,54.00%,64.00%,75.50%,55.50%,45.00%,47.06%,63.00%,24.29%,66.00%,40.00%,32.50%,83.33%,6.64,3.72,2.02,7.6
+ 16,65.00%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,62.09%,47.35%,66.36%,88.75%,5.00%,10.00%,94.00%,25.50%,62.50%,69.41%,76.00%,60.00%,76.00%,4.00%,40.00%,84.17%,4.94,2.84,2.57,8.4
+ 17,64.59%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,65.31%,64.10%,66.73%,79.50%,30.00%,38.00%,72.00%,72.00%,50.50%,69.41%,76.00%,60.00%,70.00%,72.00%,45.00%,56.25%,1.25,0.63,0.45,1.37
+ 18,61.71%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,68.98%,52.46%,82.91%,91.50%,62.00%,56.00%,93.00%,31.50%,68.50%,88.82%,84.00%,95.71%,74.00%,2.00%,45.00%,0.00%,3.9,1.86,1.09,4.28
+ 19,58.94%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,70.52%,67.80%,57.09%,57.50%,53.00%,62.00%,65.50%,90.00%,69.50%,84.71%,81.00%,90.00%,70.00%,74.00%,42.50%,2.08%,0.42,1.26,0.68,2.45
+ 20,58.71%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.27%,48.41%,81.09%,90.25%,56.00%,58.00%,95.50%,39.00%,41.50%,87.65%,85.00%,91.43%,74.00%,12.00%,20.00%,0.00%,0.96,1.05,0.95,2.31
+ 21,58.41%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,67.99%,54.26%,71.45%,81.00%,42.00%,54.00%,81.00%,66.50%,53.00%,57.06%,79.00%,25.71%,70.00%,50.00%,40.00%,10.83%,0.15,0.39,N/A,N/A
+ 22,58.06%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.06%,38.66%,76.73%,86.00%,49.00%,58.00%,87.50%,6.00%,6.00%,77.65%,86.00%,65.71%,72.00%,0.00%,5.00%,81.67%,3.41,3.35,1.47,6.93
+ 23,56.94%,Gemini-1.0-Pro (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,41.94%,39.90%,77.27%,89.75%,37.00%,58.00%,90.50%,0.00%,0.00%,80.59%,81.00%,80.00%,74.00%,0.00%,5.00%,77.50%,0.19,1.06,0.55,1.67
+ 24,52.59%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.69%,42.72%,85.27%,94.25%,60.00%,64.00%,93.00%,0.50%,0.00%,85.88%,86.00%,85.71%,80.00%,0.00%,5.00%,20.83%,0.29,1.52,0.64,2.31
+ 25,51.53%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,39.94%,34.28%,67.27%,86.50%,13.00%,22.00%,92.50%,0.00%,0.00%,64.12%,79.00%,42.86%,68.00%,0.00%,5.00%,73.33%,N/A,1.24,1.2,3.26
+ 26,50.94%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,55.05%,56.93%,70.18%,76.00%,52.00%,60.00%,75.50%,30.50%,44.00%,54.71%,79.00%,20.00%,76.00%,52.00%,45.00%,2.08%,N/A,1.86,1.39,4.45
+ 27,49.71%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,38.53%,26.04%,61.64%,83.50%,4.00%,2.00%,92.50%,0.00%,0.00%,41.18%,57.00%,18.57%,58.00%,0.00%,5.00%,91.67%,10.48,3.54,3.37,11.15
+ 28,48.71%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,46.91%,28.71%,49.64%,61.75%,26.00%,0.00%,56.50%,47.50%,34.00%,22.35%,36.00%,2.86%,20.00%,50.00%,22.50%,82.08%,0.13,1.79,1.63,5.12
+ 29,41.47%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,39.05%,33.15%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,36.00%,30.59%,45.00%,10.00%,32.00%,40.00%,30.00%,60.42%,0.03,0.09,N/A,N/A
+ 30,39.41%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,36.98%,29.26%,38.91%,49.50%,4.00%,24.00%,48.50%,37.00%,23.50%,37.06%,38.00%,35.71%,36.00%,34.00%,10.00%,56.67%,0.45,1.2,N/A,N/A
+ 31,38.18%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,37.66%,29.25%,5.64%,5.75%,6.00%,4.00%,8.00%,78.50%,58.50%,30.00%,4.00%,67.14%,0.00%,62.00%,25.00%,98.33%,0.7,1.09,0.95,2.94
+ 32,17.65%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.53%,7.26%,1.64%,2.25%,0.00%,0.00%,2.50%,3.00%,3.00%,17.06%,16.00%,18.57%,12.00%,0.00%,0.00%,99.58%,2.02,2.93,1.61,5.79
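Several columns in data.csv appear to be derived from others. The following check tests that reading against the new GPT-4-0125-Preview (Prompt) row. It rests on two assumptions that are inferred from the published numbers and not stated in this commit: AST Summary and Exec Summary are unweighted means of their four categories, and the two Simple Function columns are weighted by test-case count, with assumed counts of 400 Python, 100 Java, and 50 JavaScript AST cases, and 100 Python and 70 REST executable cases. The helper names (pct, weighted_mean, summary) are hypothetical.

```python
# Hypothetical check of how the derived columns in data.csv relate to the
# category columns. The case counts below are ASSUMPTIONS inferred from the
# published percentages; they are not stated in this commit.
AST_COUNTS = {"Python": 400, "Java": 100, "JavaScript": 50}
EXEC_COUNTS = {"Python": 100, "REST": 70}


def pct(cell):
    """Convert a CSV cell such as '91.18%' to the float 91.18."""
    return float(cell.rstrip("%"))


def weighted_mean(scores, counts):
    """Mean of per-language accuracies, weighted by the assumed case counts."""
    total = sum(counts.values())
    return sum(scores[lang] * counts[lang] for lang in counts) / total


def summary(categories):
    """Unweighted mean of the category accuracies (assumed definition)."""
    return sum(categories) / len(categories)


# Cells copied from the new GPT-4-0125-Preview (Prompt) row of data.csv.
row = {
    "AST Summary": "88.75%", "Exec Summary": "71.54%",
    "Simple Function AST": "88.00%", "Python Simple Function AST": "94.25%",
    "Java Simple Function AST": "67.00%", "JavaScript Simple Function AST": "80.00%",
    "Multiple Functions AST": "94.50%", "Parallel Functions AST": "90.50%",
    "Parallel Multiple AST": "82.00%",
    "Simple Function Exec": "91.18%", "Python Simple Function Exec": "86.00%",
    "REST Simple Function Exec": "98.57%", "Multiple Functions Exec": "78.00%",
    "Parallel Functions Exec": "72.00%", "Parallel Multiple Exec": "45.00%",
}

simple_ast = weighted_mean(
    {lang: pct(row[f"{lang} Simple Function AST"]) for lang in AST_COUNTS}, AST_COUNTS
)
simple_exec = weighted_mean(
    {lang: pct(row[f"{lang} Simple Function Exec"]) for lang in EXEC_COUNTS}, EXEC_COUNTS
)
ast_summary = summary([simple_ast] + [pct(row[c]) for c in (
    "Multiple Functions AST", "Parallel Functions AST", "Parallel Multiple AST")])
exec_summary = summary([simple_exec] + [pct(row[c]) for c in (
    "Multiple Functions Exec", "Parallel Functions Exec", "Parallel Multiple Exec")])

# Published values are rounded to two decimals, so compare with a small tolerance.
for name, computed in [("Simple Function AST", simple_ast),
                       ("Simple Function Exec", simple_exec),
                       ("AST Summary", ast_summary),
                       ("Exec Summary", exec_summary)]:
    print(f"{name}: computed {computed:.2f}, published {row[name]}")
```

Running the same check over other rows is a quick way to spot a cell that no longer agrees with its summary; if it fails broadly, the assumed counts or the assumed unweighted mean are the first things to revisit.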