Spaces: qiantong-xu/toolbench-leaderboard
hongfenglu committed on Nov 13, 2023

Commit d38ed8c • 1 Parent(s): e938cdf

add codellama results (#3)

- add codellama results (2c732d3533fec8a7a7a40fe2107df6ca288ee6ef)

Co-authored-by: Fenglu Hong <hongfenglu@users.noreply.huggingface.co>

Files changed (1)

app.py +9 -0

@@ -16,6 +16,15 @@ UNTUNED_MODEL_RESULTS = '''[gpt4](https://platform.openai.com/docs/models/gpt-4)
 [llama-30b](https://huggingface.co/huggyllama/llama-30b)              & 78.0 & 84.0 & 66.0 & 45.0 & 37.1 & 27.0 / 21.7 & 0.0 & 30.6 & 34.3 \\
 [llama-13b](https://huggingface.co/huggyllama/llama-13b)                & 70.0 & 74.0 & 45.0 & 35.8 & 5.7  & 28.0 / 18.9 & 0.0 & 27.6 & 17.1 \\
 [llama-13b-alpaca](https://huggingface.co/chavinlo/gpt4-x-alpaca)    & 62.0 & 43.0 & 44.0 & 40.8 & 11.4 & 1.0 / 1.6   & 0.0 & 2.7  & 9.5  \\
+[CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) & 86.0 & 92.0 & 74.0 & 63.33 & 38.08 & 35.0 / 21.97 & 0.0 & 0.0 & 11.16 \\
+[CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) & 90.0 & 94.0 & 78.0 & 61.67 & 41.27 & 32.0 / 21.95 & 0.0 & 0.0 & 16.98 \\
+[CodeLlama-7b-Python-hf](https://huggingface.co/codellama/CodeLlama-7b-Python-hf) & 83.0 & 88.0 & 83.0 & 68.33 & 49.13 & 31.0 / 21.33 & 0.0 & 1.58 & 22.86 \\
+[CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf) & 96.0 & 86.83 & 87.0 & 70.83 & 51.26 & 35.0 / 22.28 & 0.0 & 0.0 & 40.95 \\
+[CodeLlama-13b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf) & 98.0 & 89.58 & 85.0 & 72.5 & 48.97 & 31.0 / 22.56 & 0.0 & 9.7 & 57.62 \\
+[CodeLlama-13b-Python-hf](https://huggingface.co/codellama/CodeLlama-13b-Python-hf) & 93.0 & 92.35 & 86.0 & 62.5 & 50.79 & 37.0 / 21.94 & 0.0 & 2.4 & 41.53 \\
+[CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) & 96.39 & 85.0 & 88.0 & 88.33 & 64.29 & 34.0 / 24.65 & 0.0 & 5.53 & 51.32 \\
+[CodeLlama-34b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) & 94.28 & 86.0 & 90.0 & 89.17 & 61.11 & 31.0 / 24.34 & 0.0 & 25.99 & 47.46 \\
+[CodeLlama-34b-Python-hf](https://huggingface.co/codellama/CodeLlama-34b-Python-hf) & 91.11 & 88.42 & 91.0 & 85.83 & 55.87 & 28.0 / 21.24 & 0.0 & 6.47 & 33.33 \\
 [starcoder](https://huggingface.co/bigcode/starcoder)                & 91.0 & 84.0 & 82.0 & 51.7 & 48.0 & 23.0 / 19.4 & 2.6 & 0.0  & 21.9 \\
 [starcoderbase](https://huggingface.co/bigcode/starcoderbase)           & 90.0 & 86.0 & 79.0 & 63.3 & 42.9 & 24.0 / 16.3 & 5.8 & 23.1 & 17.1 \\
 [codegen-16B-nl](https://huggingface.co/Salesforce/codegen-16B-nl)           & 51.0 & 75.0 & 37.0 & 21.7 & 7.1  & 43.0 / 18.0 & 0.0 & 0.0  & 16.2 \\
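
The rows in this hunk share one shape: a Markdown link, then `&`-separated score cells, then a trailing `\\`. The following is a minimal sketch of how one such row could be split into its parts, for example to check that a newly added row has the same number of cells as its neighbours. The helper name `parse_row` and the regular expression are hypothetical and are not taken from `app.py`, whose actual parsing logic is not shown in this diff.

```python
import re

# Hypothetical pattern (an assumption, not from app.py): a Markdown link
# "[name](url)" followed by "&" and the remaining score cells.
ROW_PATTERN = re.compile(r"^\[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)\s*&(?P<cells>.*)$")


def parse_row(line):
    """Split one leaderboard row into (name, url, cells)."""
    # Drop surrounding whitespace and any trailing backslashes, so the
    # helper works on the raw source text and on the runtime string value.
    line = line.strip().rstrip("\\").strip()
    match = ROW_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    # Cells are kept as strings because some hold two values ("35.0 / 21.97").
    cells = [cell.strip() for cell in match.group("cells").split("&")]
    return match.group("name"), match.group("url"), cells


# One of the rows added in this commit, copied verbatim.
row = r"[CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) & 86.0 & 92.0 & 74.0 & 63.33 & 38.08 & 35.0 / 21.97 & 0.0 & 0.0 & 11.16 \\"
name, url, cells = parse_row(row)
print(name, len(cells))
```

Keeping the cells as strings avoids guessing how the two-value cell should be interpreted; the column meanings are not stated in this hunk.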