Update README.md
README.md
@@ -33,6 +33,12 @@ The code used to generate the dataset can be found [here](https://github.com/pre
 <img src="assets/line_plot.png" alt="Line Plot" width="80%">
 </div>
 
+Notes:
+- *The Funcdex-0.6B score is the average performance of the individual Funcdex-0.6B models.*
+- For cost, we track the number of prompt and completion tokens used to evaluate 300 conversations.
+  - For example, if tokens cost $1 (input) and $10 (output) per million, and the evaluation used `0.5M` input and `0.1M` output tokens, the cost is `1 * 0.5 + 10 * 0.1 = $1.50`.
+- *Qwen3-0.6B and Qwen3-1.7B evaluation costs are estimated by extrapolating from Llama3.2-3B serverless costs. Other models' costs are sourced from OpenRouter.*
+
 ## Results
 
 ### BFCL v3
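
The cost note added in this change describes a simple per-million-token calculation. A minimal sketch of that arithmetic follows. The function name `evaluation_cost` and its parameter names are hypothetical, chosen for illustration, and are not taken from the Funcdex code. The prices and token counts are the illustrative figures from the note, not measured results.

```python
def evaluation_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the dollar cost of a run, given token counts and per-million-token prices.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    million = 1_000_000
    input_cost = (input_tokens / million) * input_price_per_million
    output_cost = (output_tokens / million) * output_price_per_million
    return input_cost + output_cost


# Illustrative figures from the note: $1 / $10 per million input/output tokens,
# with 0.5M input tokens and 0.1M output tokens.
cost = evaluation_cost(500_000, 100_000, 1.0, 10.0)
print(f"${cost:.2f}")
```

Keeping input and output prices as separate arguments mirrors the note, where completion tokens are priced differently from prompt tokens.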