liuylhf committed on
Commit 785c488
1 Parent(s): bf19110

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -51,9 +51,7 @@ We benchmarked our model against a few other options, on [three datasets](https:
 
 - Multi-Turn Dataset: Designed to simulate a complex real-world environment, such as a healthcare appointment booking system, the model navigates between natural conversation, initiating function calls, asking clarifying questions, and, when necessary, transferring to customer service. The assessment focuses on the accuracy of intent classification and the correctness of function calls.
 
-In the benchmark, we compared the model against other function-calling models including GPT-4, GPT-3.5, Firefunctions, Together.ai, and Anyscale. For Together.ai and Anyscale, we used mistralai/Mixtral-8x7B-Instruct-v0.1, as it represents their best offering. empower-functions consistently deliver superior performance in all scenarios, especially in the multi-turn dataset and the parallel-calling dataset, which are closer to real-world use cases.
-
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6424a49f12ba34f9894ab9b7/_jBEMv9vN30kz3m9auJWz.png)
+For more detailed evaluation result, please refer to our [github repo](https://github.com/empower-ai/empower-functions)
 
 ## Demo App
 Check our healthcare appointment booking [demo](https://app.empower.dev/chat-demo)