DrNicefellow committed on
Commit f78427b • 1 Parent(s): fb4e7d5

Update README.md

Files changed (1)
  1. README.md +1 -65
README.md CHANGED
@@ -8,68 +8,4 @@ pipeline_tag: text-generation
 tags:
 - llm-agent
 ---
-
- <h1 align="center"> Executable Code Actions Elicit Better LLM Agents </h1>
-
- <p align="center">
- <a href="https://github.com/xingyaoww/code-act">💻 Code</a>
- •
- <a href="https://arxiv.org/abs/2402.01030">📃 Paper</a>
- •
- <a href="https://huggingface.co/datasets/xingyaoww/code-act">🤗 Data (CodeActInstruct)</a>
- •
- <a href="https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1">🤗 Model (CodeActAgent-Mistral-7b-v0.1)</a>
- •
- <a href="https://chat.xwang.dev/">🤖 Chat with CodeActAgent!</a>
- </p>
-
- We propose to use executable Python **code** to consolidate LLM agents' **act**ions into a unified action space (**CodeAct**).
- Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations (e.g., code execution results) through multi-turn interactions.
-
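To make this loop concrete, here is a minimal sketch of a CodeAct-style interaction cycle. It is an illustration, not the project's actual agent code: `query_llm` is a hypothetical stand-in for a chat-model call, and in-process `exec` stands in for a properly sandboxed interpreter.

```python
# Minimal sketch of a CodeAct-style loop: the LLM emits Python code as its
# action, the interpreter runs it, and the printed output is fed back as an
# observation. `query_llm` is a hypothetical stand-in for a real model call;
# in-process exec() is NOT a safe sandbox and only serves the illustration.
import io
import re
from contextlib import redirect_stdout
from typing import Optional

def query_llm(history: list[dict]) -> str:
    raise NotImplementedError("plug in a chat-model call here")

def extract_code(reply: str) -> Optional[str]:
    match = re.search(r"`{3}python\n(.*?)`{3}", reply, re.DOTALL)
    return match.group(1) if match else None

def run_code(code: str, env: dict) -> str:
    buf = io.StringIO()
    try:
        with redirect_stdout(buf):
            exec(code, env)            # execute the code action
    except Exception as exc:           # errors are observations too
        return f"{buf.getvalue()}\nError: {exc!r}"
    return buf.getvalue()

def codeact_loop(task: str, max_turns: int = 5) -> str:
    history = [{"role": "user", "content": task}]
    env: dict = {}                     # interpreter state persists across turns
    reply = ""
    for _ in range(max_turns):
        reply = query_llm(history)
        history.append({"role": "assistant", "content": reply})
        code = extract_code(reply)
        if code is None:               # no code action -> treat as final answer
            return reply
        observation = run_code(code, env)
        history.append({"role": "user", "content": f"Observation:\n{observation}"})
    return reply
```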
- ![Overview](https://github.com/xingyaoww/code-act/blob/main/figures/overview.png?raw=true)
-
- ## Why CodeAct?
-
- Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark [M<sup>3</sup>ToolEval](docs/EVALUATION.md) shows that CodeAct outperforms widely used alternatives like Text and JSON (up to a 20% higher success rate). Please check our paper for a more detailed analysis!
-
- ![Comparison between CodeAct and Text/JSON](https://github.com/xingyaoww/code-act/blob/main/figures/codeact-comparison-table.png?raw=true)
- *Comparison between CodeAct and Text / JSON as the action format.*
-
- ![Comparison between CodeAct and Text/JSON](https://github.com/xingyaoww/code-act/blob/main/figures/codeact-comparison-perf.png?raw=true)
- *Quantitative results comparing CodeAct and {Text, JSON} on M<sup>3</sup>ToolEval.*
-
- ## 📝 CodeActInstruct
-
- We collect CodeActInstruct, an instruction-tuning dataset consisting of 7k multi-turn interactions using CodeAct. The dataset is released as a [Hugging Face dataset 🤗](https://huggingface.co/datasets/xingyaoww/code-act). Please refer to the paper and [this section](#-data-generation-optional) for details of data collection.
-
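For a quick look at the data, the snippet below loads CodeActInstruct with the `datasets` library; the split names and feature layout are assumptions to verify against the dataset card.

```python
# Sketch: load CodeActInstruct from the Hugging Face Hub with `datasets`.
# Split names and feature layout are assumptions -- check the dataset card.
from datasets import load_dataset

ds = load_dataset("xingyaoww/code-act")
print(ds)                      # show available splits and features
first_split = next(iter(ds))   # e.g. "train" (assumed)
print(ds[first_split][0])      # inspect one multi-turn CodeAct interaction
```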
- ![Data Statistics](https://github.com/xingyaoww/code-act/blob/main/figures/data-stats.png?raw=true)
- *Dataset statistics. Token statistics are computed with the Llama-2 tokenizer.*
-
- ## 🪄 CodeActAgent
-
- Trained on **CodeActInstruct** and general conversations, **CodeActAgent** excels at out-of-domain agent tasks compared to open-source models of the same size, while not sacrificing generic performance (e.g., knowledge, dialog). We release two variants of CodeActAgent (a loading sketch follows the list):
- - **CodeActAgent-Mistral-7b-v0.1** (recommended, [model link](https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1)): uses Mistral-7b-v0.1 as the base model, with a 32k context window.
- - **CodeActAgent-Llama-7b** ([model link](https://huggingface.co/xingyaoww/CodeActAgent-Llama-2-7b)): uses Llama-2-7b as the base model, with a 4k context window.
-
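For either variant, a minimal `transformers` loading sketch is shown below; the dtype/device choices and the assumption that the repo ships a chat template are illustrative, so consult the model card for the intended prompt format.

```python
# Sketch: load CodeActAgent-Mistral-7b-v0.1 with transformers and generate.
# dtype/device settings are illustrative; apply_chat_template assumes the
# repo ships a chat template -- otherwise format the prompt per the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xingyaoww/CodeActAgent-Mistral-7b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Compute 2**10 using Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```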
- ![Model Performance](https://github.com/xingyaoww/code-act/blob/main/figures/model-performance.png?raw=true)
- *Evaluation results for CodeActAgent. ID and OD stand for in-domain and out-of-domain evaluation, respectively. Overall averaged performance normalizes the MT-Bench score to be consistent with other tasks and excludes in-domain tasks for a fair comparison.*
-
- Please check out [our paper](https://arxiv.org/abs/2402.01030) and [code](https://github.com/xingyaoww/code-act) for more details about data collection, model training, and evaluation.
-
- ## 📚 Citation
-
- ```bibtex
- @misc{wang2024executable,
-   title={Executable Code Actions Elicit Better LLM Agents},
-   author={Xingyao Wang and Yangyi Chen and Lifan Yuan and Yizhe Zhang and Yunzhu Li and Hao Peng and Heng Ji},
-   year={2024},
-   eprint={2402.01030},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL}
- }
- ```
 
+ This is an 8.0 bits-per-weight (bpw), 8-bit-head (h8) exllamav2 quantization of [xingyaoww/CodeActAgent-Mistral-7b-v0.1](https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1).
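A minimal inference sketch using exllamav2's basic generator API follows; the local `model_dir` path is a placeholder for wherever this quant is downloaded, and the sampler settings are illustrative rather than recommended values.

```python
# Sketch: run this exl2 quant with exllamav2's basic generator API.
# model_dir is a placeholder path; sampler settings are illustrative only.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/CodeActAgent-Mistral-7b-v0.1-8.0bpw-h8-exl2"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # load and split weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

print(generator.generate_simple(
    "Write a Python function that reverses a string.", settings, 128
))
```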