---
license: cc-by-nc-4.0
---
🎉 GitHub: https://github.com/SalesforceAIResearch/xLAM 🎉 Paper: https://arxiv.org/abs/2402.15506

License: cc-by-nc-4.0

If you already know [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), xLAM-v0.1 is a significant upgrade and is better at many tasks. With the same number of parameters, the model has been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model.

xLAM-v0.1-r is version 0.1 of the Large Action Model series; the "-r" suffix indicates that it is tagged for research.

This model is compatible with the vLLM and FastChat platforms.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; device_map="auto" places the weights on the available device(s).
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-v0.1-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-v0.1-r", device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Apply the chat template and move the token IDs to the device the model was loaded on.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Generate up to 512 new tokens and decode the full sequence back to text.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

You may need to tune the temperature setting for different applications. Typically, a lower temperature helps with tasks that require deterministic outcomes. For tasks that demand adherence to specific formats or function calls, it is advisable to include explicit formatting instructions in the prompt.

# Benchmarks

## [BOLAA](https://github.com/salesforce/BOLAA)

### Webshop

| LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct | BOLAA |
|---|---|---|---|---|---|---|
| Llama-2-70B-chat | 0.0089 | 0.0102 | 0.4273 | 0.2809 | 0.3966 | 0.4986 |
| Vicuna-33B | 0.1527 | 0.2122 | 0.1971 | 0.3766 | 0.4032 | 0.5618 |
| Mixtral-8x7B-Instruct-v0.1 | 0.4634 | 0.4592 | 0.5638 | 0.4738 | 0.3339 | 0.5342 |
| GPT-3.5-Turbo | 0.4851 | 0.5058 | 0.5047 | 0.4930 | 0.5436 | 0.6354 |
| GPT-3.5-Turbo-Instruct | 0.3785 | 0.4195 | 0.4377 | 0.3604 | 0.4851 | 0.5811 |
| GPT-4-0613 | 0.5002 | 0.4783 | 0.4616 | 0.7950 | 0.4635 | 0.6129 |
| xLAM-v0.1-r | 0.5201 | 0.5268 | 0.6486 | 0.6573 | 0.6611 | 0.6556 |

### HotpotQA
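
| LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct |
|---|---|---|---|---|---|
| Mixtral-8x7B-Instruct-v0.1 | 0.3912 | 0.3971 | 0.3714 | 0.3195 | 0.3039 |
| GPT-3.5-Turbo | 0.4196 | 0.3937 | 0.3868 | 0.4182 | 0.3960 |
| GPT-4-0613 | 0.5801 | 0.5709 | 0.6129 | 0.5778 | 0.5716 |
| xLAM-v0.1-r | 0.5492 | 0.4776 | 0.5020 | 0.5583 | 0.5030 |
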

## [AgentLite](https://github.com/SalesforceAIResearch/AgentLite/tree/main)

**Please note:** All prompts provided by AgentLite are considered "unseen prompts" for xLAM-v0.1-r, meaning the model has not been trained with data related to these prompts.

#### Webshop

| LLM Name | Act | ReAct | BOLAA |
|---|---|---|---|
| GPT-3.5-Turbo-16k | 0.6158 | 0.6005 | 0.6652 |
| GPT-4-0613 | 0.6989 | 0.6732 | 0.7154 |
| xLAM-v0.1-r | 0.6563 | 0.6640 | 0.6854 |

#### HotpotQA
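
| LLM Name | Easy F1 Score | Easy Accuracy | Medium F1 Score | Medium Accuracy | Hard F1 Score | Hard Accuracy |
|---|---|---|---|---|---|---|
| GPT-3.5-Turbo-16k-0613 | 0.410 | 0.350 | 0.330 | 0.25 | 0.283 | 0.20 |
| GPT-4-0613 | 0.611 | 0.47 | 0.610 | 0.480 | 0.527 | 0.38 |
| xLAM-v0.1-r | 0.532 | 0.45 | 0.547 | 0.46 | 0.455 | 0.36 |
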
## ToolBench

| LLM Name | Unseen Insts & Same Set | Unseen Tools & Seen Cat | Unseen Tools & Unseen Cat |
|---|---|---|---|
| TooLlama V2 | 0.4385 | 0.4300 | 0.4350 |
| GPT-3.5-Turbo-0125 | 0.5000 | 0.5150 | 0.4900 |
| GPT-4-0125-preview | 0.5462 | 0.5450 | 0.5050 |
| xLAM-v0.1-r | 0.5077 | 0.5650 | 0.5200 |

## [MINT-BENCH](https://github.com/xingyaoww/mint-bench)
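
| LLM Name | 1-step | 2-step | 3-step | 4-step | 5-step |
|---|---|---|---|---|---|
| GPT-4-0613 | - | - | - | - | 69.45 |
| Claude-Instant-1 | 12.12 | 32.25 | 39.25 | 44.37 | 45.90 |
| xLAM-v0.1-r | 4.10 | 28.50 | 36.01 | 42.66 | 43.96 |
| Claude-2 | 26.45 | 35.49 | 36.01 | 39.76 | 39.93 |
| Lemur-70b-Chat-v1 | 3.75 | 26.96 | 35.67 | 37.54 | 37.03 |
| GPT-3.5-Turbo-0613 | 2.73 | 16.89 | 24.06 | 31.74 | 36.18 |
| AgentLM-70b | 6.48 | 17.75 | 24.91 | 28.16 | 28.67 |
| CodeLlama-34b | 0.17 | 16.21 | 23.04 | 25.94 | 28.16 |
| Llama-2-70b-chat | 4.27 | 14.33 | 15.70 | 16.55 | 17.92 |
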
## [Tool-Query](https://github.com/hkust-nlp/AgentBoard)

| LLM Name | Success Rate | Progress Rate |
|---|---|---|
| xLAM-v0.1-r | 0.533 | 0.766 |
| DeepSeek-67B | 0.400 | 0.714 |
| GPT-3.5-Turbo-0613 | 0.367 | 0.627 |
| GPT-3.5-Turbo-16k | 0.317 | 0.591 |
| Lemur-70B | 0.283 | 0.720 |
| CodeLlama-13B | 0.250 | 0.525 |
| CodeLlama-34B | 0.133 | 0.600 |
| Mistral-7B | 0.033 | 0.510 |
| Vicuna-13B-16K | 0.033 | 0.343 |
| Llama-2-70B | 0.000 | 0.483 |

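
As a supplement to the temperature guidance in the usage notes, the sketch below illustrates in plain Python why a lower temperature gives more deterministic output. The logits are hypothetical values chosen for illustration, not scores produced by xLAM-v0.1-r, and `softmax_with_temperature` is an illustrative helper, not part of any library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token scores for three candidate tokens (assumed values).
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 1.5)  # flatter distribution

print("T=0.2:", [round(p, 3) for p in low])
print("T=1.5:", [round(p, 3) for p in high])
```

At the lower temperature almost all probability mass falls on the top-scoring token, so sampling picks it nearly every time. With Transformers, sampling temperature is typically set by passing `do_sample=True` and `temperature=...` to `model.generate`; suitable values depend on the application.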