Octopus v2: On-device language model for super agent

Published on Apr 2
· Featured in Daily Papers on Apr 3


Language models have shown effectiveness in a variety of software applications, particularly in tasks related to workflow automation. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass GPT-4 in both accuracy and latency, and to decrease the context length by 95%. Compared with Llama-7B using a RAG-based function-calling mechanism, our method improves latency 35-fold. This reduces latency to levels suitable for deployment across a variety of edge devices in production environments, aligning with the performance requirements of real-world applications.
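A minimal sketch of the functional-token idea the abstract describes: each callable API is mapped to a single reserved token, so the model emits one token (plus arguments) instead of the prompt carrying full retrieved function descriptions. The `<nexa_i>` token names and the function list below are illustrative, not taken from the paper's released assets.

```python
# Illustrative mapping: one reserved token per on-device API function.
# Token and function names here are hypothetical examples.
FUNCTIONAL_TOKENS = {
    "<nexa_0>": "take_a_photo",
    "<nexa_1>": "get_weather_forecast",
    "<nexa_2>": "send_text_message",
}

def decode_call(model_output: str) -> str:
    """Resolve a functional token emitted by the model into a function call."""
    token = model_output.split("(")[0].strip()
    name = FUNCTIONAL_TOKENS.get(token)
    if name is None:
        raise KeyError(f"unknown functional token: {token}")
    args = model_output[model_output.index("("):]
    return name + args

print(decode_call("<nexa_1>(city='Boston', days=3)"))
# -> get_weather_forecast(city='Boston', days=3)
```

Because the function identity collapses into one token, the prompt no longer needs the retrieved descriptions a RAG-based scheme would prepend, which is where the context-length reduction comes from.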


Very interesting 👍

The single symbol function name is a neat little trick, feels so obvious in hindsight.

It would be very interesting to see just how much information we can encapsulate in a single symbol, and then reuse those symbols to increase throughput.

A kind of meta-language that represents a topology of layers of abstraction, on top of layers of abstraction.

Each time a new concept is learned, it gets a symbol, and that symbol is then used to further train the model. This new alphabet would effectively represent a map of the knowledge in the model.

Kind of like database normalisation for embeddings.
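The commenter's "symbol per learned concept" idea can be sketched as a simple interning scheme: a recurring multi-token phrase gets assigned one new symbol, and later sequences that mention the concept compress to that symbol. This is a toy illustration of the proposal, not anything from the paper.

```python
class ConceptVocab:
    """Toy 'normalisation' of token sequences: intern a repeated phrase
    as a single new symbol, then reuse the symbol to shorten sequences."""

    def __init__(self):
        self.symbol_of = {}   # phrase tuple -> symbol
        self.phrase_of = {}   # symbol -> phrase tuple

    def learn(self, phrase):
        phrase = tuple(phrase)
        if phrase not in self.symbol_of:
            sym = f"<concept_{len(self.symbol_of)}>"
            self.symbol_of[phrase] = sym
            self.phrase_of[sym] = phrase
        return self.symbol_of[phrase]

    def compress(self, tokens):
        out, i = [], 0
        while i < len(tokens):
            for phrase, sym in self.symbol_of.items():
                if tuple(tokens[i:i + len(phrase)]) == phrase:
                    out.append(sym)
                    i += len(phrase)
                    break
            else:
                out.append(tokens[i])
                i += 1
        return out

vocab = ConceptVocab()
vocab.learn(["get", "weather", "forecast"])
print(vocab.compress(["call", "get", "weather", "forecast", "now"]))
# -> ['call', '<concept_0>', 'now']
```

In a real model the new symbol would also get an embedding row and appear in further training data, so the "alphabet" of symbols grows alongside the model's knowledge, as the comment suggests.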



I think the idea of using special tokens makes a lot of sense. I think we underappreciate the power and expressiveness of token-space in LLMs.

If you look at techniques like LLaVA, ViTs Need Registers, and prompt tuning, all of these effectively hack the expressiveness of token-space. With long context, the opportunity to use token-space is even larger. If you look at models like BERT, almost 30% of the weights are embedding weights, but unlike most layers, adding just a single new token can be extremely efficient and extremely powerful. I think as a research community there is a lot of exciting stuff on the horizon here.
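The point about adding a single token being cheap can be made concrete: in frameworks like Hugging Face transformers this is `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(...)`, and under the hood it amounts to appending one row to the embedding matrix. A dependency-free sketch of that operation, assuming the common mean-of-existing-embeddings initialisation:

```python
import random

def add_token(embeddings, dim):
    """Append one new embedding row, initialised to the mean of the
    existing rows (a common heuristic for new-token init), and return
    the id of the new token."""
    n = len(embeddings)
    mean_row = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    embeddings.append(mean_row)
    return len(embeddings) - 1

# Toy embedding table: 10 tokens, 4 dims.
random.seed(0)
emb = [[random.gauss(0, 0.02) for _ in range(4)] for _ in range(10)]
new_id = add_token(emb, dim=4)
print(new_id, len(emb))  # -> 10 11
```

Only `dim` new parameters are introduced, which is why one extra token is so cheap relative to touching any other layer of the model.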


Will the dataset for this be released in full as open source?


Models citing this paper: 3

Datasets citing this paper: 0


Spaces citing this paper: 5

Collections including this paper: 18