arxiv:2404.01744

Octopus v2: On-device language model for super agent

Published on Apr 2

· Submitted by

akhaliq on Apr 3

#1 Paper of the day

Authors:

Wei Chen,

Zhiyuan Li

Abstract

Language models have shown effectiveness in a variety of software applications, particularly in tasks related to automatic workflow. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, and to decrease the context length by 95%. When compared to Llama-7B with a RAG-based function calling mechanism, our method improves latency 35-fold. This method reduces the latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requisites for real-world applications.

Community

julien-c

Apr 3

tweeted by Aran Komatsuzaki: https://twitter.com/arankomatsuzaki/status/1775354511252459782

MichaelBarryUK

Apr 3

Very interesting 👍

The single symbol function name is a neat little trick, feels so obvious in hindsight.
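To make the single-symbol idea concrete, here is a minimal sketch of mapping each callable function to one reserved token and decoding a model output back into a function name. The function names, the `<fn_i>` token format, and both helpers are hypothetical illustrations, not the paper's actual vocabulary or code.

```python
# Hypothetical function names and token strings, chosen for illustration only.
FUNCTIONS = ["take_a_photo", "send_text_message", "set_timer_alarm"]


def build_functional_tokens(function_names):
    """Assign each function one reserved token such as '<fn_0>'."""
    return {f"<fn_{i}>": name for i, name in enumerate(function_names)}


def decode_call(model_output, token_to_function):
    """Split an output like '<fn_1>("hello")' into (function name, raw arguments)."""
    for token, name in token_to_function.items():
        if model_output.startswith(token):
            raw_args = model_output[len(token):].strip()
            return name, raw_args
    raise ValueError(f"no functional token found in: {model_output!r}")


token_map = build_functional_tokens(FUNCTIONS)
name, raw_args = decode_call('<fn_1>("hello")', token_map)
print(name, raw_args)
```

Because the model only has to emit one token to select a function, the full function name never needs to be generated token by token, which is the throughput benefit the comment points at.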

It would be very interesting to see just how much information we can encapsulate in a single symbol... And then re-use those symbols to increase throughput

A kind of meta-language that represents a topology of layers of abstraction, on top of layers of abstraction

Each time a new concept is learned, it gets a symbol, and that symbol is then used to further train the model. This new alphabet would effectively represent a map of the knowledge in the model.

Kind of like database normalisation for embeddings.

julien-c

Apr 3

🤯

marcusinthesky

Apr 3

edited Apr 12

I think the idea of using special tokens makes a lot of sense. I think we underappreciate the power and expressiveness of token-space in LLMs.

If you look at techniques like LLaVA, "ViTs need registers", and prompt fine-tuning, all of these effectively hack the expressiveness of token-space. With long context, the opportunity to use token-space is larger. If you look at models like BERT, almost 30% of the model is embedding weight, but unlike with most layers, adding just a single new token can be extremely efficient and extremely powerful. I think as a research community there is a lot of exciting stuff on the horizon here.
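As a rough illustration of why adding a single token is cheap, the sketch below appends one row to a toy embedding table and reports how many parameters were added. The table layout, the `add_token` helper, the `<fn_0>` token, and the 0.02 initialization scale are all assumptions made for this example; a real model would use its framework's own resizing utilities, and may also need a matching row in the output layer if the weights are not tied.

```python
import random


def add_token(embedding_table, vocab, new_token, seed=0):
    """Append one randomly initialized row for new_token.

    embedding_table is a list of rows (one list of floats per token),
    vocab maps token string -> row index. Returns the number of added
    parameters, which equals the hidden size.
    """
    if new_token in vocab:
        raise ValueError(f"token already in vocabulary: {new_token!r}")
    hidden_size = len(embedding_table[0])
    rng = random.Random(seed)
    # Small Gaussian init; the 0.02 scale is an assumed, commonly used value.
    new_row = [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
    embedding_table.append(new_row)
    vocab[new_token] = len(embedding_table) - 1
    return hidden_size


# Toy vocabulary of two tokens with hidden size 4.
vocab = {"hello": 0, "world": 1}
table = [[0.0] * 4, [0.0] * 4]
added = add_token(table, vocab, "<fn_0>")
print(f"added {added} parameters; vocabulary size is now {len(vocab)}")
```

The cost of a new token is one row of the embedding table, independent of how many layers the model has, which is what makes token-space a comparatively inexpensive place to add capacity.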