metadata
license: llama2
datasets:
  - marclove/llama_functions
  - timdettmers/openassistant-guanaco
language:
  - en
library_name: transformers
pipeline_tag: conversational

Model Card for Llama-2 7B Chat Functions

‼️ This model is still in a beta state. It will be retrained and updated at a future date, at which point its prompting format may change. If you need to depend on it in its current state, please create your own fork and provide attribution to this original repository. ‼️

Llama Functions is a further fine-tuned version of Llama-2-7b-chat-hf, using a 50/50 mix of:

  1. Synthetic OpenAPI function calls with their corresponding natural language invocation, and
  2. Chat completions from the Guanaco subset of the OASST1 dataset.

13B & 70B versions are coming soon.

The function calling dataset is mixed with Guanaco in order to maintain accuracy and helpfulness when calling a function is not the appropriate response. Guidelines for use, more detailed information regarding limitations, and eval stats for the 7B, 13B, and 70B models are coming soon.

There is no existing evaluation benchmark to measure the accuracy of function calls, which makes it hard during training to identify when the best balance between function calling accuracy and chat model performance has been reached. I'm working on a custom HF eval for this purpose, but until then I have chosen to mix the two datasets in equal parts so that the eval & test stats during fine-tuning serve as a proxy for performance on both tasks. The current checkpoint is at 1000 steps, when eval & test loss reached their lowest point.

Model Sources [optional]

Uses

Please note: The synthetic data portion of the dataset was generated using OpenAI models. This model is released under the Llama 2 Community License Agreement. Because I fine-tuned the model on data that I generated with OpenAI models, it is released for research purposes only. I have licensed the associated llama_functions dataset under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. Whether you may use that data to train your own models is your responsibility to determine.

Bias, Risks, and Limitations

No additional bias is known beyond that of the underlying Llama-2-7b-chat-hf model and any introduced by the Guanaco subset of the OASST1 dataset.

This model can hallucinate function calls that do not exist in the system prompt. While I hope to improve this by iterating on the llama_functions dataset, the 7B model will likely continue to struggle with this. I'm hoping to see more accuracy and less hallucination in larger models and plan to experiment with inference strategies, such as grammar-based sampling and classifier-based routing, to improve performance in smaller models.

At the very minimum, I encourage you to validate outputs before attempting to use responses to call any functions. For example, several people have found Pydantic to be a convenient way to both describe functions and validate calls prior to execution.
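As a rough illustration of that kind of validation, here is a minimal sketch that checks a response before anything is executed. It uses only the Python standard library rather than Pydantic, so it runs without extra dependencies; Pydantic models would express the same checks more compactly. Everything specific in it is an assumption made for the example: this card does not document the model's output format, so the JSON shape with "function" and "arguments" keys, the ALLOWED_FUNCTIONS registry, and the get_weather function are all hypothetical.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical registry of the functions described in the system prompt,
# mapping each function name to its required argument names and types.
ALLOWED_FUNCTIONS: Dict[str, Dict[str, type]] = {
    "get_weather": {"city": str, "days": int},
}


def validate_call(raw_response: str) -> Tuple[str, Dict[str, Any]]:
    """Return (name, arguments) for a valid call; raise ValueError otherwise."""
    # Assumed output shape: {"function": "<name>", "arguments": {...}}
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")

    # Reject hallucinated functions that were never offered to the model.
    name = payload.get("function")
    if not isinstance(name, str) or name not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unknown function: {name!r}")

    arguments = payload.get("arguments")
    if not isinstance(arguments, dict):
        raise ValueError("'arguments' must be a JSON object")

    # Require exactly the declared argument names, each with the declared type.
    schema = ALLOWED_FUNCTIONS[name]
    if set(arguments) != set(schema):
        raise ValueError(
            f"expected arguments {sorted(schema)}, got {sorted(arguments)}"
        )
    for key, expected_type in schema.items():
        # type(...) is compared directly so that true/false is not accepted as int.
        if type(arguments[key]) is not expected_type:
            raise ValueError(f"argument {key!r} must be {expected_type.__name__}")

    return name, arguments


if __name__ == "__main__":
    good = '{"function": "get_weather", "arguments": {"city": "Paris", "days": 3}}'
    print(validate_call(good))

    hallucinated = '{"function": "delete_everything", "arguments": {}}'
    try:
        validate_call(hallucinated)
    except ValueError as err:
        print(f"rejected: {err}")
```

Only a call that passes validate_call would be dispatched to real code. A rejected response can be logged, retried, or answered as ordinary chat instead.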

Training Details

Training Data

See the llama_functions dataset for more information.

Training Procedure

Coming soon

Training Hyperparameters

Coming soon

Sizes

13B & 70B chat and non-chat versions coming soon

Evaluation

Coming soon

Citation

@misc{LlamaFunctions,
  title = {LlamaFunctions: An Open Dataset of Structured API Calls From Natural Language Prompts},
  author = {Marc Love},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/marclove/llama_functions}},
}