
CALM-8B: Conversational Agentic Language Model

Made with Oumi

Model Description

CALM-8B is the smallest open-source model in the CALM (Conversational Agentic Language Model) series, designed to integrate both Task-Oriented Dialogue (TOD) capabilities and Language Agent (LA) functionalities into a unified system. By fine-tuning on CALM-IT, a novel dataset that interleaves multi-turn ReAct-based reasoning with complex API usage, CALM-8B achieves promising results on TOD and function-calling benchmarks.

CALM-8B is trained on a multi-task dataset covering dialogue state tracking, function calling, and multi-turn reasoning. The model outperforms top domain-specific models on key evaluation benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA).

Model Sources

  • Paper: https://arxiv.org/abs/2502.08820

Model Details

  • Model Name: CALM-8B
  • Developed by: Collaboration of the UIUC Conversational AI Lab and Oumi
  • License: cc-by-nc-4.0
  • Architecture: Fine-tuned Llama 3.1 8B Instruct
  • Training Data: CALM-IT dataset
  • Fine-tuning Framework: Oumi
  • Training Hardware: 8 NVIDIA H100 GPUs
  • Training Duration: ~8 hours
  • Evaluation Benchmarks: MultiWOZ 2.4, BFCL V3, API-Bank
  • Release Date: February 5, 2025

Capabilities and Features

🗣 Conversational Agentic Abilities

  • Multi-turn Dialogue Mastery: Maintains coherent conversations across multiple turns with accurate state tracking.
  • Function Calling and API Integration: Dynamically selects and calls APIs for task execution.
  • ReAct-based Reasoning: Utilizes a structured reasoning process (User-Thought-Action-Observation-Thought-Response); a sketch of this pattern follows this list.
  • Zero-Shot Generalization: Excels in previously unseen function-calling tasks.
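
To make the reasoning pattern concrete, here is a hedged sketch of one such trajectory; the role labels and the find_restaurants API below are illustrative assumptions, not the actual CALM-IT schema:

# Hypothetical ReAct-style trajectory in the
# User -> Thought -> Action -> Observation -> Thought -> Response pattern.
# Role names and the find_restaurants API are illustrative only.
trajectory = [
    {"role": "user", "content": "Find a cheap Italian restaurant in the centre."},
    {"role": "thought", "content": "I should query the restaurant API with the user's constraints."},
    {"role": "action", "content": 'find_restaurants(cuisine="italian", price="cheap", area="centre")'},
    {"role": "observation", "content": '[{"name": "Zizzi Cambridge"}]'},
    {"role": "thought", "content": "One match was returned; present it and offer to book."},
    {"role": "response", "content": "I found Zizzi Cambridge. Would you like me to book a table?"},
]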

🚀 Benchmark Performance

  • MultiWOZ 2.4 (TOD): Excels in dialogue state tracking and task completion.
  • BFCL V3 (LA): Demonstrates superior function-calling abilities compared to specialized language agents.
  • API-Bank (LA): Accurately generates API calls and integrates responses into conversation flow.

Training Process

🔧 Fine-tuning Stages

  1. TOD Fine-tuning: Optimized for dialogue state tracking (e.g., augmented SNIPS data reformatted into Alpaca-style instruction-tuning records; an illustrative record is sketched after this list).
  2. Function Calling Fine-tuning: Trained to select and generate well-formed API calls from LA datasets.
  3. ReAct-based Fine-tuning: Addresses multi-turn conversations with API integration using a structured reasoning framework.
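
As a hedged illustration of stage 1, an Alpaca-style record for dialogue state tracking might look like the following; the field names follow the common Alpaca convention, and the MultiWOZ-like slot schema is an assumption rather than the actual CALM-IT format:

# Hypothetical Alpaca-style instruction-tuning record for dialogue state tracking.
# The instruction/input/output fields follow the usual Alpaca convention;
# the slot names are illustrative, not the actual CALM-IT schema.
dst_record = {
    "instruction": "Track the dialogue state and return the updated slot values as JSON.",
    "input": "User: I need a train to Cambridge on Friday, leaving after 9am.",
    "output": '{"train-destination": "cambridge", "train-day": "friday", "train-leaveat": "09:00"}',
}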

🔍 Training Hyperparameters

  • Base Model: Llama 3.1 8B Instruct
  • LoRA Config: Rank = 16, Scaling Factor (alpha) = 32 (see the configuration sketch after this list)
  • Batch Size: 8
  • Learning Rate: 1e-4
  • Optimizer: AdamW (betas = 0.9, 0.999, epsilon = 1e-8)
  • Precision: Mixed precision (bfloat16)
  • Warm-up Steps: 0.1 ratio of total steps
  • Gradient Accumulation Steps: 1
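
These settings map directly onto a standard LoRA setup. Below is a minimal sketch using the Hugging Face peft and transformers libraries; the actual training ran through Oumi, so treat this as an equivalent illustration rather than the original training script (the target_modules choice in particular is an assumption the card does not specify):

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model from the card; access to the Llama 3.1 weights is gated on the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

# LoRA rank 16 with scaling factor (alpha) 32, as listed above.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)

# Optimizer and schedule settings from the list above.
args = TrainingArguments(
    output_dir="calm-8b-sft",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,
)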

💡 CALM-IT Dataset

[Figure: CALM-IT dataset statistics]

📊 Benchmark Performance

[Figure: benchmark results on MultiWOZ 2.4, BFCL V3, and API-Bank]

Usage

πŸ— How to Load the Model using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the CALM-8B tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-8B")
model = AutoModelForCausalLM.from_pretrained("uiuc-convai/CALM-8B")
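
A fuller generation sketch, assuming the model inherits Llama 3.1 Instruct's chat template; the example prompt and generation settings are illustrative, and the bfloat16 dtype matches the published weights:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-8B")
model = AutoModelForCausalLM.from_pretrained(
    "uiuc-convai/CALM-8B",
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

# Format the conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Book me a table for two in Cambridge tonight."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))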

🛠 Example Oumi Inference

pip install oumi

# See oumi_infer.yaml in this model's /oumi/ directory.
oumi infer -i -c ./oumi_infer.yaml

🛠 Example Oumi Fine-Tuning

pip install oumi

# See oumi_train.yaml in this model's /oumi/ directory.
oumi train -c ./oumi_train.yaml

Limitations and Future Work

  • Task-Specific Calibration: While CALM-8B generalizes well across tasks, performance can improve with domain-specific fine-tuning.
  • Scalability to Larger Models: Future iterations (CALM-70B, CALM-405B) extend capabilities to larger-scale agentic conversations.
  • Open-Source Expansion: All datasets, training scripts, and model checkpoints are publicly available to foster further research.

Acknowledgements

We'd like to thank the Oumi AI Team for collaborating on training the models, as well as Together AI for providing the compute resources necessary to train CALM-405B.

License

This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).


Citation

If you use CALM-8B in your research, please cite:

@misc{acikgoz2025singlemodelmastermultiturn,
      title={Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model}, 
      author={Emre Can Acikgoz and Jeremiah Greer and Akul Datta and Ze Yang and William Zeng and Oussama Elachqar and Emmanouil Koukoumidis and Dilek Hakkani-Tür and Gokhan Tur},
      year={2025},
      eprint={2502.08820},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.08820}, 
}

For more details, visit the Project Repository or contact acikgoz2@illinois.edu.
