ITFormer
This repository provides the official open-source implementation of ITFormer (Instruct Time Transformer), a novel framework for temporal-textual multimodal question answering (QA).
Overview
ITFormer is a model for temporal-textual multimodal question answering that reports state-of-the-art results on temporal-textual QA benchmarks. The repository includes both inference and training scripts.
Our work also introduces EngineMT-QA, a large-scale multitask dataset, and shows that ITFormer performs strongly at bridging time series data with natural language understanding. The 0.5B model is lightweight and efficient while still achieving strong performance.
Features
- Pre-trained Models: Ready-to-use ITFormer models (0.5B, 3B, 7B) available on Hugging Face.
- Lightweight & Efficient: The 0.5B model offers strong temporal QA capabilities and easy deployment.
- One-Click Scripts: Automated scripts for pre-training, SFT, and parallel inference.
- High Performance: State-of-the-art results on temporal-textual QA benchmarks.
- Distributed Support: Fully compatible with accelerate for multi-GPU training and inference.
Quick Start
1. Organize Directory Structure
After downloading models and datasets, organize your files as follows:
ITFormer-ICML25/
├── dataset/
│   └── datasets/              # Place EngineMT-QA dataset files here
│       ├── time_series_data.h5
│       ├── train_qa.jsonl
│       └── test_qa.jsonl
├── LLM/                       # Base Qwen2.5-Instruct models
├── checkpoints/
│   └── ITFormer-0.5B/         # ITFormer model checkpoints
├── scripts/                   # One-click automation scripts
│   ├── run_pretrain.sh
│   ├── run_sft.sh
│   └── run_inference.sh
├── accelerate_config.yaml     # Configuration for distributed execution
└── yaml/
    └── infer.yaml             # Inference configuration
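Before launching anything, it can help to confirm that the layout is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and it assumes the three dataset files sit inside dataset/datasets/ and that the script is run from the repository root.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repository root.
# Assumption: the dataset files live inside dataset/datasets/.
EXPECTED_PATHS = [
    "dataset/datasets/time_series_data.h5",
    "dataset/datasets/train_qa.jsonl",
    "dataset/datasets/test_qa.jsonl",
    "LLM",
    "checkpoints/ITFormer-0.5B",
    "scripts/run_pretrain.sh",
    "scripts/run_sft.sh",
    "scripts/run_inference.sh",
    "accelerate_config.yaml",
    "yaml/infer.yaml",
]


def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:")
        for p in missing:
            print("  " + p)
    else:
        print("Directory layout looks complete.")
```

If a different model size is used, the checkpoint entry in `EXPECTED_PATHS` would need to be adjusted accordingly.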
2. Run Inference
Parallel inference is supported through accelerate, and results from multiple GPUs are aggregated automatically.
# Using the automated script (Recommended)
bash scripts/run_inference.sh
# Or launch manually via accelerate
accelerate launch --config_file accelerate_config.yaml inference.py --config yaml/infer.yaml
The inference script will:
- Load ITFormer and the corresponding Qwen2.5-Instruct base model.
- Distribute data across all available GPUs.
- Aggregate and save results to inference_results/ and output_result_all.json.
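The distribute-then-aggregate pattern described above can be illustrated without any GPUs. The sketch below is a pure-Python simulation and does not reproduce the repository's inference.py, which relies on accelerate for this step. The function names, the strided sharding scheme, and the `id` field are assumptions made for illustration.

```python
import json


def shard_for_rank(samples, rank, world_size):
    """Give each process every `world_size`-th sample, starting at `rank`."""
    return samples[rank::world_size]


def merge_shards(shards):
    """Combine per-process result lists and restore the original order by id."""
    merged = [record for shard in shards for record in shard]
    return sorted(merged, key=lambda record: record["id"])


if __name__ == "__main__":
    # Hypothetical QA samples; the real fields come from the dataset files.
    samples = [{"id": i, "question": f"q{i}"} for i in range(7)]
    world_size = 3

    # Each simulated process answers only its own shard.
    shards = []
    for rank in range(world_size):
        local = shard_for_rank(samples, rank, world_size)
        shards.append(
            [{"id": s["id"], "answer": f"answer to {s['question']}"} for s in local]
        )

    # The merged list is what would be written to a single results file.
    results = merge_shards(shards)
    print(json.dumps(results[:2], indent=2))
```

Sorting by a stable sample identifier keeps the merged output independent of how many processes were used.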
Training
We provide a streamlined training pipeline using accelerate. Ensure your accelerate_config.yaml is properly configured for your hardware.
A. Pre-training (Time-Series Encoder)
Stage A focuses on pre-training the TimeSeriesEncoder using masked modeling.
# One-click pre-training
bash scripts/run_pretrain.sh
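For readers unfamiliar with masked modeling, the idea is to hide part of the input series and train the encoder to reconstruct it, with the loss computed only on the hidden part. The sketch below illustrates that objective on toy data in plain Python. It is not the repository's pre-training code: the patch layout, mask ratio, zero fill value, and function names are all assumptions.

```python
import random


def make_mask(num_patches, mask_ratio, rng):
    """Pick which patches to hide; True marks a masked patch."""
    num_masked = max(1, int(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def apply_mask(patches, mask, fill=0.0):
    """Replace masked patches with a constant placeholder of the same length."""
    return [[fill] * len(p) if m else list(p) for p, m in zip(patches, mask)]


def masked_mse(prediction, target, mask):
    """Mean squared error computed only over the masked patches."""
    errors = [
        (x - y) ** 2
        for pred, tgt, m in zip(prediction, target, mask)
        if m
        for x, y in zip(pred, tgt)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy series split into 4 patches of 3 values each.
    patches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
    mask = make_mask(len(patches), mask_ratio=0.5, rng=rng)
    corrupted = apply_mask(patches, mask)
    # A real encoder would reconstruct the hidden patches; here the corrupted
    # input stands in for the prediction so the loss is easy to check by hand.
    print(mask, masked_mse(corrupted, patches, mask))
```

Restricting the loss to masked positions means the encoder gets no credit for simply copying the visible values.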
B. Supervised Fine-Tuning (SFT)
Stage B performs end-to-end SFT, bridging the time-series encoder with the LLM via ITFormer.
# One-click SFT (Requires pre-trained ts_encoder weights)
bash scripts/run_sft.sh
Key Parameters in SFT:
- --it_d_model, --it_n_heads, --it_layers: Configuration for the ITFormer module.
- --load_ts_encoder: Path to the weights generated in Stage A.
- --llm_model_path: Path to the base Qwen2.5-Instruct model.
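As an illustration of how these flags might be consumed, here is a small argparse sketch. Only the flag names come from the list above. The types, default values, required markers, and the divisibility check are assumptions, and the repository's actual training script may define them differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative SFT argument parser")
    # Flag names come from the list above; types and defaults are assumptions.
    parser.add_argument("--it_d_model", type=int, default=512)
    parser.add_argument("--it_n_heads", type=int, default=8)
    parser.add_argument("--it_layers", type=int, default=2)
    parser.add_argument("--load_ts_encoder", type=str, required=True)
    parser.add_argument("--llm_model_path", type=str, required=True)
    return parser


def parse_sft_args(argv):
    args = build_parser().parse_args(argv)
    # Multi-head attention usually splits the model width evenly across heads,
    # so this sketch rejects widths that do not divide cleanly (an assumption).
    if args.it_d_model % args.it_n_heads != 0:
        raise ValueError("--it_d_model must be divisible by --it_n_heads")
    return args


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    args = parse_sft_args([
        "--load_ts_encoder", "path/to/ts_encoder_weights",
        "--llm_model_path", "path/to/base_llm",
    ])
    print(vars(args))
```

Validating the head count up front surfaces a misconfiguration before any model weights are loaded.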
Model Architecture
ITFormer leverages an Instruction-aware Time Series Transformer to align temporal features with textual queries before feeding them into a Large Language Model. The framework is designed to be parameter-efficient, freezing the LLM and TS Encoder during SFT while training only the ITFormer and projection layers.
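One common way to align one modality with another is cross-attention, where vectors from the text side act as queries over features from the time-series side. The sketch below shows generic single-head scaled dot-product cross-attention in plain Python. It is an assumption about the general mechanism, not a reproduction of the ITFormer module, whose exact design is defined in the repository code and the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: text-side vectors; keys and values: time-series feature vectors.
    Returns one output vector per query, each a weighted mix of `values`.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity between this query and every time-series feature.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    # Hypothetical toy vectors: one text query attending over two time steps.
    text_queries = [[1.0, 0.0]]
    ts_keys = [[1.0, 0.0], [0.0, 1.0]]
    ts_values = [[2.0, 0.0], [0.0, 2.0]]
    print(cross_attention(text_queries, ts_keys, ts_values))
```

In a parameter-efficient setup like the one described above, only a module of this kind and its projection layers would receive gradient updates, while the encoder producing the keys and values and the LLM consuming the output stay frozen.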
Citation
If you use this code in your research, please cite:
@inproceedings{wang2025itformer,
title={ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset},
author={Yilin Wang and Peixuan Lei and Jie Song and Yuzhe Hao and Tao Chen and Yuxuan Zhang and Lei Jia and Yuanxiang Li and Zhongyu Wei},
booktitle={International Conference on Machine Learning (ICML)},
year={2025}
}
License
MIT License. See the LICENSE file for details.