TimeOmni-VL: Unified Models for Time Series Understanding and Generation

TimeOmni-VL Paper on arXiv | TimeOmni-VL Model on Hugging Face | TSUMM-Suite Dataset on Hugging Face | TimeOmni-VL Demo on Hugging Face Spaces | TimeOmni-VL Code on GitHub

We present TimeOmni-VL, a unified multimodal model for time series understanding and generation. It first builds a fidelity-preserving bidirectional Time Series ⇔ Image interface for near-lossless conversion between numerical sequences and TS-images. Then, temporal reasoning from TS-image understanding is used as an explicit condition to guide generation. Experiments show that TimeOmni-VL achieves top-tier forecasting performance and state-of-the-art imputation results. Finally, these findings support an “understanding-guided generation” paradigm for future multimodal time series models.

🎨 Task Illustration

TimeOmni-VL is designed for two complementary task families:

  • Time series understanding: answer questions about TS-images, including variable counting, variable localization, cycle localization, mean comparison, anomaly detection, and trend analysis.
  • Time series generation: generate missing or future values for multivariate time series, including zero-shot forecasting and zero-shot imputation.

🧠 Method

TimeOmni-VL follows a unified understand-then-generate pipeline. The input time series is first transformed into a TS-image by the TS2I converter. For understanding tasks, the model directly produces a reasoning chain and the final textual answer from the TS-image. For generation tasks, the model first generates temporal reasoning as a condition, then uses it to guide the generation module to complete the target TS-image. The generated TS-image is finally decoded back into numerical time series by the I2TS converter.
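
To make the round trip described above concrete, here is a minimal sketch of a toy series-to-image encoding and its inverse. It is not the authors' TS2I/I2TS converter, whose design is not described in this card: the function names `ts_to_image` and `image_to_ts`, the one-row-per-variable layout, and the 8-bit min-max quantization are all assumptions chosen for illustration. The sketch only shows why such a conversion can be near-lossless: the decoding error is bounded by half a quantization step.

```python
from typing import List, Tuple

Series = List[List[float]]          # one inner list per variable
Scales = List[Tuple[float, float]]  # (min, max) per variable, needed for decoding


def ts_to_image(series: Series) -> Tuple[List[List[int]], Scales]:
    """Toy stand-in for TS2I: one pixel row per variable, 8-bit min-max scaling."""
    image: List[List[int]] = []
    scales: Scales = []
    for values in series:
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # constant rows would otherwise divide by zero
        image.append([round((v - lo) / span * 255) for v in values])
        scales.append((lo, hi))
    return image, scales


def image_to_ts(image: List[List[int]], scales: Scales) -> Series:
    """Toy stand-in for I2TS: invert the scaling with the stored (min, max)."""
    decoded: Series = []
    for row, (lo, hi) in zip(image, scales):
        span = (hi - lo) or 1.0
        decoded.append([lo + pixel / 255 * span for pixel in row])
    return decoded


# Hypothetical two-variable series: one varying row, one constant row.
series = [[0.0, 1.5, 3.0, 2.0], [10.0, 10.0, 10.0, 10.0]]
image, scales = ts_to_image(series)
restored = image_to_ts(image, scales)

# Largest absolute round-trip error over all points.
max_error = max(
    abs(a - b)
    for row_a, row_b in zip(series, restored)
    for a, b in zip(row_a, row_b)
)
```

In this toy scheme the per-variable `(min, max)` pairs must travel with the image, since the pixel values alone do not determine the original scale. How the actual converters handle scale information is documented in the GitHub repository, not here.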

📊 Benchmarks

* Note: For forecasting and imputation, lower nMASE is better. “–” indicates that the success rate is below 10%, so the result is not reported due to insufficient statistical reliability.

Table 1. Forecasting Performance on GIFT-Eval Subset

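| Category | Method | Short-term nMASE↓ | Medium-term nMASE↓ | Long-term nMASE↓ |
|---|---|---|---|---|
| LLMs | Gemini-2.5-Flash | 1.295 | 1.201 | 1.279 |
| LLMs | Qwen2.5-Instruct-7B | 1.445 | – | – |
| Time Series-based Models | ChatTime | 0.983 | 1.439 | 4.164 |
| Time Series-based Models | Time-R1 | 1.162 | – | – |
| Time Series-based Models | TimeOmni-1 | 1.298 | – | – |
| Image-based Models | VisionTS++ | 0.915 | 0.682 | 0.690 |
| Image-based Models | VisionTS | 1.263 | 0.763 | 0.794 |
| Image-based Models | Bagel | 16.303 | 17.840 | 16.530 |
| Image-based Models | TimeOmni-VL | 0.878 | 0.816 | 0.784 |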
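
This card does not define nMASE. Assuming it is a normalized variant of the mean absolute scaled error (MASE), the underlying quantity can be sketched as below. The function name `mase` and the `season` parameter are illustrative, and the normalization step that turns MASE into nMASE is deliberately left out because it is not specified here.

```python
from typing import Sequence


def mase(history: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], season: int = 1) -> float:
    """Mean absolute scaled error: forecast MAE divided by the in-sample MAE
    of a seasonal-naive forecast (y[t] predicted by y[t - season])."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(history) <= season:
        raise ValueError("history must be longer than the seasonal period")

    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_mae = sum(
        abs(history[t] - history[t - season]) for t in range(season, len(history))
    ) / (len(history) - season)
    if naive_mae == 0:
        raise ValueError("history is constant at this lag; MASE is undefined")
    return forecast_mae / naive_mae


# Hypothetical example: the naive in-sample error is 1.0, the forecast MAE is 0.5.
score = mase(history=[1.0, 2.0, 3.0, 4.0, 5.0],
             actual=[6.0, 7.0],
             forecast=[6.5, 6.5])
```

A value below 1 means the forecast beats the in-sample naive baseline on average, which is consistent with the "lower is better" reading of the tables.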

Table 2. Imputation Performance under Different Masking Ratios

| Category | Method | [0.1, 0.2) nMASE↓ | [0.2, 0.3) nMASE↓ | [0.3, 0.4) nMASE↓ | [0.4, 0.5] nMASE↓ |
|---|---|---|---|---|---|
| LLMs | Gemini-2.5-Flash | 0.920 | 2.028 | 2.434 | 1.160 |
| LLMs | Qwen2.5-Instruct-7B | 4.878 | 1.854 | – | – |
| Statistics Baselines | Nearest | 0.975 | 0.958 | 1.003 | 0.929 |
| Statistics Baselines | Linear | 0.943 | 0.905 | 0.965 | 0.968 |
| Time Series-based Models | Moment-large | 1.220 | 1.400 | 1.630 | 2.100 |
| Time Series-based Models | Moment-base | 1.510 | 1.600 | 1.700 | 2.130 |
| Image-based Models | Bagel | 17.411 | 12.239 | 11.849 | 11.032 |
| Image-based Models | TimeOmni-VL | 0.713 | 0.757 | 0.842 | 0.927 |
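
For readers unfamiliar with the two statistics baselines in Table 2, the following is a minimal sketch of nearest-neighbor and linear-interpolation imputation on a single variable. The function names, the use of `None` to mark masked points, and the tie-breaking and edge-handling rules are assumptions for illustration; the card does not say how the reported baselines were implemented.

```python
from typing import List, Optional

Masked = List[Optional[float]]  # None marks a missing (masked) value


def _known_indices(values: Masked) -> List[int]:
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    return known


def impute_nearest(values: Masked) -> List[float]:
    """Fill each gap with the closest observed value (ties go to the left)."""
    known = _known_indices(values)
    return [
        v if v is not None else values[min(known, key=lambda k: abs(k - i))]
        for i, v in enumerate(values)
    ]


def impute_linear(values: Masked) -> List[float]:
    """Fill each gap by interpolating between its observed neighbors.
    Gaps at either end copy the nearest observed value."""
    known = _known_indices(values)
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(values[right])
        elif right is None:
            filled.append(values[left])
        else:
            weight = (i - left) / (right - left)
            filled.append(values[left] + weight * (values[right] - values[left]))
    return filled


# Hypothetical series with three masked points out of five.
masked = [1.0, None, None, 4.0, None]
nearest = impute_nearest(masked)
linear = impute_linear(masked)
```

Both baselines use only the observed points of the same variable, which makes them a useful floor when reading the model rows in the table.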

Table 3. TS-image Understanding Performance

* Note: Scores are normalized to [0, 1], where higher is better. QA1–QA3 are layout-level tasks, and QA4–QA6 are signal-level tasks. Bold marks the best value in each column.

| Category | Method | QA1↑ (Layout) | QA2↑ (Layout) | QA3↑ (Layout) | QA4↑ (Signal) | QA5↑ (Signal) | QA6↑ (Signal) |
|---|---|---|---|---|---|---|---|
| Proprietary VLMs | Gemini-2.5-Flash | 0.540 | 0.640 | 0.004 | 0.535 | 0.000 | 0.342 |
| Proprietary VLMs | Gemini-2.0-Flash | 0.230 | 0.290 | 0.261 | 0.279 | 0.000 | 0.220 |
| Base Model | Bagel | 0.000 | 0.502 | 0.012 | 0.182 | 0.000 | 0.254 |
| Ours | TimeOmni-VL | **1.000** | **1.000** | **0.931** | **1.000** | **0.667** | **0.841** |

Table 4. Time Series Reasoning Performance

* Note: Task 1, Task 2, and Task 4 use accuracy, where higher is better. Task 3 uses MAE, where lower is better. Bold marks the best result in each column.

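| Category | Method | Task1↑ (Perception) | Task2↑ (Perception) | Task3↓ (Extrapolation) | Task4↑ (Decision Making) |
|---|---|---|---|---|---|
| LLMs | Gemini-2.5-Flash | 77.5 | 25.9 | 170.78 | 36.6 |
| LLMs | Qwen2.5-Instruct-7B | 42.8 | 26.3 | 146.12 | 24.9 |
| TSLMs | Time-MQA-8B | 25.1 | 31.2 | - | 11.6 |
| TSLMs | ChatTS | 39.2 | 18.6 | - | 11.1 |
| TSLMs | ITFormer | 47.5 | 14.6 | 230.04 | 41.7 |
| TSLMs | Time-R1 | 34.0 | 31.4 | 160.47 | 32.2 |
| TSLMs | TimeOmni-1 | **87.7** | **64.0** | **145.53** | 58.9 |
| Ours | TimeOmni-VL | 84.0 | 61.3 | 163.79 | **61.4** |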

🚀 Usage

This repository hosts the model weights for TimeOmni-VL. For installation, inference scripts, TS2I/I2TS conversion utilities, and complete examples, please visit our GitHub repository.

License

TimeOmni-VL is released under the Apache 2.0 license.

✍️ Citation

@article{guan2026timeomni,
  title={TimeOmni-VL: Unified Models for Time Series Understanding and Generation},
  author={Guan, Tong and Pan, Sheng and Barthelemy, Johan and Li, Zhao and Cai, Yujun and Alippi, Cesare and Jin, Ming and Pan, Shirui},
  journal={arXiv preprint arXiv:2602.17149},
  year={2026}
}