How to use TimeOmni-VL/TimeOmni-VL with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-to-image", model="TimeOmni-VL/TimeOmni-VL", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TimeOmni-VL/TimeOmni-VL", trust_remote_code=True, dtype="auto")
```
TimeOmni-VL: Unified Models for Time Series Understanding and Generation
We present TimeOmni-VL, a unified multimodal model for time series understanding and generation. It first builds a fidelity-preserving bidirectional Time Series ↔ Image interface for near-lossless conversion between numerical sequences and TS-images. Temporal reasoning derived from TS-image understanding is then used as an explicit condition to guide generation. Experiments show that TimeOmni-VL achieves top-tier forecasting performance and state-of-the-art imputation results. These findings support an "understanding-guided generation" paradigm for future multimodal time series models.
Task Illustration
TimeOmni-VL is designed for two complementary task families:
- Time series understanding: answer questions about TS-images, including variable counting, variable localization, cycle localization, mean comparison, anomaly detection, and trend analysis.
- Time series generation: generate missing or future values for multivariate time series, including zero-shot forecasting and zero-shot imputation.
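To make the two generation settings concrete, here is a minimal NumPy sketch of how a forecasting example and an imputation example could be built from one multivariate series. The helper names `make_forecast_example` and `make_imputation_example` are hypothetical, and masking whole time steps uniformly at random is an assumption: this card does not say how the benchmark masks values.

```python
import numpy as np


def make_forecast_example(series, horizon):
    """Split a (time, variables) array into observed history and held-out future."""
    series = np.asarray(series, dtype=float)
    if not 0 < horizon < series.shape[0]:
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


def make_imputation_example(series, mask_ratio, seed=0):
    """Hide a random fraction of time steps (assumed masking scheme).

    Returns the masked copy and a boolean vector marking the hidden steps.
    """
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    n_steps = series.shape[0]
    n_masked = int(round(mask_ratio * n_steps))
    hidden = np.zeros(n_steps, dtype=bool)
    hidden[rng.choice(n_steps, size=n_masked, replace=False)] = True
    masked = series.copy()
    masked[hidden] = np.nan  # NaN marks the values a model would have to fill in
    return masked, hidden


# Two-variable toy series with 100 time steps.
t = np.arange(100) / 10
toy = np.column_stack([np.sin(t), np.cos(t)])

history, future = make_forecast_example(toy, horizon=20)
masked, hidden = make_imputation_example(toy, mask_ratio=0.15)
```

In both cases the model sees only the observed part (`history` or the non-NaN entries of `masked`) and is scored on the part that was held out.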
Method
TimeOmni-VL follows a unified understand-then-generate pipeline. The input time series is first transformed into a TS-image by the TS2I converter. For understanding tasks, the model directly produces a reasoning chain and the final textual answer from the TS-image. For generation tasks, the model first generates temporal reasoning as a condition, then uses it to guide the generation module to complete the target TS-image. The generated TS-image is finally decoded back into numerical time series by the I2TS converter.
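The idea of a reversible series-to-image mapping can be illustrated with a toy round trip. The sketch below is not the TS2I/I2TS converter from the project, whose layout and encoding are not described in this card; it assumes a simple per-variable min-max scaling to 8-bit grayscale, with hypothetical function names `ts_to_image` and `image_to_ts`.

```python
import numpy as np


def ts_to_image(series):
    """Map a (time, variables) array to a uint8 image with one pixel row per variable."""
    series = np.asarray(series, dtype=float)
    lo = series.min(axis=0)
    span = series.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # constant variables: avoid division by zero
    pixels = np.rint((series - lo) / span * 255).astype(np.uint8)
    return pixels.T, lo, span  # lo and span are needed to invert the mapping


def image_to_ts(image, lo, span):
    """Invert ts_to_image, exact up to 8-bit quantization error."""
    return image.T.astype(float) / 255 * span + lo


rng = np.random.default_rng(0)
series = rng.normal(size=(100, 2)).cumsum(axis=0)  # toy random-walk series

image, lo, span = ts_to_image(series)
recovered = image_to_ts(image, lo, span)
max_error = np.abs(recovered - series).max()
```

With this toy encoding the reconstruction error per variable is bounded by half a quantization step, `span / 510`, which is the sense in which such a conversion can be "near-lossless" rather than exact.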
Benchmarks
* Note: For forecasting and imputation, lower nMASE is better. "–" indicates that the success rate is below 10%, so the result is not reported due to insufficient statistical reliability.
Table 1. Forecasting Performance on GIFT-Eval Subset
| Method | Short-term nMASE↓ | Medium-term nMASE↓ | Long-term nMASE↓ |
|---|---|---|---|
| LLMs | | | |
| Gemini-2.5-Flash | 1.295 | 1.201 | 1.279 |
| Qwen2.5-Instruct-7B | 1.445 | – | – |
| Time Series-based Models | | | |
| ChatTime | 0.983 | 1.439 | 4.164 |
| Time-R1 | 1.162 | – | – |
| TimeOmni-1 | 1.298 | – | – |
| Image-based Models | | | |
| VisionTS++ | 0.915 | 0.682 | 0.690 |
| VisionTS | 1.263 | 0.763 | 0.794 |
| Bagel | 16.303 | 17.840 | 16.530 |
| TimeOmni-VL | 0.878 | 0.816 | 0.784 |
Table 2. Imputation Performance under Different Masking Ratios
| Method | [0.1, 0.2) nMASE↓ | [0.2, 0.3) nMASE↓ | [0.3, 0.4) nMASE↓ | [0.4, 0.5] nMASE↓ |
|---|---|---|---|---|
| LLMs | | | | |
| Gemini-2.5-Flash | 0.920 | 2.028 | 2.434 | 1.160 |
| Qwen2.5-Instruct-7B | 4.878 | 1.854 | – | – |
| Statistics Baselines | | | | |
| Nearest | 0.975 | 0.958 | 1.003 | 0.929 |
| Linear | 0.943 | 0.905 | 0.965 | 0.968 |
| Time Series-based Models | | | | |
| Moment-large | 1.220 | 1.400 | 1.630 | 2.100 |
| Moment-base | 1.510 | 1.600 | 1.700 | 2.130 |
| Image-based Models | | | | |
| Bagel | 17.411 | 12.239 | 11.849 | 11.032 |
| TimeOmni-VL | 0.713 | 0.757 | 0.842 | 0.927 |
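The nMASE columns in Tables 1 and 2 are not defined on this card. As a rough orientation only, the sketch below computes the standard mean absolute scaled error (MASE) and then divides it by a baseline's MASE, which is one common way to normalize the metric. The normalization step, the choice of baseline, and the function names `mase` and `normalized_mase` are assumptions, not the authors' evaluation code.

```python
import numpy as np


def mase(y_true, y_pred, y_history, season=1):
    """Mean absolute scaled error.

    The forecast MAE is scaled by the in-sample MAE of a seasonal-naive
    forecast on the history (each value predicted by the one `season` steps back).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_history = np.asarray(y_history, dtype=float)
    naive_error = np.mean(np.abs(y_history[season:] - y_history[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / naive_error


def normalized_mase(model_mase, baseline_mase):
    """Assumed normalization: ratio of the model's MASE to a baseline's MASE."""
    return model_mase / baseline_mase


history = [1.0, 2.0, 3.0, 4.0, 5.0]  # in-sample naive error is 1.0
target = [6.0, 7.0]

model_score = mase(target, [6.5, 7.5], history)     # forecast MAE 0.5
baseline_score = mase(target, [5.0, 5.0], history)  # repeat last value, MAE 1.5
ratio = normalized_mase(model_score, baseline_score)
```

Under this reading, a value below 1 means the model beats the chosen baseline, which matches the "lower is better" direction stated for the tables.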
Table 3. TS-image Understanding Performance
* Note: Scores are normalized to [0, 1], where higher is better. QA1 to QA3 are layout-level tasks, and QA4 to QA6 are signal-level tasks. Bold marks the best value in each column.

| Method | QA1↑ (layout) | QA2↑ (layout) | QA3↑ (layout) | QA4↑ (signal) | QA5↑ (signal) | QA6↑ (signal) |
|---|---|---|---|---|---|---|
| Proprietary VLMs | | | | | | |
| Gemini-2.5-Flash | 0.540 | 0.640 | 0.004 | 0.535 | 0.000 | 0.342 |
| Gemini-2.0-Flash | 0.230 | 0.290 | 0.261 | 0.279 | 0.000 | 0.220 |
| Base Model | | | | | | |
| Bagel | 0.000 | 0.502 | 0.012 | 0.182 | 0.000 | 0.254 |
| Ours | | | | | | |
| TimeOmni-VL | **1.000** | **1.000** | **0.931** | **1.000** | **0.667** | **0.841** |
Table 4. Time Series Reasoning Performance
* Note: Task 1, Task 2, and Task 4 use accuracy, where higher is better. Task 3 uses MAE, where lower is better. The four tasks cover three capability groups: Perception, Extrapolation, and Decision Making. Bold marks the best result in each column.

| Method | Task1↑ | Task2↑ | Task3↓ | Task4↑ |
|---|---|---|---|---|
| LLMs | | | | |
| Gemini-2.5-Flash | 77.5 | 25.9 | 170.78 | 36.6 |
| Qwen2.5-Instruct-7B | 42.8 | 26.3 | 146.12 | 24.9 |
| TSLMs | | | | |
| Time-MQA-8B | 25.1 | 31.2 | - | 11.6 |
| ChatTS | 39.2 | 18.6 | - | 11.1 |
| ITFormer | 47.5 | 14.6 | 230.04 | 41.7 |
| Time-R1 | 34.0 | 31.4 | 160.47 | 32.2 |
| TimeOmni-1 | **87.7** | **64.0** | **145.53** | 58.9 |
| Ours | | | | |
| TimeOmni-VL | 84.0 | 61.3 | 163.79 | **61.4** |
Usage
This repository hosts the model weights for TimeOmni-VL. For installation, inference scripts, TS2I/I2TS conversion utilities, and complete examples, please visit our GitHub repository.
License
TimeOmni-VL is released under the Apache 2.0 license.
Citation
```bibtex
@article{guan2026timeomni,
  title={TimeOmni-VL: Unified Models for Time Series Understanding and Generation},
  author={Guan, Tong and Pan, Sheng and Barthelemy, Johan and Li, Zhao and Cai, Yujun and Alippi, Cesare and Jin, Ming and Pan, Shirui},
  journal={arXiv preprint arXiv:2602.17149},
  year={2026}
}
```