Super-Linear: A Mixture of Experts Time Series Forecasting Model

SuperLinear is a time series forecasting model built on a Mixture of Experts (MoE) architecture. An FFT-based gating network analyzes each input in the frequency domain and routes it to the most relevant experts, so a single model can serve a range of forecasting tasks.

Model Architecture

The SuperLinear model consists of:

  • Sparse Mixture of Experts (MoE): Routes inputs to the top-k most relevant experts
  • FFT-based Gating Network: Uses frequency domain analysis to determine expert routing
  • Frequency-specific Experts: Pre-trained experts specialized for different temporal patterns
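The three components above can be sketched in a few lines. This is only an illustrative toy, not the released implementation: it uses NumPy instead of PyTorch to stay dependency-light, the experts are random linear maps rather than pre-trained ones, and the names `moe_forecast`, `NUM_EXPERTS`, and `TOP_K` are hypothetical. The example also assumes the gate scores experts from the magnitude spectrum of the input, which the card does not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)

SEQ_LEN, PRED_LEN = 512, 96     # match the documented defaults
NUM_EXPERTS, TOP_K = 8, 2       # hypothetical sizes, chosen for illustration

# Each expert is a single linear map from the lookback window to the horizon.
experts = rng.normal(scale=0.01, size=(NUM_EXPERTS, SEQ_LEN, PRED_LEN))
# The gate maps the magnitude spectrum (SEQ_LEN // 2 + 1 bins) to expert scores.
gate_w = rng.normal(scale=0.01, size=(SEQ_LEN // 2 + 1, NUM_EXPERTS))


def moe_forecast(x):
    """x: [batch, SEQ_LEN] -> (forecast [batch, PRED_LEN], probs [batch, NUM_EXPERTS])."""
    spectrum = np.abs(np.fft.rfft(x, axis=-1))      # frequency-domain features
    logits = spectrum @ gate_w                      # one score per expert

    # Keep the top-k logits per row and mask the rest to -inf before the softmax.
    top_idx = np.argsort(logits, axis=-1)[:, -TOP_K:]
    masked = np.full_like(logits, -np.inf)
    top_vals = np.take_along_axis(logits, top_idx, axis=-1)
    np.put_along_axis(masked, top_idx, top_vals, axis=-1)
    masked -= masked.max(axis=-1, keepdims=True)    # numerical stability
    probs = np.exp(masked)                          # masked experts get weight 0
    probs /= probs.sum(axis=-1, keepdims=True)

    # Computed densely for clarity; a sparse implementation would skip
    # the experts whose weight is zero.
    expert_out = np.einsum("bs,esp->bep", x, experts)
    forecast = np.einsum("be,bep->bp", probs, expert_out)
    return forecast, probs
```

Calling `moe_forecast` on a batch of shape `[batch, 512]` returns a forecast of shape `[batch, 96]` together with per-expert weights in which only `TOP_K` entries per row are non-zero.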

Key Features

  • Adaptive Expert Selection: Dynamic routing based on input characteristics
  • Frequency-aware Processing: Leverages FFT analysis for intelligent expert selection
  • Auto-regressive Capabilities: Supports long-horizon forecasting
  • Multi-scale Processing: Handles various sequence lengths through resampling
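Two of these features, auto-regressive forecasting and resampling, follow a common pattern that can be sketched independently of the model. The code below is a minimal illustration under stated assumptions: `predict_fn` is a hypothetical stand-in for any forecaster with a fixed output length, and linear interpolation is assumed for resampling because the card does not say which method the model uses.

```python
import numpy as np


def autoregressive_forecast(predict_fn, context, horizon, seq_len=512):
    """Roll a fixed-horizon forecaster forward until `horizon` values exist.

    predict_fn: hypothetical callable mapping [batch, seq_len] -> [batch, step].
    context:    array of shape [batch, length] with length >= seq_len.
    """
    window = context[:, -seq_len:]
    chunks, produced = [], 0
    while produced < horizon:
        pred = predict_fn(window)
        chunks.append(pred)
        produced += pred.shape[1]
        # Slide the window: drop the oldest values, append the new predictions.
        window = np.concatenate([window, pred], axis=1)[:, -seq_len:]
    return np.concatenate(chunks, axis=1)[:, :horizon]


def resample_to_length(series, target_len):
    """Linearly interpolate a 1-D series onto `target_len` evenly spaced points."""
    old_grid = np.linspace(0.0, 1.0, num=len(series))
    new_grid = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_grid, old_grid, series)


def naive_last_value(window, step=96):
    """Toy forecaster standing in for the model: repeats the last observed value."""
    return np.repeat(window[:, -1:], step, axis=1)
```

With this structure, a horizon longer than the trained prediction length is covered by repeated calls, each one conditioning on the previous predictions, so errors can accumulate as the horizon grows.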

Usage

from transformers import AutoModelForCausalLM
import torch

# Load the model
model = AutoModelForCausalLM.from_pretrained("SequentialLearning/SuperLinear", trust_remote_code=True)

# Prepare input time series data
# Shape: [batch_size, channel, sequence_length] or [batch_size, sequence_length]
input_data = torch.randn(1, 1, 512)

# Generate predictions
with torch.no_grad():
    outputs = model(inputs_embeds=input_data, pred_len=96, get_prob=True)
    preds = outputs.logits      # Predicted values
    probs = outputs.attentions  # Expert probabilities are returned in this field

Configuration

Key parameters:

  • train_seq_len: Training sequence length (default: 512)
  • train_pred_len: Training prediction length (default: 96)
  • top_k_experts: Number of experts each input is routed to (default: 12)
  • use_fft: Whether to use FFT-based gating (default: True)
  • freq_experts: Frequency-specific expert configuration
  • moe_temp: Temperature for expert selection during inference (default: 1)
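To give a feel for what a gating temperature such as `moe_temp` does, here is a small sketch. It assumes the usual convention that expert logits are divided by the temperature before the softmax; the card does not confirm that SuperLinear applies it this way, and `gate_probs` and the logit values are made up for illustration.

```python
import numpy as np


def gate_probs(logits, moe_temp=1.0):
    """Softmax over expert logits after dividing by the temperature (assumed convention)."""
    scaled = np.asarray(logits, dtype=float) / moe_temp
    scaled -= scaled.max()              # numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


logits = [2.0, 1.0, 0.5]                # made-up gating scores for three experts
sharp = gate_probs(logits, moe_temp=0.5)    # low temperature: weight concentrates
default = gate_probs(logits, moe_temp=1.0)  # documented default
flat = gate_probs(logits, moe_temp=4.0)     # high temperature: weights even out
```

Under this assumption, values below 1 push the mixture toward the single highest-scoring expert, while values above 1 spread the weight more evenly across the selected experts.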

Citation

If you use SuperLinear in your research, please cite:

@article{nochumsohn2025super,
  title={Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting},
  author={Nochumsohn, Liran and Marshanski, Raz and Zisling, Hedi and Azencot, Omri},
  journal={arXiv preprint arXiv:2509.15105},
  year={2025}
}

License

This model is released under the MIT License.
