
Model Card for Thales

Model Overview

Thales is a deep learning architecture for quantitative option pricing. It constrains deep neural representations with fundamental financial partial differential equations (PDEs). To comply with institutional model validation protocols, Thales uses a Sparse Autoencoder (SAE) to project dense latent activations into disentangled representations. A fine-tuned Large Language Model (LLM) then decodes these representations into deterministic natural-language risk narratives.

1. Architecture Details

The Thales architecture addresses the trade-off between computational throughput and model interpretability through three structural paradigms:

  • Physics-Informed Optimization: Unlike standard empirical regression, the training objective is constructed to enforce consistency with the governing PDEs of financial derivatives. This implicitly enforces structural adherence to market dynamics and Greek sensitivities.
  • Latent Space Disentanglement: An embedded Sparse Autoencoder (SAE) acts as a bottleneck layer, regularizing dense network activations into discrete, attribution-friendly feature clusters.
  • Forward-Pass Surrogate: The model replaces iterative numerical methods (e.g., finite-difference PDE solvers, Monte Carlo simulations) with a single vectorized forward pass. Inference cost therefore reduces to a fixed sequence of matrix multiplications.
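The three paradigms can be pictured with a minimal PyTorch sketch. It assumes three components:

- a CNN encoder over the 2 × 11 × 11 surface grid used in the usage example,
- a ReLU SAE bottleneck,
- a Softplus price head.

`SurrogatePricer`, its layer sizes, the log-moneyness input normalization and the strike-scaled output are illustrative assumptions, not the released design. The released `ThalesModel` has a different signature: its forward pass returns five values when `return_acts=True`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseAutoencoder(nn.Module):
    """ReLU SAE: overcomplete linear encoder plus linear decoder."""

    def __init__(self, d_in: int, d_sae: int):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_sae)
        self.decoder = nn.Linear(d_sae, d_in)

    def forward(self, x: torch.Tensor):
        acts = F.relu(self.encoder(x))   # non-negative activations; zero means "feature off"
        return self.decoder(acts), acts  # (reconstruction, activations)


class SurrogatePricer(nn.Module):
    """Hypothetical pipeline: CNN over the surface grid -> SAE bottleneck -> Softplus price head."""

    def __init__(self, grid_size: int = 11, in_channels: int = 2, d_feat: int = 64, sae_dim: int = 1024):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.SiLU(),
            nn.Flatten(),
            nn.Linear(16 * grid_size * grid_size, d_feat),
        )
        self.sae = SparseAutoencoder(d_feat, sae_dim)
        self.head = nn.Sequential(nn.Linear(d_feat + 3, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, surface: torch.Tensor, scalars: torch.Tensor):
        feat = self.cnn(surface)                            # dense latent features
        recon, acts = self.sae(feat)                        # SAE bottleneck (assumed placement)
        s, k, t, r = scalars.unbind(-1)                     # assumed column order: S, K, T, r
        x = torch.stack([torch.log(s / k), t, r], dim=-1)   # hypothetical in-graph normalization
        logits = self.head(torch.cat([recon, x], dim=-1))
        price = k * F.softplus(logits).squeeze(-1)          # strictly positive, scaled by strike
        return price, acts
```

Every input is priced by the same fixed stack of convolutions and matrix multiplications, with no iteration over grid points or simulated paths. Batching therefore maps directly onto GPU parallelism.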

2. Training Methodology

Thales is trained using specific optimization strategies to maintain numerical stability across disparate moneyness and maturity regimes:

  • Orthogonalized Momentum Optimization (Muon): Parameters are partitioned by tensor dimensionality.
    - Matrix-shaped parameters (linear weights, and convolutional kernels flattened to 2D) are updated with Muon. Muon orthogonalizes the momentum update via Newton-Schulz iteration, preserving orthogonal weight updates and constraining Lipschitz constants.
    - 1D parameters (biases and normalization gains) use standard AdamW.
  • Sobolev Regularization: The objective function includes a Sobolev loss penalty. During training, automatic differentiation computes the exact gradient of the price with respect to the underlying asset ($\frac{\partial V}{\partial S}$, Delta). The loss combines Mean Squared Error (MSE) on price with Mean Absolute Error (MAE) on Delta, constraining the derivative manifold.
  • Arbitrage-Free Constraints: Input parameters ($S, K, T, r$) are normalized dynamically inside the differentiable computational graph, so gradients with respect to the raw inputs remain intact. A Softplus activation on the final layer enforces strictly positive prices, structurally bounding the output to prevent trivial arbitrage conditions.
  • Gradient Stabilization: Global gradient norm clipping is applied to prevent numerical divergence when regressing deep out-of-the-money (OTM) implied volatility gradients.
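The four training ingredients above can be combined into a single training step. This is a sketch under stated assumptions, not the Thales training code:

- `TinyPricer` and `HybridOptimizer` are hypothetical, as are all learning rates, the momentum, `delta_weight` and `max_norm`.
- The quintic Newton-Schulz coefficients (3.4445, -4.7750, 2.0315) follow the public Muon reference implementation.
- The Muon step is simplified: no Nesterov momentum and no shape-dependent learning-rate scaling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def newton_schulz(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D matrix with the quintic Newton-Schulz iteration used by Muon."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + eps)              # Frobenius normalization: singular values <= 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                           # iterate on the smaller Gram matrix
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


class TinyPricer(nn.Module):
    """Hypothetical stand-in pricer: in-graph input normalization and a Softplus output."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden), nn.SiLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, scalars: torch.Tensor) -> torch.Tensor:
        s, k, t, r = scalars.unbind(-1)
        x = torch.stack([torch.log(s / k), t, r], dim=-1)  # differentiable w.r.t. raw S
        return k * F.softplus(self.net(x)).squeeze(-1)     # strictly positive price


def sobolev_loss(model, scalars, v_target, delta_target, delta_weight=1.0):
    """MSE on price plus weighted MAE on Delta = dV/dS, with Delta taken by autograd."""
    scalars = scalars.detach().clone().requires_grad_(True)
    v_pred = model(scalars)
    # Summing is valid because each price depends only on its own input row.
    (grad,) = torch.autograd.grad(v_pred.sum(), scalars, create_graph=True)
    delta_pred = grad[:, 0]               # column 0 holds S
    return F.mse_loss(v_pred, v_target) + delta_weight * F.l1_loss(delta_pred, delta_target)


class HybridOptimizer:
    """Hypothetical: simplified Muon for matrix-shaped parameters, AdamW for 1D parameters."""

    def __init__(self, model, muon_lr=0.02, momentum=0.95, adamw_lr=1e-3):
        self.matrix_params = [p for p in model.parameters() if p.ndim >= 2]
        vector_params = [p for p in model.parameters() if p.ndim < 2]
        self.bufs = [torch.zeros_like(p) for p in self.matrix_params]
        self.muon_lr, self.momentum = muon_lr, momentum
        self.adamw = torch.optim.AdamW(vector_params, lr=adamw_lr)

    @torch.no_grad()
    def step(self):
        for p, buf in zip(self.matrix_params, self.bufs):
            if p.grad is None:
                continue
            buf.mul_(self.momentum).add_(p.grad)                      # momentum accumulation
            flat = buf.reshape(buf.shape[0], -1)                      # conv kernels -> 2D
            p.add_(newton_schulz(flat).reshape_as(p), alpha=-self.muon_lr)
        self.adamw.step()

    def zero_grad(self):
        for p in self.matrix_params:
            p.grad = None
        self.adamw.zero_grad()


def train_step(model, opt, scalars, v_target, delta_target, max_norm=1.0):
    opt.zero_grad()
    loss = sobolev_loss(model, scalars, v_target, delta_target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)  # global gradient-norm clipping
    opt.step()
    return loss.item()
```

Passing `create_graph=True` keeps the Delta term differentiable, so the MAE on Delta contributes gradients to the weights. Smooth activations (SiLU, Softplus) give that second-order path non-trivial curvature.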

3. Interpretability and Semantic Decoding

A documented limitation of deep parametric models in finance is the opacity of feature attribution. Thales constrains its latent space so that it can generate auditable risk outputs.

SNR-Bounded Sparsity Paradigm

SAEs trained on language models typically target >95% sparsity. Thales instead targets a "Dense-Sparse Hybrid" equilibrium, empirically maintained at ~50% structural sparsity. This target is motivated by the fundamentally low Signal-to-Noise Ratio (SNR) of financial microstructure data.

Macro-level absolute variance accounts for most of the surface's spatial energy, whereas local geometry (skewness, curvature) carries only low-amplitude signals. Enforcing extreme sparsity induces manifold collapse, which degrades Greek precision. The ~50% sparsity threshold acts as a low-pass structural filter: it preserves baseline volatility representations while discretizing moneyness deformations.
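The sparsity figures quoted here can be measured directly from SAE activations. The sketch below rests on two assumptions:

- Sparsity is defined as the fraction of exactly-zero (inactive) activations.
- The SAE is trained with a standard reconstruction-plus-L1 objective.

Neither is confirmed as the Thales definition or objective. `sae_sparsity` and `sae_objective` are hypothetical helpers.

```python
import torch
import torch.nn.functional as F


def sae_sparsity(acts: torch.Tensor) -> float:
    """Fraction of SAE activations that are exactly zero, averaged over batch and features."""
    return (acts == 0).float().mean().item()


def sae_objective(x: torch.Tensor, recon: torch.Tensor, acts: torch.Tensor, l1_coeff: float) -> torch.Tensor:
    """Reconstruction MSE plus an L1 penalty on activations (assumed objective)."""
    recon_loss = F.mse_loss(recon, x)
    l1_penalty = acts.abs().sum(dim=-1).mean()  # per-sample L1 norm, averaged over the batch
    return recon_loss + l1_coeff * l1_penalty
```

Under this kind of objective, raising `l1_coeff` tends to raise the measured sparsity at the cost of reconstruction error. Targeting ~50% rather than >95% therefore corresponds to a comparatively weak penalty.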

Semantic LLM Decoding

To turn numerical latent vectors into auditable risk reports, Thales uses a post-trained Large Language Model (fine-tuned on the Thales_Instruction_Dataset) as a deterministic decoder. High-dimensional SAE activation states are projected into the LLM's context window. The LLM then maps specific activation patterns to structural risk diagnostics.
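Activation states can be handed to a decoder LLM in several ways. One simple, text-based option serializes the most strongly activated SAE features into the prompt, in the same form as the example output. This is an assumption for illustration: Thales may instead project activation vectors into the embedding space. `top_active_features`, `build_prompt` and the prompt wording are hypothetical.

```python
import torch


def top_active_features(acts: torch.Tensor, k: int = 3) -> list:
    """Indices of the k most strongly activated SAE features in a 1-D activation vector, ascending."""
    values, indices = torch.topk(acts, k)
    # Drop inactive features (ReLU SAE activations are zero when a feature is off).
    return sorted(int(i) for v, i in zip(values.tolist(), indices.tolist()) if v > 0)


def build_prompt(acts: torch.Tensor, k: int = 3) -> str:
    """Serialize the dominant SAE features into a text prompt (hypothetical format)."""
    feature_ids = top_active_features(acts, k)
    return f"SAE cluster {feature_ids} active. Describe the implied volatility-surface risk diagnostics."
```

With this scheme, deterministic output would come from greedy decoding, for example `do_sample=False` when calling `generate` in Hugging Face transformers.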

Example Output:

SAE cluster [42, 118, 503] active. Indicator: Short-term OTM put skewness expansion corresponding to a SABR Rho parameter contraction. ATM baseline volatility remains static.

4. Environmental, Social, and Governance (ESG) & Carbon Footprint

Deploying quantitative models at scale raises significant computational sustainability challenges. Traditional risk management re-evaluates massive portfolios with grid-based PDE solvers or large-scale Monte Carlo methods. The resulting data center power consumption adds to Scope 2 emissions for on-premises hardware, or to Scope 3 emissions for cloud-hosted compute.

Thales front-loads computational cost into the training phase, which is a one-time carbon expenditure, and then serves as a low-cost surrogate model during inference.

  • Inference Carbon Intensity: Monte Carlo pricing scales as $O(N \times \text{paths})$ for $N$ options. Thales prices each option with a fixed number of tensor multiplications, independent of path count, and a batch executes in parallel. This reduces the Joules-per-inference metric by orders of magnitude compared to traditional CPU-bound risk engines.
  • Hardware Efficiency: The architecture sustains high GPU utilization, exceeding 2.5 million options per second at batch size 4096 (see Section 6). This parallelization density lets financial institutions downscale the physical server footprint required for overnight risk scenario generation (e.g., CCAR and FRTB compliance). The reduced footprint contributes to institutional Net Zero and carbon-neutrality mandates.


5. Implementation & Usage

Dependencies: torch, safetensors, huggingface_hub.

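import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_model

from model import ThalesModel  # defined in a local model.py, not an installable package

# Hyperparameters must match the released checkpoint, or load_model will fail
model = ThalesModel(grid_size=11, sae_dim=1024)

# Download the weights from the Hugging Face Hub and load them into the model
weights_path = hf_hub_download(repo_id="Chunjiang-Intelligence/Thales", filename="model.safetensors")
load_model(model, weights_path)
model.eval()

# Dummy inputs: a 2-channel 11x11 surface grid, plus scalars in the order [S, K, T, r]
dummy_surface = torch.randn(1, 2, 11, 11)
dummy_scalars = torch.tensor([[100.0, 100.0, 1.0, 0.03]])

with torch.no_grad():
    # return_acts=True also returns CNN features, the SAE reconstruction and SAE activations
    price, _, cnn_feat, recon, sae_acts = model(dummy_surface, dummy_scalars, return_acts=True)

print(f"Predicted Option Price: {price.item():.4f}")
print(f"Primary Active SAE Node: {torch.argmax(sae_acts).item()}")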

6. Evaluation and Benchmarks

Benchmarking was conducted on an out-of-sample synthetically generated dataset comprising SABR and SVI surfaces. Measurements were taken in an isolated NVIDIA CUDA environment (fp32 precision).

| Batch Size | Batch Latency (ms) | Per-Option Latency (μs) | Throughput (options/s) | SAE Neuron Sparsity |
|---:|---:|---:|---:|---:|
| 1 | 0.766 | 765.68 | 1,306 | 48.83% |
| 16 | 0.548 | 34.25 | 29,201 | 50.71% |
| 64 | 0.542 | 8.47 | 117,998 | 50.90% |
| 256 | 0.622 | 2.43 | 411,723 | 50.68% |
| 1024 | 0.618 | 0.60 | 1,658,165 | 50.72% |
| 4096 | 1.577 | 0.39 | 2,596,696 | 50.68% |

Latency scales sub-linearly with batch size: batch latency grows only from ~0.77 ms to ~1.58 ms while the batch grows 4096-fold. Single-sample ($N=1$) inference is dominated by fixed overhead such as CUDA kernel launches (~0.77 ms). At a saturation batch size of 4096, throughput reaches ~2.6M options/second, and per-option latency converges to 0.39 μs. Across all batch sizes, SAE sparsity stays between 48.8% and 50.9% (≈ 50%). This indicates that the latent representation is stable irrespective of computational load.
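In the table, the per-option latency and throughput columns follow from batch latency and batch size. The sketch below reproduces that arithmetic and includes a simple timing loop. `derived_metrics` and `time_batch` are hypothetical helpers, and the warm-up and iteration counts are assumptions rather than the benchmark's actual protocol.

```python
import time

import torch


def derived_metrics(batch_latency_ms: float, batch_size: int):
    """Per-option latency (microseconds) and throughput (options/s) from one batch latency."""
    per_option_us = batch_latency_ms * 1e3 / batch_size
    throughput = batch_size / (batch_latency_ms * 1e-3)
    return per_option_us, throughput


def time_batch(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    """Mean wall-clock latency of fn(*args) in milliseconds."""
    with torch.no_grad():
        for _ in range(warmup):           # exclude one-time costs (allocation, kernel selection)
            fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()      # CUDA kernels run asynchronously; wait before timing
        start = time.perf_counter()
        for _ in range(iters):
            fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1e3 / iters
```

For example, `derived_metrics(1.577, 4096)` gives about 0.385 μs per option and about 2.597M options/s. This matches the table row up to rounding of the published batch latency.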

7. Limitations and Citation

Thales is constrained by the distribution of its training data. Its surrogate accuracy degrades gracefully when extrapolating beyond the volatility surface parameter boundaries observed during training. The current iteration does not explicitly model jump-diffusion processes or discrete dividend schedules.

If you use the Thales architecture in published research or enterprise environments, please cite:

@misc{thales2026,
  author = {Chunjiang Intelligence},
  title = {Thales: Interpretable and Physics-Informed Deep Learning for Quantitative Option Pricing},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Chunjiang-Intelligence/Thales}}
}
Model size: 471k parameters (Safetensors, tensor type F32)