LUCID-CC0 v2 High Complexity (256×256): Finetuning Dataset for SISR

A high-complexity subset of lucid-cc0-v2 containing only the most detailed, information-rich tiles. Designed for finetuning pretrained SISR models to push final quality metrics.

Overview

Property | Value
--- | ---
Source | lucid-cc0-v2 (filtered subset)
Filtering | ICNet complexity ≥ 0.85 (highest-detail tiles only)
Tile size | 256×256 pixels
Mean complexity | ~0.917
Total tiles | ~271,000
Disk size | ~35 GB
License | CC0-1.0 (public domain)

Intended Use

This dataset is designed for finetuning SISR models that have been pretrained on the full lucid-cc0-v2 dataset. The high-complexity tiles contain the sharpest edges, finest textures, and most intricate details, which is the material a model needs to refine its super-resolution output.

Recommended Training Strategy

Stage | Dataset | Purpose
--- | --- | ---
1. Pretrain from scratch | lucid-cc0-v2 (200 GB) | Learn general image representations
2. Finetune | This dataset (27 GB) | Refine on highest-quality tiles
3. Second finetune | lucid-cc0-v2-hc-512 (512×512) | Push quality with max patch size

Why High Complexity?

  • Transformer models (HAT, SwinIR) are data-hungry but also benefit from quality-focused finetuning
  • High-complexity tiles contain more high-frequency information per pixel
  • Finetuning on these tiles specifically improves texture recovery and edge sharpness
  • The 256×256 patch size is compatible with most SISR training frameworks

Dataset Structure

lucid-cc0-v2-hc/
├── train/
│   ├── 000/
│   │   ├── 00000.png
│   │   └── ...
│   ├── 001/
│   └── ...
├── LR/
│   ├── x2/           # Bicubic downscaled ×2
│   └── x4/           # Bicubic downscaled ×4
└── metrics.csv       # Per-image complexity statistics
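
As a rough illustration of how the layout above might be consumed, the sketch below pairs each HR tile under train/ with an LR counterpart. It assumes that LR/x2/ and LR/x4/ mirror the subfolder and file names of train/, which the tree does not actually show, so treat `pair_tiles` and that mirrored layout as hypothetical and adjust to the real structure.

```python
from pathlib import Path


def pair_tiles(root, scale):
    """Return (hr_path, lr_path) pairs for one scale factor.

    ASSUMPTION: LR/x<scale>/ mirrors the relative paths found under train/.
    Tiles without a matching LR file are skipped rather than raising.
    """
    root = Path(root)
    hr_root = root / "train"
    lr_root = root / "LR" / f"x{scale}"
    pairs = []
    for hr_path in sorted(hr_root.rglob("*.png")):
        lr_path = lr_root / hr_path.relative_to(hr_root)
        if lr_path.is_file():
            pairs.append((hr_path, lr_path))
    return pairs
```

Calling `pair_tiles("lucid-cc0-v2-hc", 4)` would then yield the ×4 training pairs, provided the mirrored-layout assumption holds.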

Filtering Criteria

Tiles were selected from lucid-cc0-v2 based on ICNet complexity score:

  • Threshold: ≥ 0.85 (out of 1.0)
  • Distribution: Mean 0.917, all tiles at or above 0.85
  • What this captures: Images with strong edges, fine textures, intricate patterns, high local contrast
  • What this excludes: Smooth gradients, blurry regions, low-detail areas, sky/water surfaces
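
The threshold rule above can be sketched as a simple filter over a per-image score table such as metrics.csv. The column names `filename` and `complexity` are assumptions made for illustration, since the actual CSV header is not documented here; this is not the pipeline that produced the dataset.

```python
import csv


def select_tiles(metrics_csv, threshold=0.85,
                 name_col="filename", score_col="complexity"):
    """Return (name, score) for every row whose score is >= threshold.

    ASSUMPTION: name_col and score_col are hypothetical column names.
    """
    selected = []
    with open(metrics_csv, newline="") as f:
        for row in csv.DictReader(f):
            score = float(row[score_col])
            if score >= threshold:
                selected.append((row[name_col], score))
    return selected


def mean_score(selected):
    """Mean complexity of the selected tiles (NaN if nothing was selected)."""
    if not selected:
        return float("nan")
    return sum(score for _, score in selected) / len(selected)
```

The same function can also build a stricter subset, for example `select_tiles(path, threshold=0.9)`, if only the very densest tiles are wanted.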

Bicubic Downscaling

LR images are generated with MATLAB-compatible bicubic interpolation, matching the convention used in standard SISR benchmarks.

Scale factors: ×2 and ×4.
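
For readers who want to see what MATLAB-style bicubic downscaling involves, here is a minimal pure-Python sketch. It follows the commonly described imresize recipe: a cubic kernel with a = -0.5, widened by the downscale factor for antialiasing, normalized weights, and mirrored borders. It is an illustration under those assumptions, not the script used to produce the LR folders, and it works on a single-channel image stored as a list of rows. All function names are hypothetical.

```python
import math


def cubic(x):
    """Cubic convolution kernel with a = -0.5."""
    x = abs(x)
    if x <= 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0


def contributions(in_len, out_len, scale):
    """Per output sample: (0-based input indices, normalized weights)."""
    stretch = scale if scale < 1 else 1.0      # widen kernel when shrinking
    kernel_width = 4.0 / stretch
    taps = math.ceil(kernel_width) + 2
    table = []
    for i in range(out_len):
        # Center of output sample i in 1-based input coordinates.
        u = (i + 1) / scale + 0.5 * (1.0 - 1.0 / scale)
        left = math.floor(u - kernel_width / 2.0)
        idx, weights = [], []
        for k in range(taps):
            p = left + k
            weights.append(stretch * cubic(stretch * (u - p)))
            m = (p - 1) % (2 * in_len)         # mirror at the borders
            idx.append(m if m < in_len else 2 * in_len - 1 - m)
        total = sum(weights)
        table.append((idx, [w / total for w in weights]))
    return table


def resize_rows(img, scale):
    """Resize every row of img (list of lists of floats) along its width."""
    width = len(img[0])
    table = contributions(width, math.ceil(width * scale), scale)
    return [[sum(row[j] * w for j, w in zip(idx, wts)) for idx, wts in table]
            for row in img]


def bicubic_downscale(img, factor):
    """Downscale a 2-D image by an integer factor; returns float values."""
    scale = 1.0 / factor
    tmp = resize_rows(img, scale)                       # horizontal pass
    cols = [list(c) for c in zip(*tmp)]                 # transpose
    out = resize_rows(cols, scale)                      # vertical pass
    return [list(r) for r in zip(*out)]                 # transpose back
```

The output is floating point, so 8-bit images still need rounding and clipping to 0..255 before saving. Library resizers that use a different kernel or skip antialiasing will generally not reproduce these values, which is why benchmark LR sets are usually generated with one agreed-upon implementation.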

Citation

@dataset{lucid_cc0_v2_hc,
  title={LUCID-CC0 v2 High Complexity: Finetuning Dataset for SISR},
  author={Phips},
  year={2026},
  license={CC0-1.0},
  url={https://huggingface.co/datasets/Phips/lucid-cc0-v2-hc}
}