



Despite the significant progress made by all-in-one models in universal image restoration, existing methods suffer from a generalization bottleneck in real-world scenarios, as they are mostly trained on small-scale synthetic datasets with limited degradations. Therefore, large-scale high-quality real-world training data is urgently needed to facilitate the emergence of foundation models for image restoration.
To advance this field, we contribute a million-scale dataset with two notable advantages over existing training data: larger-scale real-world samples and more diverse data types. By adjusting internal camera settings and external imaging conditions, we capture aligned image pairs over multiple rounds using our data acquisition system and our data alignment criterion.
Moreover, we propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios, taking a further step toward foundation models. Specifically, we first use a diffusion-based generalist model to remove degradations by learning degradation-agnostic common representations from diverse inputs, and we adopt an incremental learning strategy to better guide model training. To refine the model's restoration capability in complex scenarios, we introduce degradation-aware specialist models that produce the final high-quality results. Extensive experiments show the value of our dataset and the effectiveness of our method.
Illustration of the proposed FoundIR. We first employ a diffusion-based generalist model $\mathcal{G}$ for degradation removal, followed by multiple specialist models $\mathcal{S}$ for quality refinement. We guide the generalist model to learn a degradation-agnostic common representation space from various degraded inputs, where incremental learning is introduced to improve the model's training stability. For the specialist models, we construct an expert pool to handle various scenarios, comprising text repair experts, weather experts, and illumination experts.
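The two-stage design described above (a generalist for degradation removal, then a specialist drawn from an expert pool for refinement) can be pictured with a minimal Python sketch. This is not the FoundIR implementation: the `generalist` function is a placeholder that only clamps pixel values, the three experts are toy operations, and the names `restore`, `EXPERT_POOL`, and the `scenario` tag are hypothetical. How a specialist is actually selected is not stated here, so the sketch simply assumes the caller supplies a scenario label.

```python
from typing import Callable, Dict, List

# Stand-in for an image tensor: a flat list of pixel intensities in [0, 1].
Image = List[float]
Restorer = Callable[[Image], Image]


def clamp(img: Image) -> Image:
    """Clip every pixel to the valid [0, 1] range."""
    return [min(1.0, max(0.0, p)) for p in img]


def generalist(img: Image) -> Image:
    """Placeholder for the diffusion-based generalist model G.

    Assumption: a real G would run a diffusion sampler; here it only clamps.
    """
    return clamp(img)


def text_expert(img: Image) -> Image:
    """Toy text-repair specialist (identity placeholder)."""
    return list(img)


def weather_expert(img: Image) -> Image:
    """Toy weather specialist: stretch contrast to the full [0, 1] range."""
    lo, hi = min(img), max(img)
    if hi == lo:
        return list(img)
    return [(p - lo) / (hi - lo) for p in img]


def illumination_expert(img: Image) -> Image:
    """Toy illumination specialist: apply a fixed gain of 2, then clamp."""
    return clamp([p * 2.0 for p in img])


# Hypothetical expert pool S, keyed by a caller-supplied scenario label.
EXPERT_POOL: Dict[str, Restorer] = {
    "text": text_expert,
    "weather": weather_expert,
    "illumination": illumination_expert,
}


def restore(img: Image, scenario: str) -> Image:
    """Stage 1: generalist removes degradations. Stage 2: specialist refines."""
    coarse = generalist(img)
    expert = EXPERT_POOL.get(scenario)
    # Fall back to the generalist output when no specialist matches.
    return expert(coarse) if expert is not None else coarse


if __name__ == "__main__":
    print(restore([0.25, 0.5, 0.75], "illumination"))
```

Keeping the experts behind a common callable signature means a new scenario can be added to the pool without touching the generalist stage, which mirrors the separation between $\mathcal{G}$ and $\mathcal{S}$ in the caption.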
Example LQ-GT paired images in the proposed million-scale dataset. Compared to existing training data, the proposed dataset offers two advantages: (i) larger-scale real-world scenarios, and (ii) more diverse degradation types.
@inproceedings{li2024foundir,
title={FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration},
author={Li, Hao and Chen, Xiang and Dong, Jiangxin and Tang, Jinhui and Pan, Jinshan},
booktitle={ICCV},
year={2025}
}