Shuo Cao

About Me

I am a joint PhD student at USTC and Shanghai AI Lab. I received my bachelor's degree from Tongji University in 2023. I previously worked on low-level computer vision, including image restoration and video super-resolution, under the guidance of Prof. Chao Dong.

I am now working with Prof. Yihao Liu and Prof. Yu Qiao on multimodal image understanding, with recent interests in perceptual-level image understanding and assessment and in unified models for image generation and understanding.

News

  • May, 2026 We released StableI2I, a fidelity-oriented evaluation framework for image-to-image transition that diagnoses unintended changes across semantic consistency, structural fidelity, and low-level appearance. [Paper] [Project] [Code] [Checkpoint] [Benchmark]
  • May, 2026 Three papers accepted by ICML'26. UniPercept was selected as a Spotlight paper.
  • Feb, 2026 Two papers accepted by CVPR'26.
  • Jan, 2026 Two papers accepted by ICLR'26.
  • Dec, 2025 We released UniPercept, a unified framework addressing the limitations of current multimodal LLMs in perceptual-level image understanding, specifically across aesthetics, quality, structure, and texture. The release features UniPercept-Bench, a comprehensive benchmark supporting both Visual Rating (VR) and Visual Question Answering (VQA) for IAA/IQA/ISTA tasks, alongside a generalizable baseline model. Beyond evaluation, UniPercept functions as a robust reward model for post-training text-to-image systems and serves as a perceptual diagnostic tool for analyzing datasets and model outputs. [Paper] [Project] [Code] [Checkpoint] [Benchmark]
  • Sep, 2025 We released ArtiMuse, an MLLM for professional aesthetic understanding, trained on ArtiMuse-10K, a meticulously curated, expert-annotated dataset. ArtiMuse-10K systematically defines eight explainable and fine-grained aesthetic attributes (e.g., Composition & Design, Visual Elements & Structure, ...) and covers a wide range of visual domains (e.g., Graphic Design, AIGC-generated Images, ...). ArtiMuse was officially released at WAIC 2025, in the forum "Evolving with AI: The Iteration and Resilience of Artistic Creativity". [Online Demo v1.0] [Paper] [Project] [Code] [Checkpoint] [Dataset]

Selected Publications

* equal contribution. ✉ corresponding author.

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

ICML 2026 Spotlight

Shuo Cao*, Jiayang Li*, Xiaohui Li, Yuandong Pu, ... Bin Fu, Yu Qiao, Yihao Liu✉

StableI2I: Spotting Unintended Changes in Image-to-Image Transition

ICML 2026

Jiayang Li*, Shuo Cao*, Xiaohui Li, ... Jian Zhang✉, Yihao Liu✉

ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding

CVPR 2026

Shuo Cao, Nan Ma, Jiayang Li, Xiaohui Li, ... Yu Qiao, Dajuin Yao✉, Yihao Liu✉

Toward Generalizable Deblurring: Leveraging Massive Blur Priors with Linear Attention for Real-World Scenarios

arXiv 2026

Yuanting Gao*, Shuo Cao*, Xiaohui Li, Yuandong Pu, Yihao Liu✉, Kai Zhang✉

DualX-VSR: Dual Axial Spatial x Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation

arXiv 2025

Shuo Cao*, Yihao Liu*, Xiaohui Li, Yuanting Gao, Yu Zhou, Chao Dong✉

GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity

ECCV 2024

Shuo Cao*, Yihao Liu*, Wenlong Zhang, Yu Qiao, Chao Dong✉

Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

arXiv 2026

Kaiwen Zhu, Quansheng Zeng, Yuandong Pu, Shuo Cao, ... Jinjin Gu, Yihao Liu✉

LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution

ICLR 2026

Xiaohui Li, Shaobin Zhuang, Shuo Cao, Yang Yang, ... Bin Fu, Yihao Liu✉

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

ICCV 2025

Xiaohui Li*, Yihao Liu*✉, Shuo Cao, Ziyan Chen, ... Yi Wang, Yu Qiao