🔍 UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding

Intern·ArtiMuse (书生·妙析): A Multimodal Large Model for Aesthetics Understanding

Shuo Cao¹,², Nan Ma³, Jiayang Li⁴, Xiaohui Li²,⁵, Lihao Shao³, Kaiwen Zhu²,⁵, Yu Zhou⁶, Yuandong Pu²,⁵, Jiarui Wu⁷, Jiaquan Wang⁸, Bo Qu², Wenhai Wang²,⁷, Yu Qiao², Dajuin Yao³,†, Yihao Liu²,†

(† corresponding authors)

¹ University of Science and Technology of China   ² Shanghai AI Laboratory   ³ China Academy of Art
⁴ Peking University   ⁵ Shanghai Jiao Tong University   ⁶ Sun Yat-sen University
⁷ The Chinese University of Hong Kong   ⁸ Hong Kong Polytechnic University

[Paper] [Code] [Checkpoints] [ArtiMuse-10K Dataset]

📰 News & Updates

🔥 Dec 29, 2025: Building upon ArtiMuse, we introduce UniPercept, a comprehensive follow-up work providing a meticulous study on perceptual-level image understanding (IAA, IQA, ISTA).
[Technical Report] [Project Page] [UniPercept-Bench] [UniPercept Model]

🚀 Dec 29, 2025: The test set of the ArtiMuse-10K Dataset is now available!

🚀 Sep 3, 2025: The Checkpoints and Evaluation Code of ArtiMuse are now available!

🚀 July 28, 2025: ArtiMuse was officially released at WAIC 2025, in the forum "Evolving with AI: The Iteration and Resilience of Artistic Creativity".

🚀 July 24, 2025: The Online Demo is now open for public access!

🚀 July 21, 2025: The Paper and Project Page are now live!

Online Demo

👉 Try the Online Demo Now

Abstract

The rapid advancement of educational applications, artistic creation, and AI-generated content (AIGC) technologies has substantially increased practical requirements for comprehensive Image Aesthetics Assessment (IAA), particularly demanding methods capable of delivering both quantitative scoring and professional understanding. Multimodal Large Language Model (MLLM)-based IAA methods demonstrate stronger perceptual and generalization capabilities than traditional approaches, yet they suffer from modality bias (score-only or text-only) and lack fine-grained attribute decomposition, and therefore fail to support further aesthetic assessment. In this paper, we present: (1) ArtiMuse, an innovative MLLM-based IAA model with Joint Scoring and Expert-Level Understanding capabilities; (2) ArtiMuse-10K, the first expert-curated image aesthetic dataset, comprising 10,000 images spanning 5 main categories and 15 subcategories, each annotated by professional experts with an 8-dimensional attribute analysis and a holistic score. Both the model and dataset will be made public to advance the field.

Overview of ArtiMuse

Overview of ArtiMuse. ArtiMuse provides granular, expert-level textual analysis of images across eight fine-grained aesthetic attributes. It also predicts image aesthetics scores precisely, achieving the best or tied-best SRCC/PLCC on 8 of the 10 benchmark metrics in the comparison table below.

Comparison with Existing Models

Aesthetic Analysis

Aesthetics Scoring

Model	AVA SRCC	AVA PLCC	PARA SRCC	PARA PLCC	TAD66K SRCC	TAD66K PLCC	FLICKR-AES SRCC	FLICKR-AES PLCC	ArtiMuse-10K SRCC	ArtiMuse-10K PLCC
Traditional Models
MUSIQ	0.225	0.258	0.490	0.600	0.099	0.149	0.150	0.216	-0.060	-0.074
TANet	0.758	0.765	–	–	0.513	0.531	–	–	–	–
VILA	0.776	0.775	0.651	0.658	0.418	0.444	0.616	0.645	0.273	0.268
AesMamba	0.774	0.769	0.936	0.902	0.511	0.483	–	–	–	–
MLLMs for General-Purpose Applications
mPLUG-Owl2	0.206	0.211	0.376	0.372	0.089	0.106	0.382	0.359	0.159	0.145
ShareGPT-4V	0.213	0.199	0.509	0.417	0.097	0.091	0.335	0.289	0.076	0.057
Qwen-2.5-VL-7B	0.391	0.371	0.721	0.743	0.240	0.242	0.621	0.578	0.256	0.179
Qwen-2.5-VL-72B-instruct	0.408	0.387	0.727	0.763	0.232	0.235	0.626	0.589	0.233	0.197
InternVL3-8B	0.364	0.332	0.667	0.693	0.203	0.191	0.553	0.459	0.187	0.157
InternVL3-78B	0.385	0.344	0.666	0.694	0.221	0.220	0.518	0.433	0.223	0.206
GPT-4o	0.509	0.485	0.697	0.744	0.278	0.282	0.605	0.597	0.333	0.276
Gemini-2.0-flash	0.474	0.457	0.703	0.704	0.319	0.323	0.658	0.651	0.286	0.265
MLLMs for Image Aesthetics Assessment
Q-Instruct	0.318	0.338	0.569	0.724	0.122	0.159	0.259	0.299	-0.045	-0.056
PEAS	0.748	0.748	0.686	0.700	0.415	0.444	0.577	0.613	0.306	0.293
Q-Align	0.822	0.817	0.913	0.888	0.501	0.531	0.798	0.818	0.551	0.573
UNIAA-LLaVA	0.713	0.704	0.864	0.895	0.411	0.425	0.724	0.751	–	–
Next Token Is Enough	0.828	0.825	–	–	0.413	0.444	–	–	–	–
ArtiMuse	0.827	0.826	0.936	0.958	0.510	0.543	0.814	0.837	0.614	0.627

Unlike existing models, which typically provide either a score or a textual analysis, ArtiMuse delivers both accurate aesthetic analysis and precise aesthetics scoring across multiple assessment dimensions.
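
For readers unfamiliar with the two column types in the table above: SRCC is the Spearman rank correlation coefficient and PLCC is the Pearson linear correlation coefficient, both computed between predicted scores and ground-truth scores. The following is a minimal pure-Python sketch of the two metrics, not the project's evaluation code. The function names and the sample scores are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PLCC)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores, for illustration only (not taken from any benchmark).
predicted = [62.0, 71.5, 48.0, 90.0, 55.0]
expert = [60.0, 75.0, 50.0, 85.0, 58.0]
srcc = spearman(predicted, expert)
plcc = pearson(predicted, expert)
print(f"SRCC={srcc:.3f}  PLCC={plcc:.3f}")
```

In this toy example the predictions order the images exactly as the expert scores do, so SRCC reaches 1 while PLCC stays slightly below 1 because the relationship is not perfectly linear. This is why the table reports both metrics.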

Results on Real‑world Images

[Example result: aesthetics score 87 / 100]

ArtiMuse-10K Dataset

We construct ArtiMuse-10K, a high-quality dataset of 10,000 carefully curated images spanning 5 primary categories: Graphic Design, 3D Design, AIGC-generated images, Photography, and Painting & Calligraphy. These categories are subdivided into 15 subcategories, such as Chinese Painting, Sculpture, and Daily Photography, to represent a broad range of artistic expression. Each image is annotated by professional experts on eight aesthetic attributes and given an overall aesthetics score. Compared with existing IAA datasets, ArtiMuse-10K offers greater diversity of image types and finer annotation granularity.
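
To make the annotation structure described above concrete, one record could be modeled roughly as follows. This is a hypothetical layout, not the dataset's actual file format: the class name, field names, placeholder attribute names, and the 0 to 100 score range are all assumptions, and only the five category names and the count of eight attributes come from the description.

```python
from dataclasses import dataclass

# The five primary categories named in the dataset description.
PRIMARY_CATEGORIES = (
    "Graphic Design",
    "3D Design",
    "AIGC-generated images",
    "Photography",
    "Painting & Calligraphy",
)
NUM_ATTRIBUTES = 8  # eight expert-annotated aesthetic attributes per image


@dataclass(frozen=True)
class ArtiMuseRecord:
    """Hypothetical record layout; not the dataset's real schema."""

    image_path: str
    category: str
    subcategory: str
    attribute_comments: dict[str, str]  # attribute name -> expert comment
    overall_score: float  # assumed to lie on a 0-100 scale

    def __post_init__(self) -> None:
        if self.category not in PRIMARY_CATEGORIES:
            raise ValueError(f"unknown primary category: {self.category!r}")
        if len(self.attribute_comments) != NUM_ATTRIBUTES:
            raise ValueError(f"expected {NUM_ATTRIBUTES} attribute annotations")
        if not 0.0 <= self.overall_score <= 100.0:
            raise ValueError("overall_score must be within [0, 100]")


# Placeholder attribute names: the real attribute names are not listed here.
placeholder_comments = {f"attribute_{i}": "expert comment" for i in range(1, 9)}

record = ArtiMuseRecord(
    image_path="images/example.jpg",   # hypothetical path
    category="Photography",
    subcategory="Daily Photography",   # pairing with "Photography" is assumed
    attribute_comments=placeholder_comments,
    overall_score=87.0,
)
print(record.category, record.overall_score)
```

Validating in `__post_init__` keeps malformed records from entering a training or evaluation pipeline silently, which matters when scores and per-attribute text are consumed jointly.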

Data Curation and Model Training

Pipeline of ArtiMuse. ArtiMuse encompasses a multi-stage pipeline spanning data collection & processing, annotation generation, and model training, systematically enhancing its text evaluation capabilities and score assessment proficiency across multiple dimensions.

BibTeX

If you find our work useful, please consider citing our paper:

@misc{cao2025uniperceptunifiedperceptuallevelimage,
      title={UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture}, 
      author={Shuo Cao and Jiayang Li and Xiaohui Li and Yuandong Pu and Kaiwen Zhu and Yuanting Gao and Siqi Luo and Yi Xin and Qi Qin and Yu Zhou and Xiangyu Chen and Wenlong Zhang and Bin Fu and Yu Qiao and Yihao Liu},
      year={2025},
      eprint={2512.21675},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.21675}, 
}

@misc{cao2025artimusefinegrainedimageaesthetics,
      title={ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding}, 
      author={Shuo Cao and Nan Ma and Jiayang Li and Xiaohui Li and Lihao Shao and Kaiwen Zhu and Yu Zhou and Yuandong Pu and Jiarui Wu and Jiaquan Wang and Bo Qu and Wenhai Wang and Yu Qiao and Dajuin Yao and Yihao Liu},
      year={2025},
      eprint={2507.14533},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.14533}, 
}