ECCV 2026

Monocular Avatar Reconstruction via Cascaded Diffusion Priors and UV-Space Differentiable Shading

MARCUS-Avatar turns a single portrait into a relightable 3D avatar with render-ready geometry and physically meaningful intrinsic materials.

Hong Li*, Minqi Meng*, Yanjun Liang, et al. 13 authors · 9 institutions

Hong Li*¹,², Minqi Meng*¹, Yanjun Liang¹, Chongjie Ye³, Houyuan Chen⁴, Weiqing Xiao⁵, Xianda Guo†⁶, Guojun Lei⁷, Xuhui Liu⁸, Chaojie Yang¹, Yanlun Peng¹,², Hao Zhao⁹, Baochang Zhang‡¹

¹ BUAA ² GWM ³ CUHKSZ ⁴ HKUST ⁵ NJU ⁶ WHU ⁷ ZJU ⁸ KAUST ⁹ AIR, THU

* Equal contribution. † Project leader. ‡ Corresponding author.

Input: Single image
Output: Interactive GLB avatar
Views: Full material and gray geometry

Aligned portrait, recovered geometry, and relit material render from the same identity.

Data

Data-efficient PBR learning

High-quality relightable avatars are learned from fewer than 100 real 3D scans with a Blender-augmented synthetic data pipeline.

Priors

Cascaded diffusion priors

A unified diffusion backbone uses task-specific LoRA adapters for UV texture inpainting, delighting, and intrinsic PBR material estimation.
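
As a rough illustration of how one backbone can serve three tasks, the sketch below applies the standard LoRA update, W + (alpha / r) · B A, to a single frozen weight matrix with one low-rank pair per task. The `LoRALinear` class, the task names, and the tiny matrices are hypothetical and not taken from the released code; plain Python lists stand in for tensors so the example has no dependencies.

```python
from dataclasses import dataclass, field

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, sufficient for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


@dataclass
class LoRALinear:
    """One frozen weight shared by all tasks, plus per-task low-rank deltas."""
    weight: Matrix                      # frozen backbone weight, shape (out, in)
    alpha: float = 1.0                  # LoRA scaling numerator
    adapters: dict[str, tuple[Matrix, Matrix]] = field(default_factory=dict)

    def add_adapter(self, task: str, b: Matrix, a: Matrix) -> None:
        # b has shape (out, r) and a has shape (r, in), where r is the rank.
        self.adapters[task] = (b, a)

    def effective_weight(self, task: str) -> Matrix:
        b, a = self.adapters[task]
        scale = self.alpha / len(a)     # len(a) is the adapter rank r
        delta = matmul(b, a)
        return [[w + scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, task: str, x: list[float]) -> list[float]:
        w = self.effective_weight(task)
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]], alpha=1.0)
# Hypothetical task names mirroring the three tasks named above.
layer.add_adapter("inpaint", b=[[1.0], [0.0]], a=[[0.0, 1.0]])
layer.add_adapter("delight", b=[[0.0], [1.0]], a=[[1.0, 0.0]])
layer.add_adapter("pbr", b=[[0.0], [0.0]], a=[[1.0, 1.0]])

x = [2.0, 3.0]
outputs = {task: layer.forward(task, x) for task in layer.adapters}
```

Switching tasks only swaps the small `(b, a)` pair; the shared `weight` is never modified, which is what makes a single backbone reusable across stages.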

Shading

Physically coupled materials

Cross-Intrinsic Attention and UV-space differentiable shading align albedo, normals, roughness, specular, and displacement into coherent 4K assets.
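
To make the idea of shading directly in UV space concrete, here is a minimal forward-pass sketch that combines albedo, normal, roughness, and specular values texel by texel. It uses Lambert diffuse plus a Blinn-Phong highlight as a simple stand-in; the shading model, the roughness-to-exponent mapping, and all function names are assumptions for illustration and not the paper's formulation. Every step is ordinary arithmetic, so the same computation written in an autodiff framework would let a rendering loss propagate back to each material map.

```python
import math

Vec3 = tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length; the zero vector is returned unchanged."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return v
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def shade_texel(albedo: Vec3, normal: Vec3, roughness: float, specular: float,
                light_dir: Vec3, view_dir: Vec3) -> Vec3:
    """Shade one UV texel: Lambert diffuse plus a Blinn-Phong highlight."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = max(dot(n, l), 0.0)
    half = normalize((l[0] + v[0], l[1] + v[1], l[2] + v[2]))
    # Assumed mapping: rougher texels get a lower, broader specular exponent.
    shininess = 2.0 / max(roughness, 1e-3) ** 2
    # Texels facing away from the light receive no highlight.
    highlight = specular * max(dot(n, half), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    r, g, b = (channel * n_dot_l + highlight for channel in albedo)
    return (r, g, b)


def shade_uv_map(albedo_map, normal_map, roughness_map, specular_map,
                 light_dir: Vec3, view_dir: Vec3):
    """Apply shade_texel to every texel of equally sized UV maps."""
    return [
        [shade_texel(a, n, r, s, light_dir, view_dir)
         for a, n, r, s in zip(a_row, n_row, r_row, s_row)]
        for a_row, n_row, r_row, s_row
        in zip(albedo_map, normal_map, roughness_map, specular_map)
    ]


# A 1x2 toy UV map: the first texel faces the light, the second faces away.
albedo_map = [[(0.8, 0.6, 0.5), (0.8, 0.6, 0.5)]]
normal_map = [[(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]]
roughness_map = [[0.5, 0.5]]
specular_map = [[0.04, 0.04]]

shaded = shade_uv_map(albedo_map, normal_map, roughness_map, specular_map,
                      light_dir=(0.0, 0.0, 1.0), view_dir=(0.0, 0.0, 1.0))
```

Because the output depends jointly on all four maps, an error in any one of them changes the rendered texel, which is the sense in which shading couples the materials.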

Overview

From one portrait to relightable PBR avatar assets

The teaser summarizes the full output space: aligned input, recovered geometry, intrinsic material maps, and relit renderings from a single in-the-wild image.

Generated 3D Assets

Inspect generated avatars in full material and geometry views

Browse converted GLB outputs reconstructed from single portraits. The full material view shows the relightable avatar appearance, while the gray geometry view reveals fine facial structure.
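
A downloaded avatar file can be sanity-checked before it is handed to a viewer. The sketch below reads the 12-byte header of the glTF 2.0 binary container (magic `glTF`, version, total length, all little-endian). The helper name `read_glb_header` and the synthetic header bytes are illustrative only and are not part of the project code.

```python
import struct

GLB_MAGIC = b"glTF"
GLB_HEADER_SIZE = 12


def read_glb_header(data: bytes) -> tuple[int, int]:
    """Return (version, total_length) from the 12-byte GLB header."""
    if len(data) < GLB_HEADER_SIZE:
        raise ValueError("too short to be a GLB file")
    # Layout: 4-byte magic, uint32 version, uint32 total length, little-endian.
    magic, version, length = struct.unpack("<4sII", data[:GLB_HEADER_SIZE])
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic bytes")
    return version, length


# Synthetic header standing in for the first bytes of a downloaded avatar.
header = struct.pack("<4sII", GLB_MAGIC, 2, GLB_HEADER_SIZE)
version, length = read_glb_header(header)
```

For a real file, passing the first 12 bytes read in binary mode is enough; a length that disagrees with the file size on disk usually points to a truncated download.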

Pipeline

Cascaded priors in UV space

The paper pipeline combines geometry alignment, UV-space texture completion, light homogenization, intrinsic material generation, and UV-space differentiable shading.
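
The stage order above can be sketched as a simple cascade in which each stage declares what it consumes and what it adds. The stage names follow the sentence above, but the intermediate keys (`mesh`, `partial_uv_texture`, and so on) and the placeholder models are assumptions made for illustration; the actual interfaces between stages are not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

State = dict[str, object]


@dataclass(frozen=True)
class Stage:
    name: str
    needs: tuple[str, ...]       # keys that must already be in the state
    produces: tuple[str, ...]    # keys this stage adds
    run: Callable[[State], State]


def make_stage(name: str, needs: tuple[str, ...], produces: tuple[str, ...]) -> Stage:
    """Build a stage whose model is a placeholder emitting labelled strings."""
    return Stage(name, needs, produces,
                 lambda state: {key: f"<{key}>" for key in produces})


# Hypothetical data flow; only the stage order comes from the text above.
STAGES = (
    make_stage("geometry_alignment", ("portrait",), ("mesh", "partial_uv_texture")),
    make_stage("uv_texture_completion", ("partial_uv_texture",), ("complete_uv_texture",)),
    make_stage("light_homogenization", ("complete_uv_texture",), ("delit_uv_texture",)),
    make_stage("intrinsic_material_generation", ("delit_uv_texture",), ("pbr_maps",)),
    make_stage("uv_differentiable_shading", ("mesh", "pbr_maps"), ("relit_render",)),
)


def run_cascade(portrait: object, stages: Sequence[Stage] = STAGES) -> State:
    """Run the stages in order, checking that each one has its inputs."""
    state: State = {"portrait": portrait}
    for stage in stages:
        missing = [key for key in stage.needs if key not in state]
        if missing:
            raise KeyError(f"{stage.name} is missing inputs: {missing}")
        state.update(stage.run(state))
    return state


result = run_cascade("portrait.png")
```

The input check makes the cascade's ordering explicit: skipping geometry alignment, for example, leaves texture completion without a partial UV texture to work on.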

Additional method figures: data generation, geometry estimation, and training pipeline

Synthetic data generation pipeline for UV-space supervision.

Geometry estimation model with local image features and semantic priors.

Cascaded diffusion training pipeline for inpainting, delighting, and PBR material estimation.

Evidence

What the priors improve

Complete material estimation enriches geometry through normal and displacement maps. UV diffusion completes occluded textures and removes baked-in illumination while preserving high-frequency identity cues.
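
One common way a displacement map adds geometric detail is by moving each vertex along its normal by the value sampled at that vertex's UV coordinate. The sketch below shows that generic operation with nearest-texel sampling; the function names, the `scale` parameter, and the convention that row 0 corresponds to v = 0 are assumptions, and the page does not state how MARCUS-Avatar applies its displacement maps.

```python
Vec3 = tuple[float, float, float]


def sample_nearest(disp_map: list[list[float]], u: float, v: float) -> float:
    """Nearest-texel lookup for u, v in [0, 1]; row 0 is v = 0 in this sketch."""
    height, width = len(disp_map), len(disp_map[0])
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return disp_map[row][col]


def displace(vertices: list[Vec3], normals: list[Vec3],
             uvs: list[tuple[float, float]], disp_map: list[list[float]],
             scale: float = 1.0) -> list[Vec3]:
    """Offset each vertex along its unit normal by the sampled displacement."""
    displaced = []
    for (px, py, pz), (nx, ny, nz), (u, v) in zip(vertices, normals, uvs):
        d = scale * sample_nearest(disp_map, u, v)
        displaced.append((px + d * nx, py + d * ny, pz + d * nz))
    return displaced


# Toy 2x2 displacement map and two vertices sharing an upward normal.
disp_map = [[0.0, 0.5],
            [1.0, 0.0]]
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
uvs = [(0.75, 0.25), (1.0, 1.0)]

detailed = displace(vertices, normals, uvs, disp_map, scale=0.1)
```

Normal maps complement this by changing only how the surface is shaded, so fine wrinkles can appear in renders even where the mesh itself is not displaced.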

Geometry detail comparison against HiFace and 3DDFA-V3.

Texture completion and light homogenization comparison against UV-IDM, HRN, and FFHQ-UV.

Citation

Cite MARCUS-Avatar

@inproceedings{marcusavatar2026,
  title     = {Monocular Avatar Reconstruction via Cascaded Diffusion Priors and UV-Space Differentiable Shading},
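  author    = {Hong Li and Minqi Meng and Yanjun Liang and Chongjie Ye and Houyuan Chen and Weiqing Xiao and Xianda Guo and Guojun Lei and Xuhui Liu and Chaojie Yang and Yanlun Peng and Hao Zhao and Baochang Zhang},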
  booktitle = {European Conference on Computer Vision},
  year      = {2026}
}