Controlling Human Shape and Pose
in Text-to-Image Diffusion Models
via Domain Adaptation

Benito Buchheim, Max Reimann, Jürgen Döllner
University of Potsdam, Digital Engineering Faculty, Potsdam, Germany
In WACV 2025

Our approach enables 3D parametric control over human pose and shape in latent diffusion models (LDMs) using SMPL meshes.

Abstract

We present a methodology for conditional control of human shape and pose in pretrained text-to-image diffusion models using a 3D human parametric model (SMPL). Fine-tuning these diffusion models to adhere to new conditions requires large datasets and high-quality annotations, which can be acquired more cost-effectively through synthetic data generation than through real-world capture. However, the domain gap and low scene diversity of synthetic data can compromise the pretrained model's visual fidelity. We propose a domain-adaptation technique that maintains image quality by isolating the synthetically trained conditional information in the classifier-free guidance vector and composing it with another control network to adapt the generated images to the input domain. To achieve SMPL control, we fine-tune a ControlNet-based architecture on the synthetic SURREAL dataset of rendered humans and apply our domain adaptation at generation time. Experiments demonstrate that our model achieves greater shape and pose diversity than the 2D pose-based ControlNet, while maintaining visual fidelity and improving stability, proving its usefulness for downstream tasks such as human animation.

Approach

A pretrained ControlNet \( \epsilon_{SD} \) conditioned on 2D poses (a) can generate pose-guided images in the data domain \( p_{SD} \) (blue) of the Stable Diffusion model. To enable SMPL-based control of human shape and 3D pose, the model is fine-tuned on the synthetic SURREAL dataset (b), which shifts the model outputs into the synthetic data domain \( p_{Syn} \) (orange) and degrades visual fidelity and realism. Our approach composes the classifier-free guidance with a secondary ControlNet trained on the original domain (c), adapting the visual output domain back to the original data domain while retaining shape and pose control.
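The guidance composition described above can be sketched as a weighted combination of noise predictions: the SMPL-conditioned (synthetically trained) direction supplies shape and pose, while the original-domain direction pulls the output back toward the Stable Diffusion data domain. The function name and guidance weights below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np


def composed_cfg(eps_uncond, eps_smpl, eps_orig, w_smpl=2.0, w_orig=2.0):
    """Compose two conditional guidance directions around a shared
    unconditional prediction (weights w_smpl/w_orig are hypothetical)."""
    return (eps_uncond
            + w_smpl * (eps_smpl - eps_uncond)    # SMPL shape/pose control
            + w_orig * (eps_orig - eps_uncond))   # original-domain adaptation


# Toy example with stand-in noise predictions in place of U-Net outputs.
rng = np.random.default_rng(0)
shape = (4, 8, 8)  # latent channels x spatial dims (illustrative)
eps_uncond = rng.standard_normal(shape)
eps_smpl = rng.standard_normal(shape)
eps_orig = rng.standard_normal(shape)

eps = composed_cfg(eps_uncond, eps_smpl, eps_orig)
```

Setting either weight to zero recovers plain classifier-free guidance with the remaining condition, which is what makes the two control signals separable at generation time.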

Examples

shape control

Sampling different shape parameters of the SMPL model.
Changes in body proportions are reflected in the generated images.
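SMPL encodes body shape as a low-dimensional coefficient vector (the "betas", commonly 10 components), so sampling shapes for generation amounts to drawing such vectors. The sketch below assumes a standard-normal sampling range for illustration; it is not the paper's exact protocol:

```python
import numpy as np

# SMPL's standard shape space has 10 principal-component coefficients.
NUM_BETAS = 10

rng = np.random.default_rng(42)
# Draw 5 random body shapes; each row is one beta vector that, together
# with pose parameters, would be fed to the SMPL model to produce the
# conditioning mesh.
betas = rng.normal(loc=0.0, scale=1.0, size=(5, NUM_BETAS))
```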

BibTeX

@inproceedings{buchheim2025controlling,
  author    = {Buchheim, Benito and Reimann, Max and D{\"o}llner, J{\"u}rgen},
  title     = {Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {3688-3697}
}

Acknowledgements

Our work "Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation" was partially funded by the German Federal Ministry of Education and Research (BMBF) through grants 01IS15041 – “mdViPro” and 01IS19006 – “KI-Labor ITSE”.


Hasso-Plattner-Institut
BMBF: Bundesministerium für Bildung und Forschung
Universität Potsdam