FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

Abstract

Diffusion-based audio-driven talking avatar methods have recently gained attention for their high-fidelity, vivid, and expressive results. However, their slow inference speed limits practical applications. Although various distillation techniques have been developed for diffusion models, we found that naive diffusion distillation does not yield satisfactory results: distilled models are less robust to open-set input images and show weaker audio-video correlation than their teacher models, undermining the advantages of diffusion models. To address this, we propose FADA (Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation). We first design a mixed-supervised loss that leverages data of varying quality and improves the model's overall capability and robustness. We then propose a multi-CFG distillation with learnable tokens that exploits the correlation between the audio and reference image conditions, reducing the threefold inference runs caused by multi-CFG with acceptable quality degradation. Extensive experiments across multiple datasets show that FADA generates vivid videos comparable to recent diffusion model-based methods while achieving an NFE speedup of 4.17 to 12.5 times. Demos are available at our webpage http://fadavatar.github.io.
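
To make the "threefold inference runs" concrete, the following minimal sketch shows one common way of combining two conditions under classifier-free guidance (CFG). It is an illustration under stated assumptions, not FADA's implementation: the stand-in denoiser `toy_denoiser`, the guidance scales `s_ref` and `s_audio`, and the nested combination formula are all hypothetical and do not come from the abstract. The only point is that each sampling step calls the denoiser three times, which is the cost that multi-CFG distillation aims to reduce.

```python
# Illustrative sketch only: a toy two-condition CFG step on scalar "latents".
# The denoiser, guidance scales, and combination rule are assumptions,
# not taken from the FADA paper.

def toy_denoiser(x, ref=None, audio=None):
    """Hypothetical stand-in for a diffusion denoiser; counts its own calls."""
    toy_denoiser.calls += 1
    out = 0.5 * x
    if ref is not None:      # reference-image condition present
        out += 0.1 * ref
    if audio is not None:    # audio condition present
        out += 0.2 * audio
    return out

toy_denoiser.calls = 0


def multi_cfg_step(x, ref, audio, s_ref=2.0, s_audio=3.0):
    """One guided prediction built from three denoiser evaluations."""
    e_uncond = toy_denoiser(x)                      # no conditions
    e_ref = toy_denoiser(x, ref=ref)                # reference image only
    e_full = toy_denoiser(x, ref=ref, audio=audio)  # reference image + audio
    # Nested guidance: push toward the reference, then toward the audio.
    return e_uncond + s_ref * (e_ref - e_uncond) + s_audio * (e_full - e_ref)


if __name__ == "__main__":
    guided = multi_cfg_step(x=1.0, ref=1.0, audio=1.0)
    print("guided prediction:", guided)
    print("denoiser calls for one step:", toy_denoiser.calls)
```

In this sketch, a sampler with N steps costs 3N function evaluations (NFE). A distilled student that reproduced the guided output in a single pass per step would cost N.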

Author and article information

Publication date (created): 22 December 2024

Article

arXiv ID: 2412.16915

SO-VID: 0df59661-1f0d-4557-b000-9fc4e46d1846

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories: cs.CV, cs.AI, cs.GR, cs.SD, eess.AS

ScienceOpen disciplines: Computer vision & Pattern recognition, Artificial intelligence, Electrical engineering, Graphics & Multimedia design

