From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Abstract

Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generating audible artifacts when the conditioning is flawed or imperfect. An alternative modeling approach is to use diffusion models. However, these have mainly been used as speech vocoders (i.e., conditioned on mel-spectrograms) or for generating signals at relatively low sampling rates. In this work, we propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality (e.g., speech, music, environmental sounds) from low-bitrate discrete representations. At equal bit rate, the proposed approach outperforms state-of-the-art generative techniques in terms of perceptual quality. Training and evaluation code, along with audio samples, are available on the facebookresearch/audiocraft GitHub page.

Author and article information

Publication date Created: 02 August 2023

Article

arXiv ID: 2308.02560

SO-VID: 9c6d2c2a-6699-44b8-803d-fad5f2765524

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments 10 pages

Categories cs.SD cs.LG eess.AS

ScienceOpen disciplines: Artificial intelligence, Electrical engineering, Graphics & Multimedia design
