SLM: Bridge the thin gap between speech and text foundation models

Abstract

We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their capabilities, and trains only a simple adapter with just 1% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achieve strong performance on conventional tasks such as speech recognition (ASR) and speech translation (AST), but also introduces the novel capability of zero-shot instruction-following for more diverse tasks: given a speech input and a text instruction, SLM is able to perform unseen generation tasks including contextual biasing ASR using real-time context, dialog generation, speech continuation, and question answering. Our approach demonstrates that the representational gap between pretrained speech and language models might be narrower than one would expect, and can be bridged by a simple adaptation mechanism. As a result, SLM is not only efficient to train, but also inherits strong capabilities already acquired in foundation models of different modalities.
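
The arrangement the abstract describes (frozen speech and language models joined by a small trained adapter) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `Linear` class, the dimensions, the single-layer stand-ins for each model, and the ordering of instruction and speech embeddings are hypothetical and are not taken from the paper.

```python
import random


class Linear:
    """Toy dense layer (no bias) standing in for a full model component."""

    def __init__(self, d_in, d_out, trainable, seed):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)]
                       for _ in range(d_out)]
        self.trainable = trainable  # False means the weights stay frozen

    def __call__(self, vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in self.weight]

    def num_params(self):
        return len(self.weight) * len(self.weight[0])


# Hypothetical toy sizes; the real foundation models are far larger.
AUDIO_DIM, SPEECH_DIM, TEXT_DIM, VOCAB = 4, 8, 16, 10

speech_encoder = Linear(AUDIO_DIM, SPEECH_DIM, trainable=False, seed=0)  # frozen
adapter = Linear(SPEECH_DIM, TEXT_DIM, trainable=True, seed=1)           # trained
text_embedding = Linear(VOCAB, TEXT_DIM, trainable=False, seed=2)        # frozen
language_model = Linear(TEXT_DIM, TEXT_DIM, trainable=False, seed=3)     # frozen


def embed_token(token_id):
    """Look up a token embedding by multiplying with a one-hot vector."""
    one_hot = [1.0 if i == token_id else 0.0 for i in range(VOCAB)]
    return text_embedding(one_hot)


def slm_forward(audio_frames, instruction_ids):
    """Map speech into the text embedding space, then run the frozen LM."""
    speech_embeds = [adapter(speech_encoder(f)) for f in audio_frames]
    text_embeds = [embed_token(t) for t in instruction_ids]
    sequence = text_embeds + speech_embeds  # ordering is an assumption
    return [language_model(v) for v in sequence]


modules = [speech_encoder, adapter, text_embedding, language_model]
trainable = sum(m.num_params() for m in modules if m.trainable)
frozen = sum(m.num_params() for m in modules if not m.trainable)

# Three audio frames plus a two-token instruction give five LM positions.
outputs = slm_forward([[0.1, 0.2, 0.3, 0.4]] * 3, [1, 5])
print(len(outputs), len(outputs[0]), trainable, frozen)
```

In this toy setup only the adapter's weights would receive gradient updates. Taking the abstract's figures at face value, 156M parameters at roughly 1% would put the frozen foundation models on the order of 15.6B parameters in total.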

Author and article information

Publication date Created: 29 September 2023

Article

arXiv ID: 2310.00230

SO-VID: 8e7b96ab-3516-4d52-8130-efb75b092026

License:

http://creativecommons.org/licenses/by/4.0/

Custom metadata

Categories: cs.CL, cs.SD, eess.AS

ScienceOpen disciplines: Theoretical computer science, Electrical engineering, Graphics & Multimedia design
