Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Abstract

This article surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning.” Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x′ that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: It allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this article, we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained language models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts but also release other resources, e.g., a website NLPedia–Pretrain including constantly updated survey and paperlist.
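The pipeline described in the abstract (input x, templated prompt x′ with an unfilled slot, filled string x̂, derived output y) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the tiny corpus, the template wording, the answer words, and in particular the co-occurrence scorer, which is only a crude stand-in for a real pre-trained language model.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for large-scale raw pre-training text.
CORPUS = [
    "i love this film . it was a great movie .",
    "i hate this film . it was a terrible movie .",
    "i love the acting . it was a great movie .",
    "i hate the plot . it was a terrible movie .",
]

# Hypothetical template: [X] is the input slot, [Z] is the unfilled answer slot.
TEMPLATE = "[X] it was a [Z] movie ."

# Hypothetical answer words and the output labels y they map to.
ANSWERS = {"great": "positive", "terrible": "negative"}


def build_cooccurrence(corpus):
    """Count how often two distinct words appear in the same corpus line."""
    counts = Counter()
    for line in corpus:
        words = set(line.split())
        for a in words:
            for b in words:
                if a != b:
                    counts[(a, b)] += 1
    return counts


COOC = build_cooccurrence(CORPUS)


def make_prompt(x):
    """Turn the input x into the prompt x' by filling the [X] slot."""
    return TEMPLATE.replace("[X]", x)


def fill(x_prime, z):
    """Fill the [Z] slot of x' with a candidate answer z."""
    return x_prime.replace("[Z]", z)


def score(filled, z):
    """Toy log-score of a filled prompt (NOT a real language model):
    add-one-smoothed co-occurrence of z with every other word."""
    context = [w for w in filled.split() if w != z]
    return sum(math.log(COOC[(w, z)] + 1) for w in context)


def predict(x):
    """Return (x_hat, y): the highest-scoring filled prompt and its label."""
    x_prime = make_prompt(x)
    best_z = max(ANSWERS, key=lambda z: score(fill(x_prime, z), z))
    return fill(x_prime, best_z), ANSWERS[best_z]


if __name__ == "__main__":
    print(predict("i love this film ."))
    print(predict("i hate the plot ."))
```

No labeled examples are used to fit a classifier here; the label comes only from which answer word the text scorer prefers in the slot, which is the sense in which a new prompting function lets a model trained on raw text be applied to a new task.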

Author and article information

Contributors

Pengfei Liu

Weizhe Yuan

Jinlan Fu

Zhengbao Jiang

Hiroaki Hayashi

Graham Neubig

Journal

Title: ACM Computing Surveys

Abbreviated Title: ACM Comput. Surv.

Publisher: Association for Computing Machinery (ACM)

ISSN (Print): 0360-0300

ISSN (Electronic): 1557-7341

Publication date Created: September 30, 2023

Publication date (Electronic): January 16, 2023

Publication date (Print): September 30, 2023

Volume: 55

Issue: 9

Pages: 1-35

Affiliations

[1] Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

[2] National University of Singapore, Singapore

Article

DOI: 10.1145/3560815

SO-VID: 7d29534a-c943-4738-9a4c-ab8bca11c1de
