      Gradient Ascent Post-training Enhances Language Model Generalization

      Preprint

          Abstract

          In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP allows LMs to become comparable to LMs 2-3x their size across 12 different NLP tasks. We also show that applying GAP to out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
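
          The abstract does not spell out the update rule, but gradient ascent post-training amounts to running a handful of optimizer steps that maximize (rather than minimize) the standard language-modeling loss on unlabeled text. The sketch below illustrates this idea with PyTorch and the Hugging Face transformers library; the checkpoint, placeholder corpus, learning rate, and step count are illustrative assumptions, not the paper's reported configuration.

          # Minimal sketch of Gradient Ascent Post-training (GAP) as described in the
          # abstract: a few optimizer steps that *maximize* the LM loss on unlabeled text.
          # Checkpoint, corpus, lr, and step count are assumptions for illustration only.
          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          model_name = "facebook/opt-350m"  # assumption: any pretrained causal LM of similar size
          model = AutoModelForCausalLM.from_pretrained(model_name)
          tokenizer = AutoTokenizer.from_pretrained(model_name)
          model.train()

          optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # assumed hyperparameters

          # Unlabeled text; the paper reports that out-of-distribution corpora work best.
          # These sentences are placeholders, not the corpora used in the paper.
          texts = [
              "The quick brown fox jumps over the lazy dog.",
              "Stock prices fluctuated sharply after the announcement.",
          ]

          num_steps = 3  # "just a few steps" per the abstract; the exact count is an assumption
          for step in range(num_steps):
              text = texts[step % len(texts)]
              batch = tokenizer(text, return_tensors="pt")
              outputs = model(**batch, labels=batch["input_ids"])
              loss = -outputs.loss       # negate the LM loss, so the step ascends the loss surface
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()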


          Author and article information

          Journal
          Publication date: 12 June 2023
          Article
          arXiv: 2306.07052
          Record ID: 6ce486b5-6ef9-4041-8665-f5fb50c54485

          License: http://creativecommons.org/licenses/by/4.0/

          Custom metadata
          ACL 2023 Main Conference (Short Paper)
          Subject classes: cs.CL cs.AI

          Keywords: Theoretical computer science, Artificial intelligence
