
      MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants

      Preprint (Open Access)


          Abstract

          LLM-based programming assistants offer the promise of faster programming, but at the risk of introducing more security vulnerabilities. Prior work has studied how LLMs could be maliciously fine-tuned to suggest vulnerabilities more often. With the rise of agentic LLMs, which may use results from an untrusted third party, there is a growing risk of attacks on the model's prompt. We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text (under 500 bytes) to a prompt for a programming task. We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code. We evaluate three prompts on seven common LLMs, from basic to state-of-the-art commercial models. Using the HumanEval benchmark, we find that our prompts are broadly effective, with no customization required for different LLMs. Furthermore, the LLMs that are best at HumanEval are also best at following our malicious instructions, suggesting that simply scaling language models will not prevent MaPP attacks. Using a dataset of eight CWEs in 16 scenarios, we find that MaPP attacks are also effective at implementing specific and targeted vulnerabilities across a range of models. Our work highlights the need to secure LLM prompts against manipulation, as well as to rigorously audit code generated with the help of LLMs.
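
          To make the threat model concrete, the sketch below shows how attacker-controlled text could be spliced into an otherwise benign programming prompt before it reaches the model. This is a minimal illustration assuming a simple string-concatenation pipeline; the payload placeholder, the build_poisoned_prompt helper, and the example task are hypothetical and are not taken from the paper.

          # Minimal sketch of a MaPP-style prompt injection (illustrative only).
          # All names here are assumptions; per the abstract, the actual attack
          # text is a short (<500 byte) instruction that steers the model toward
          # vulnerable code while it still solves the task correctly.

          # Stand-in for the attacker-controlled text, e.g. delivered through an
          # untrusted third-party result consumed by an agentic assistant.
          MALICIOUS_SUFFIX = "<attacker-controlled text, under 500 bytes>"

          def build_poisoned_prompt(task_prompt: str) -> str:
              """Append attacker-controlled text to a benign programming task."""
              return task_prompt + "\n" + MALICIOUS_SUFFIX

          # Example: a HumanEval-style task prompt, now carrying the injected text.
          benign_prompt = 'def add(a: int, b: int) -> int:\n    """Return a + b."""\n'
          print(build_poisoned_prompt(benign_prompt))

          The defensive implication in the abstract follows directly from this shape of attack: any channel that can append text to the prompt must be treated as untrusted.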


          Author and article information

          Published: 12 July 2024
          arXiv: 2407.11072

          License: http://creativecommons.org/licenses/by/4.0/ (CC BY 4.0)

          Comments: 6 pages, 5 figures, Proceedings of the ICML 2024 Workshop on Trustworthy Multimodal Foundation Models and AI Agents
          Subjects: cs.CR, cs.AI

          Security & Cryptology, Artificial intelligence
