Contextual Chart Generation for Cyber Deception

Abstract

Honeyfiles are security assets designed to attract and detect intruders on compromised systems. They are a type of honeypot that mimics real, sensitive documents, creating the illusion that valuable data is present. Interaction with a honeyfile reveals the presence of an intruder and can provide insights into their goals and intentions. Their practical use, however, is limited by the time, cost and effort associated with manually creating realistic content. The introduction of large language models has made high-quality text generation accessible, but honeyfiles contain a variety of content, including charts, tables and images. To successfully deceive an intruder, this content needs to be plausible and realistic, as well as semantically consistent both within honeyfiles and with the real documents they mimic. In this paper, we focus on an important component of the honeyfile content generation problem: document charts. Charts are ubiquitous in corporate documents and are commonly used to communicate quantitative and scientific data. Existing image generation models, such as DALL-E, are prone to generating charts with incomprehensible text and unconvincing data. We take a multi-modal approach to this problem by combining two purpose-built generative models: a multitask Transformer and a specialized multi-head autoencoder. The Transformer generates realistic captions and plot text, while the autoencoder generates the underlying tabular data for the plot. To advance the field of automated honeyplot generation, we also release a new document-chart dataset and propose a novel metric, Keyword Semantic Matching (KSM). This metric measures the semantic consistency between keywords of a corpus and a smaller bag of words. Extensive experiments demonstrate excellent performance compared with multiple large language models, including ChatGPT and GPT-4.

Author and article information

Publication date (created): 07 April 2024

Article

arXiv ID: 2404.04854

SO-VID: 83bbd6df-03c7-44b9-8522-1f27ca89f8b3

License:

http://creativecommons.org/licenses/by/4.0/

Custom metadata

Comments: 13 pages including references

Categories: cs.LG, cs.AI, cs.CR

ScienceOpen disciplines: Security & Cryptology, Artificial intelligence
