Refining Skewed Perceptions in Vision-Language Models through Visual Representations

Abstract

Large vision-language models (VLMs) such as CLIP have become foundational, demonstrating remarkable success across a variety of downstream tasks. Despite their advantages, these models, like other foundation models, inherit biases from the disproportionate distribution of real-world data, leading to misconceptions about the actual environment. Prevalent datasets such as ImageNet are often riddled with non-causal, spurious correlations that can diminish VLM performance in scenarios where these contextual elements are absent. This study investigates how a simple linear probe can effectively distill task-specific core features from CLIP's embeddings for downstream applications. Our analysis reveals that CLIP's text representations are often tainted by spurious correlations inherited from the biased pre-training dataset. Empirical evidence suggests that relying on CLIP's visual representations, rather than its text embeddings, is more practical for refining the skewed perceptions in VLMs, underscoring the superior utility of visual representations in overcoming embedded biases. Our code will be available here.
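The abstract does not spell out how the linear probe is set up, so the following is only a generic sketch of the idea: a single linear layer (here, binary logistic regression) trained on top of frozen embeddings. The embeddings are synthetic stand-ins rather than real CLIP features, and the helper names (make_toy_embeddings, train_linear_probe, score) are hypothetical and not taken from the paper.

```python
import math
import random


def make_toy_embeddings(n, rng):
    """Synthetic stand-in for frozen image embeddings (NOT real CLIP features).

    Dimension 0 carries the class signal; dimensions 1-3 are pure noise.
    """
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        core = (1.0 if y == 1 else -1.0) + rng.gauss(0.0, 0.3)
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        features.append([core] + noise)
        labels.append(y)
    return features, labels


def score(weights, bias, x):
    """Linear score w . x + b for one embedding."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent.

    The embeddings stay fixed; only the probe's weights and bias are learned.
    """
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            prob = 1.0 / (1.0 + math.exp(-score(weights, bias, x)))
            err = prob - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


rng = random.Random(0)
train_x, train_y = make_toy_embeddings(200, rng)
test_x, test_y = make_toy_embeddings(100, rng)

weights, bias = train_linear_probe(train_x, train_y)
correct = sum(
    (1 if score(weights, bias, x) > 0 else 0) == y
    for x, y in zip(test_x, test_y)
)
accuracy = correct / len(test_y)
print("held-out accuracy:", accuracy)
print("probe weights:", [round(w, 2) for w in weights])
```

In a real setting, the toy vectors would presumably be replaced by image embeddings from a frozen CLIP encoder, and the probe's learned weights could then be inspected to see which embedding directions the downstream task actually relies on.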

Author and article information

Publication date (created): 22 May 2024

Article

arXiv ID: 2405.14030

SO-VID: 6926c776-b475-49e1-bdd1-329e28cf2e6e

License:

http://creativecommons.org/licenses/by/4.0/

Custom metadata

Comments: 18 pages, 7 figures

Categories: cs.CV, cs.CL

ScienceOpen disciplines: Computer vision & Pattern recognition, Theoretical computer science
