Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection

Abstract

State-of-the-art approaches for hate-speech detection usually perform poorly in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which harms their domain invariance. Prior work has attempted to penalize hate-speech-related terms drawn from manually curated lists using feature-attribution methods, which quantify the importance a classifier assigns to each input term when making a prediction. We instead propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using two signals: a domain classifier, which learns to differentiate between domains, and feature-attribution scores for the hate-speech classes. This approach yields consistent improvements in cross-domain evaluation.
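
The idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a bag-of-words logistic regression in place of a neural classifier, uses weight times input as a stand-in attribution method, and penalizes the weight of the selected term directly. The vocabulary, the data, the helper `train_logreg`, and the values of `lam`, `lr`, and `steps` are all hypothetical.

```python
import numpy as np

VOCAB = ["slur", "src_tag", "tgt_tag", "hello"]  # hypothetical toy vocabulary

def train_logreg(X, y, penalty_mask=None, lam=0.0, lr=0.5, steps=500):
    """Logistic regression by gradient descent.
    Terms flagged in penalty_mask get an extra L2 penalty of strength lam."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    if penalty_mask is None:
        penalty_mask = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / n + lam * penalty_mask * w)
        b -= lr * np.mean(p - y)
    return w, b

# Source documents: "src_tag" co-occurs with the hate label (spurious cue).
X_src = np.array([
    [1, 1, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0],
    [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0],
], dtype=float)
y_src = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
# Target documents: same content, but "tgt_tag" replaces "src_tag".
X_tgt = X_src[:, [0, 2, 1, 3]]

# 1) Domain classifier: source (1) vs. target (0).
X_dom = np.vstack([X_src, X_tgt])
d_dom = np.concatenate([np.ones(len(X_src)), np.zeros(len(X_tgt))])
w_dom, _ = train_logreg(X_dom, d_dom)

# 2) Baseline hate classifier and per-term attribution for the hate class
#    (weight * input, averaged over hate-labelled source documents).
w_base, b_base = train_logreg(X_src, y_src)
attr = (X_src[y_src == 1] * w_base).mean(axis=0)

# 3) Select the term that is both source-indicative and hate-attributed.
score = np.maximum(w_dom, 0) * np.maximum(attr, 0)
mask = np.zeros(len(VOCAB))
mask[np.argmax(score)] = 1.0
selected = [VOCAB[j] for j in np.flatnonzero(mask)]

# 4) Retrain the hate classifier while penalizing the selected term.
w_pen, b_pen = train_logreg(X_src, y_src, penalty_mask=mask, lam=1.0)

print("selected:", selected)
print("weight before/after:", w_base[1], w_pen[1])
```

In this toy setting the retrained classifier leans less on the source-only term and more on terms shared across domains, which is the intuition behind penalizing source-specific terms. A real system would swap in a neural classifier and a proper attribution method.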

Author and article information

Publication date Created: 18 September 2022

Article

arXiv ID: 2209.08681

SO-VID: 631cbde6-fe36-4c17-a04c-358a290e36ad

License:

http://creativecommons.org/licenses/by/4.0/

Custom metadata

Comments COLING 2022 pre-print

Categories cs.CL

ScienceOpen disciplines: Theoretical computer science

