CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking

Abstract

With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW), enabling dataset owners to determine whether a suspicious third-party model has been trained on a protected dataset under a black-box setting. The CBW method consists of two key stages: dataset watermarking and ownership verification. During watermarking, we implant multiple trigger patterns in the dataset to make similar samples (measured by their feature similarities) close to the same trigger while dissimilar samples are near different triggers. This ensures that any model trained on the watermarked dataset exhibits specific misclassification behaviors when exposed to trigger-embedded inputs. To verify dataset ownership, we design a hypothesis-test-based framework that statistically evaluates whether a suspicious model exhibits the expected backdoor behavior. We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks. The code for reproducing the main experiments is available at https://github.com/Radiant0726/CBW
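The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that samples are grouped by plain k-means over feature vectors, with each cluster index standing in for one trigger pattern, and that verification compares scores a suspect model returns with and without a trigger. The abstract does not say which statistical test is used, so an exact one-sided sign test stands in here. The function names `kmeans` and `sign_test_p_value`, and the assumption that a trigger raises the score, are hypothetical.

```python
import math
import random


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns (centroids, labels).

    Assumption: each cluster label doubles as the index of the trigger
    pattern that would be implanted into that cluster's samples.
    """
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(f, centroids[c])))
            for f in features
        ]
        # Recompute each centroid as the mean of its members; keep it if empty.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def sign_test_p_value(trigger_scores, benign_scores):
    """Exact one-sided sign test on paired scores.

    H0: adding a trigger does not raise the suspect model's score.
    A small p-value is evidence of the expected backdoor behavior.
    """
    pairs = list(zip(trigger_scores, benign_scores))
    wins = sum(t > b for t, b in pairs)
    n = sum(t != b for t, b in pairs)  # ties carry no information
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(math.comb(n, i) for i in range(wins, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Toy 2-D "features": two well-separated groups of utterances.
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    _, trigger_ids = kmeans(feats, k=2)
    print("trigger index per sample:", trigger_ids)

    # Made-up scores from a hypothetical suspect model (not real results).
    with_trigger = [0.90, 0.80, 0.85, 0.95, 0.70, 0.90, 0.88, 0.92]
    without_trigger = [0.10, 0.20, 0.15, 0.30, 0.25, 0.10, 0.20, 0.12]
    print("p-value:", sign_test_p_value(with_trigger, without_trigger))
```

In this sketch the owner would reject the null hypothesis, and treat the model as trained on the protected dataset, when the p-value falls below a chosen significance level. Because the test only needs the model's output scores, it fits the black-box setting the abstract describes.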

Author and article information

Publication date Created: 01 March 2025

Article

arXiv ID: 2503.05794

SO-VID: 64d523ff-19e6-46ad-a4c7-fbac9732b926

License: http://creativecommons.org/licenses/by/4.0/

Custom metadata

Comments 14 pages. The journal extension of our ICASSP'21 paper (arXiv:2010.11607)

Categories cs.CR cs.AI cs.LG cs.SD eess.AS

ScienceOpen disciplines: Security & Cryptology, Artificial intelligence, Electrical engineering, Graphics & Multimedia design
