Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

Title

Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

Authors

Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

Abstract

The past few years have witnessed significant advances in speech synthesis and voice conversion technologies. However, such technologies can undermine the robustness of broadly implemented biometric identification models and can be harnessed by in-the-wild attackers for illegal purposes. The ASVspoof challenge mainly focuses on audio synthesized by advanced speech synthesis and voice conversion models, and on replay attacks. Recently, the first Audio Deep Synthesis Detection challenge (ADD 2022) extended the attack scenarios to more aspects. ADD 2022 is also the first challenge to propose the partially fake audio detection task. Such brand-new attacks are dangerous, and how to tackle them remains an open question. We therefore propose a novel framework that introduces a question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audio. The proposed fake span detection module tasks the anti-spoofing model with predicting the start and end positions of the fake clip within the partially fake audio, directs the model's attention toward discovering the fake spans rather than other shortcuts that generalize less well, and finally equips the model with the capacity to discriminate between real and partially fake audio. Our submission ranked second in the partially fake audio detection track of ADD 2022.
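The fake span discovery strategy borrows the extractive question-answering formulation: every frame receives a start score and an end score, training maximizes the likelihood of the true fake-clip boundaries, and inference picks the highest-scoring valid span. The plain-Python sketch below illustrates only that generic QA-style span head, under stated assumptions: the frame embeddings are assumed to come from the self-attention encoder, the projection vectors `w_start`/`w_end` and all function names are hypothetical, and the loss and decoding follow the standard BERT-style QA recipe rather than being confirmed as the authors' exact implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def span_logits(frames: Sequence[Sequence[float]],
                w_start: Sequence[float],
                w_end: Sequence[float]) -> Tuple[Vector, Vector]:
    """Project each frame embedding to a start score and an end score.

    Assumption: `frames` are per-frame outputs of a self-attention encoder;
    `w_start` and `w_end` are hypothetical learned projection vectors.
    """
    start = [sum(f_i * w_i for f_i, w_i in zip(f, w_start)) for f in frames]
    end = [sum(f_i * w_i for f_i, w_i in zip(f, w_end)) for f in frames]
    return start, end


def log_softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable log-softmax over a list of scores."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def span_loss(start_logits: Sequence[float], end_logits: Sequence[float],
              start_idx: int, end_idx: int) -> float:
    """QA-style loss: mean cross-entropy of the true start and end frames."""
    return -(log_softmax(start_logits)[start_idx]
             + log_softmax(end_logits)[end_idx]) / 2.0


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_len: Optional[int] = None) -> Tuple[int, int]:
    """Return (s, e) with s <= e maximizing start_logits[s] + end_logits[e].

    `max_len` optionally caps the span length (e - s + 1) in frames.
    """
    n = len(start_logits)
    best, best_score = (0, 0), -math.inf
    for s in range(n):
        last = n if max_len is None else min(n, s + max_len)
        for e in range(s, last):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy example: 6 frames with 2-d embeddings; frame 2 "looks like" a start
# boundary and frame 4 "looks like" an end boundary.
frames = [[0, 0], [0, 0], [3, 0], [0, 0], [0, 2], [0, 0]]
start, end = span_logits(frames, [1.0, 0.0], [0.0, 1.0])
print(best_span(start, end))          # predicted fake span (start, end)
print(span_loss(start, end, 2, 4))    # loss against the true span
```

The exhaustive search in `best_span` is quadratic in the number of frames, which is acceptable for a sketch; enforcing `s <= e` is what keeps the decoded start and end boundaries consistent with a single contiguous fake clip.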