A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

Abstract

In this paper, we present a toolchain for comprehensive audio/video analysis that leverages a deep-learning-based multimodal approach. To this end, six specific tasks are conducted and integrated into the toolchain: Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC). By combining these individual tasks and analyzing both the audio and visual data extracted from an input video, the toolchain supports several audio/video-based applications: two general applications, audio/video clustering and comprehensive audio/video summary, and one specific application, riot or violent context detection. Furthermore, the toolchain has a flexible and adaptable architecture that makes it straightforward to integrate new models for further audio/video-based applications.
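The abstract states that individual task outputs are combined into applications and that new models can be plugged in, but it does not specify the interfaces. The following is a minimal sketch under stated assumptions, not the authors' implementation: each task model is assumed to be wrappable as a function that returns text labels or captions; `MediaSample`, `Toolchain`, `register`, `analyze`, `summarize`, `riot_score`, and the `VIOLENT_CUES` vocabulary are all hypothetical names; and the keyword-vote score is a simple stand-in for riot or violent context detection, whose actual method is not described here. Stub lambdas replace real models.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MediaSample:
    """Hypothetical container for the data extracted from one input video."""
    audio: Any           # e.g. a waveform array; left abstract in this sketch
    frames: List[Any]    # sampled video frames


# Assumption: each task model is wrapped as a function returning text labels or captions.
TaskFn = Callable[[MediaSample], List[str]]


@dataclass
class Toolchain:
    """Hypothetical registry that runs every plugged-in task on a sample."""
    tasks: Dict[str, TaskFn] = field(default_factory=dict)

    def register(self, name: str, fn: TaskFn) -> None:
        # Adding a model is a single registration; other tasks are untouched.
        self.tasks[name] = fn

    def analyze(self, sample: MediaSample) -> Dict[str, List[str]]:
        # Collect each task's textual output under the task's name.
        return {name: fn(sample) for name, fn in self.tasks.items()}


def summarize(outputs: Dict[str, List[str]]) -> str:
    """Join all task outputs into one text summary (illustrative only)."""
    return "; ".join(f"{name}: {', '.join(texts)}" for name, texts in outputs.items())


# Hypothetical cue vocabulary; not taken from the paper.
VIOLENT_CUES = {"gunshot", "explosion", "screaming", "fire", "fight", "riot"}


def riot_score(outputs: Dict[str, List[str]]) -> float:
    """Fraction of tasks whose output mentions at least one violent cue."""
    if not outputs:
        return 0.0
    hits = sum(
        1
        for texts in outputs.values()
        if any(cue in text.lower() for text in texts for cue in VIOLENT_CUES)
    )
    return hits / len(outputs)


# Usage with stub lambdas standing in for real AED/ASC/IC/S2T models.
chain = Toolchain()
chain.register("AED", lambda s: ["gunshot", "speech"])
chain.register("ASC", lambda s: ["street_traffic"])
chain.register("IC", lambda s: ["a crowd of people fight near a burning car"])
chain.register("S2T", lambda s: ["get back everyone"])

outputs = chain.analyze(MediaSample(audio=None, frames=[]))
score = riot_score(outputs)
print(summarize(outputs))
print(score)  # 0.5: the AED and IC outputs contain cues, ASC and S2T do not
```

Keeping every task behind the same function signature is one way to obtain the plug-in flexibility the abstract describes: a new model needs only a `register` call, and applications such as the summary or the riot score consume the shared output dictionary.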

Author and article information

Publication date (created): 02 May 2024

arXiv ID: 2407.03110

SO-VID: 990febb3-e238-4dd0-9ddd-07507635c986

License:

http://creativecommons.org/licenses/by-nc-sa/4.0/

Custom metadata

Categories: cs.SD, cs.AI, eess.AS

ScienceOpen disciplines: Artificial intelligence, Electrical engineering, Graphics & Multimedia design

Related collections

Recursive Rule based Visual Categorization
