Multimodal scene recognition using semantic segmentation and deep learning integration

Aysha Naseer, Mohammed Alnusayri, Haifa F. Alhasson, Mohammed Alatiyyah, Dina Abdulaziz AlHammadi, Ahmad Jalal, Jeongmin Park

Year: 2025
Citations: 15
Access: Open access

Abstract

Semantic modeling and recognition of indoor scenes are difficult because generic scenes have a complex composition and contain a variety of features, including themes and objects. The gap between high-level scene interpretation and low-level visual features adds to this complexity. To overcome these obstacles, this study presents a novel multimodal deep learning technique that enhances scene recognition accuracy and robustness by combining depth information with conventional red-green-blue (RGB) image data. A depth-aware segmentation method first identifies the objects in an image; convolutional neural networks (CNNs) and spatial pyramid pooling (SPP) then analyze the segmented result, which allows more precise image classification. Experiments show 91.73% accuracy on the RGB-D scene dataset and 90.53% accuracy on the NYU Depth v2 dataset. These results demonstrate how the multimodal approach can improve scene detection and classification, with potential uses in fields including robotics, sports analysis, and security systems.
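The abstract names spatial pyramid pooling (SPP) without describing it. As a rough illustration only, and not the authors' implementation, the sketch below shows the general idea in NumPy: a feature map of any spatial size is max-pooled over grids of increasing resolution, and the results are concatenated into a fixed-length vector. The function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, and the 4-channel input standing in for stacked RGB and depth are all assumptions made for this example.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) array over a pyramid of n x n grids.

    Returns a 1-D vector of length C * sum(n * n for n in levels),
    which does not depend on H or W.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        # Integer bin edges covering the whole map, even when H or W
        # is not divisible by n.
        row_edges = np.linspace(0, height, n + 1).astype(int)
        col_edges = np.linspace(0, width, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                r0, c0 = row_edges[i], col_edges[j]
                # Guard against empty bins when the map is smaller than the grid.
                r1 = max(row_edges[i + 1], r0 + 1)
                c1 = max(col_edges[j + 1], c0 + 1)
                cell = feature_map[:, r0:r1, c0:c1]
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)


# Hypothetical inputs: 4 channels standing in for stacked RGB + depth,
# at two different spatial sizes.
rng = np.random.default_rng(0)
small = rng.random((4, 13, 17))
large = rng.random((4, 60, 80))

vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
print(vec_small.shape, vec_large.shape)  # both (84,): 4 channels * (1 + 4 + 16) bins
```

Because the output length depends only on the channel count and the pyramid levels, a classifier placed after this step can accept inputs of different sizes, which is the usual reason for pairing SPP with a CNN.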

Keywords

Artificial intelligence; Computer science; Convolutional neural network; RGB color model; Robustness (evolution); Deep learning; Segmentation; Pooling; Computer vision; Pattern recognition (psychology)
