Integrating Multimodal Learning for Scalable Indoor Scene Understanding

Naina Kalyanshetti, Praveen Kulkarni

Publication year: 2025
Citations: 1

Abstract

Indoor scene understanding is a key area in modern computer vision, with applications in robotics, virtual reality, and smart environments. It involves identifying the objects in a room, understanding their spatial relationships, and interpreting the overall layout. Earlier approaches relied on hand-crafted features such as HOG and SIFT, which worked well on simple images but failed in complex, real-world scenes because of occlusion and variability. With the rise of deep learning, especially CNNs and transformer-based models, the accuracy of object detection and scene classification has improved greatly. However, these models require large labeled datasets and substantial computational resources. To address these challenges, this paper proposes a hybrid framework that combines traditional feature-based techniques with deep learning methods. We analyze various approaches and highlight how multimodal learning can improve accuracy, efficiency, and scalability in indoor scene analysis.

Keywords

Computer science; Scalability; Human–computer interaction; Multimedia; Artificial intelligence; Database
