Integrating Multimodal Learning for Scalable Indoor Scene Understanding
Naina Kalyanshetti, Praveen Kulkarni
- Publication year
- 2025
- Citations
- 1
Abstract
Indoor scene understanding is a key area of computer vision, with applications in robotics, virtual reality, and smart environments. It involves identifying the objects in a room, understanding their spatial relationships, and interpreting the overall layout. Earlier approaches relied on hand-crafted features such as HOG and SIFT, which worked well on simple images but failed in complex or real-world scenes because of occlusion and variability. With the rise of deep learning, especially CNNs and transformer-based models, the accuracy of object detection and scene classification has improved greatly. However, these models require large labeled datasets and substantial computational resources. To address these challenges, this paper proposes a hybrid framework that combines traditional feature-based techniques with deep learning methods. We analyze various approaches and highlight how multimodal learning can improve accuracy, efficiency, and scalability in indoor scene analysis.
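The abstract does not say how the hand-crafted and learned features are combined, so the following is only a minimal sketch of one common option, late fusion by concatenation. Everything in it is an assumption made for illustration: the function names are hypothetical, the hand-crafted part is a single global gradient-orientation histogram (a simplification of HOG without cells or blocks), and the "deep" part is a fixed random projection standing in for a real CNN or transformer embedding.

```python
import numpy as np


def gradient_orientation_histogram(image, n_bins=9):
    """Simplified HOG-style descriptor: one global orientation histogram.

    Assumption: real HOG uses local cells and block normalization; this
    keeps only the magnitude-weighted orientation histogram idea.
    """
    gy, gx = np.gradient(image.astype(np.float64))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(
        orientation, bins=n_bins, range=(0.0, 180.0), weights=magnitude
    )
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def placeholder_deep_embedding(image, dim=16, seed=0):
    """Hypothetical stand-in for a learned embedding.

    Assumption: a fixed random projection followed by ReLU, used only so
    the example runs without a deep learning framework or trained weights.
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((dim, image.size))
    features = np.maximum(weights @ image.ravel().astype(np.float64), 0.0)
    norm = np.linalg.norm(features)
    return features / norm if norm > 0 else features


def fuse_features(image):
    """Late fusion: concatenate the two L2-normalized feature vectors."""
    return np.concatenate(
        [gradient_orientation_histogram(image), placeholder_deep_embedding(image)]
    )


# Synthetic grayscale "image"; a real pipeline would load an indoor scene.
rng = np.random.default_rng(42)
image = rng.random((32, 32))
fused = fuse_features(image)
print(fused.shape)  # (25,) = 9 histogram bins + 16 embedding dimensions
```

Normalizing each part before concatenation keeps either feature family from dominating the fused vector by scale alone. A downstream classifier would then be trained on `fused`; the paper may well use a different fusion strategy.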