SPIdepth: Strengthened Pose Information for Self-Supervised Monocular Depth Estimation
Mykola Lavreniuk, Alla Lavreniuk
- Publication year: 2025
- Citations: 5
Abstract
Self-supervised monocular depth estimation has garnered significant attention for its applications in autonomous driving and robotics. While recent methods have focused on improving depth networks, they often overlook the role of pose estimation, treating it as a secondary component. In this paper, we introduce SPIdepth, a novel approach that enhances pose network design to improve depth estimation without increasing model complexity or inference cost. Building upon SQLdepth, SPIdepth replaces the smaller, randomly initialized PoseNet with a larger, pretrained PoseNet that leverages representations learned from large-scale datasets. This stabilizes motion estimation during training and improves depth prediction; because the PoseNet is used only during training, inference-time cost does not increase. In addition, SPIdepth first pretrains the PoseNet for accurate image warping before jointly optimizing it with the depth network. Extensive experiments on KITTI, Cityscapes, and Make3D demonstrate that SPIdepth surpasses prior methods by significant margins. On KITTI, SPIdepth achieves the lowest AbsRel (0.029), SqRel (0.069), and RMSE (1.394), establishing a new state of the art. On Cityscapes, SPIdepth improves upon SQLdepth by 21.7% in AbsRel, 36.8% in SqRel, and 16.5% in RMSE, even without motion masks. SPIdepth also outperforms all compared models in zero-shot evaluation on Make3D. Beyond traditional benchmarks, SPIdepth ranks first in the NTIRE 2025 HR Mono Depth Challenge, achieving 97.6% Delta 1.05 validation accuracy on transparent and mirror surfaces. This underscores its robustness on challenging non-Lambertian surfaces and its effectiveness in real-world depth estimation. Remarkably, SPIdepth uses only a single image for inference yet still outperforms video-based methods, demonstrating its practical efficiency and scalability for real-world deployment. Our findings highlight the importance of strengthened pose information in advancing self-supervised depth estimation.
The code and pre-trained models are available at https://github.com/Lavreniuk/SPIdepth.
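The KITTI, Cityscapes and NTIRE results above are reported in standard depth-evaluation metrics. As a rough illustration, the sketch below computes AbsRel, SqRel, RMSE and a threshold accuracy over pixels with valid ground truth, plus the relative percentage reduction used when comparing against a baseline such as SQLdepth. It is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `depth_metrics` and `relative_improvement` are hypothetical, predictions are assumed to be already scale-aligned to ground truth (the median-scaling step common in self-supervised evaluation is omitted), the `min_depth` clamp value is an assumption, and "Delta 1.05" is assumed to mean the fraction of pixels with max(d/d*, d*/d) < 1.05. The exact NTIRE 2025 protocol may differ.

```python
import math
from typing import Dict, Sequence


def depth_metrics(
    gt: Sequence[float],
    pred: Sequence[float],
    delta_thresh: float = 1.25,
    min_depth: float = 1e-3,
) -> Dict[str, float]:
    """Common monocular depth metrics over pixels with valid ground truth.

    Hypothetical helper. Assumes `pred` is already scale-aligned to `gt`;
    the median-scaling step used in many self-supervised evaluations is omitted.
    """
    # Keep only pixels with valid ground truth (gt > 0); clamp predictions to
    # a small positive depth so the ratio test below never divides by zero.
    pairs = [(g, max(p, min_depth)) for g, p in zip(gt, pred) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground truth")
    n = len(pairs)

    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    # Threshold accuracy: share of pixels with max(g/p, p/g) < delta_thresh.
    delta = sum(max(g / p, p / g) < delta_thresh for g, p in pairs) / n

    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "delta": delta}


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    gt = [10.0, 20.0, 0.0, 5.0]  # 0.0 marks a pixel without ground truth
    pred = [11.0, 18.0, 7.0, 5.0]
    print(depth_metrics(gt, pred, delta_thresh=1.05))
    print(relative_improvement(4.0, 3.0))  # 25.0
```

Calling `depth_metrics(gt, pred, delta_thresh=1.05)` reports the stricter threshold assumed for "Delta 1.05", while the default of 1.25 corresponds to the conventional δ < 1.25 accuracy used on KITTI.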