Feature learning for RGB-D data

Ziyun Cai

Publication year: 2017
Citations: 2
Access: Open access

Abstract

RGB-D data has turned out to be a very useful representation for solving fundamental computer vision problems. It combines the advantages of color images, which provide appearance information about an object, with those of depth images, which are immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. RGB-D images and video can benefit a wide range of application areas, such as computer vision, robotics, construction, and medical imaging. However, how to fuse RGB information and depth information is still an open problem in computer vision. Simply concatenating RGB data and depth data is not enough; more powerful fusion algorithms are still needed. In this thesis, to further explore the advantages of RGB-D data, we use several popular RGB-D datasets for the evaluation of deep feature learning algorithms, hyper-parameter optimization, local multi-modal feature learning, RGB-D data fusion, and recognizing RGB information from RGB-D images:

i) With the success of deep neural networks in computer vision, deep features from fused RGB-D data can be shown to yield better results than RGB data alone. However, different deep learning algorithms perform differently on different RGB-D datasets. Through large-scale experiments that comprehensively evaluate the performance of deep feature learning models for RGB-D image/video classification, we conclude that RGB-D fusion methods using CNNs always outperform the other selected methods (DBNs, SDAE, and LSTM). On the other hand, since LSTM can learn from experience to classify, process, and predict time series, it achieves better performance than DBN and SDAE in video classification tasks.

ii) Hyper-parameter optimization can help researchers quickly choose an initial set of hyper-parameters for a new classification task, thus reducing the number of trials in the hyper-parameter space. We present a simple and efficient framework for improving the efficiency and accuracy of hyper-parameter optimization by considering the classification complexity of a particular dataset. We verify this framework on three real-world RGB-D datasets. Analysis of the experiments confirms that our framework provides deeper insight into the relationship between dataset classification tasks and hyper-parameter optimization, and thus quickly yields an accurate initial set of hyper-parameters for a new classification task.

iii) We propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework for RGB-D scene classification. This method can effectively capture much of the local structure of RGB-D scene images and automatically learn a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experiments on two popular datasets thoroughly test the performance of our method and show that our method with local multi-modal CNNs greatly outperforms state-of-the-art approaches. Our method has the potential to improve RGB-D scene understanding. An extended evaluation shows that a CNN trained on a scene-centric dataset achieves an improvement on scene benchmarks compared to a network trained on an object-centric dataset.

iv) We propose a novel method for RGB-D data fusion. We project raw RGB-D data into a complex space and then jointly extract features from the fused RGB-D images. Besides three observations about the fusi

Keywords

RGB color model, Artificial intelligence, Computer science, Computer vision, Feature (linguistics), Deep learning, Feature extraction
