LEARNING

Analyzing Liquid Pouring Sequences via Audio-Visual Neural Networks

Justin Wilson, Auston Sterling, Ming C. Lin

Year published
2019
Citations
17

Abstract

Existing work on estimating the weight of a liquid poured into a target container often requires predefined source weights or visual data. We present novel audio-based and audio-augmented techniques, in the form of multimodal convolutional neural networks (CNNs), to estimate poured weight, perform overflow detection, and classify the liquid and target container. Our audio-based neural network uses the sound from a pouring sequence, that is, a liquid being poured into a target container. The audio inputs are mel-scaled spectrograms computed from the raw audio. Our audio-augmented network fuses this audio with its corresponding visual data based on video images. Only a microphone and camera are required, both of which can be found in any modern smartphone or Microsoft Kinect. Our approach improves classification accuracy for different environments, containers, and contents of the robot pouring task. Our Pouring Sequence Neural Networks (PSNN) are trained and tested using the Rethink Robotics Baxter Research Robot. To the best of our knowledge, this is the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container.
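
The abstract states that the audio inputs are mel-scaled spectrograms computed from raw audio, but gives no parameters. The following is a minimal NumPy-only sketch of that conversion, not the authors' implementation. The function names, the sample rate, the FFT size, the hop length, the number of mel bands, and the log (dB) scaling are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (one of several conventions).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are equally spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    # Hypothetical defaults; the abstract does not specify any of them.
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    # Slice the signal into overlapping frames and apply a Hann window.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto mel bands, shape (n_mels, n_frames), then convert to dB.
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(mel_power + 1e-10)

# Synthetic stand-in for one second of pouring audio: a 1 kHz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t), sr)
print(spec.shape)  # (64, 59)
```

The resulting 2-D array (mel bands by time frames) can be treated as a single-channel image, which is what makes it a natural input for a CNN. In practice a library routine such as `librosa.feature.melspectrogram` would usually replace the hand-written filterbank.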

Keywords

Computer science, Container (type theory), Artificial neural network, Convolutional neural network, Spectrogram, Artificial intelligence, Microphone, Visualization, Audio visual, Robot
