
SoGAR: Self-Supervised Spatiotemporal Attention-Based Social Group Activity Recognition

Naga Venkata Sai Raviteja Chappa, Pha Nguyen, Alexander Nelson, Han‐Seok Seo, Xin Li, Page D. Dobbs, Khoa Luu

Year: 2025
Citations: 5

Abstract

Social group activity recognition is crucial for applications including surveillance, human-robot interaction, and behavioral analysis. Current approaches often require extensive manual annotations and rely heavily on pre-trained detectors, which limits their practical use. Existing methods also struggle to model long-term spatiotemporal relationships in group activities. This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) that uses a self-supervised Transformer network to exploit unlabeled video data. To extract spatiotemporal information, we create local and global views with varying frame rates. Our self-supervised objective ensures that features extracted from contrasting views of the same video are consistent across the spatiotemporal domains. The approach uses transformer-based encoders to address the weakly supervised setting of group activity recognition, and the transformer architecture allows it to model long-term relationships along the spatial and temporal dimensions. SoGAR achieves state-of-the-art results on three group activity recognition benchmarks, namely the JRDB-PAR, NBA, and Volleyball datasets, surpassing prior state-of-the-art methods in terms of the F1-score, MCA, and MPCA metrics.
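
The idea of local and global views with varying frame rates can be illustrated with a minimal sketch of frame-index sampling. This is not the authors' implementation: the function names `sample_view` and `make_views`, the clip lengths, and the strides are all hypothetical values chosen for illustration, and the abstract does not state the actual settings. Here a larger stride stands in for a lower frame rate, global views span most of the video, and local views span a short segment.

```python
import random


def sample_view(num_frames, clip_len, stride, rng):
    """Return `clip_len` frame indices spaced `stride` apart, from a random start."""
    span = (clip_len - 1) * stride + 1  # number of frames the view covers
    if span > num_frames:
        raise ValueError("view does not fit in the video")
    start = rng.randrange(num_frames - span + 1)
    return [start + i * stride for i in range(clip_len)]


def make_views(num_frames, seed=0):
    """Build global and local views of one video (illustrative settings only)."""
    rng = random.Random(seed)
    # Assumed settings: global views are longer clips at coarser strides.
    global_views = [sample_view(num_frames, 8, s, rng) for s in (4, 8)]
    # Assumed settings: local views are shorter clips at finer strides.
    local_views = [sample_view(num_frames, 4, s, rng) for s in (1, 2)]
    return global_views, local_views


if __name__ == "__main__":
    global_views, local_views = make_views(num_frames=64)
    for name, views in (("global", global_views), ("local", local_views)):
        for view in views:
            print(name, view)
```

In a self-supervised setup of the kind the abstract describes, the frames selected by each view would be encoded separately and a consistency objective would pull together the features of views drawn from the same video. That encoder and objective are not shown here.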

Keywords

Computer science; Group (periodic table); Artificial intelligence; Pattern recognition (psychology); Activity recognition; Machine learning
