SoGAR: Self-Supervised Spatiotemporal Attention-Based Social Group Activity Recognition
Naga Venkata Sai Raviteja Chappa, Pha Nguyen, Alexander Nelson, Han‐Seok Seo, Xin Li, Page D. Dobbs, Khoa Luu
Year: 2025
Citations: 5
Abstract
Social group activity recognition is crucial for applications including surveillance, human-robot interaction, and behavioral analysis. Current approaches often require extensive manual annotations and rely heavily on pre-trained detectors, which limits their practical use. In addition, existing methods struggle to model long-term spatiotemporal relationships in group activities effectively. This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using a self-supervised transformer network that can effectively utilize unlabeled video data. To extract spatiotemporal information, we create local and global views with varying frame rates. Our self-supervised objective ensures that features extracted from different views of the same video are consistent across the spatial and temporal dimensions. Our approach uses transformer-based encoders to mitigate the challenges of the weakly supervised setting of group activity recognition, and by leveraging the strengths of transformer models, it can model long-term relationships along the spatiotemporal dimensions. The proposed SoGAR method achieves state-of-the-art results on three group activity recognition benchmarks, namely JRDB-PAR, NBA, and Volleyball, surpassing the current state of the art in terms of the F1-score, MCA, and MPCA metrics.
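The view construction and cross-view objective described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Views are modeled as lists of frame indices that differ in stride (a larger stride gives a lower effective frame rate). The objective is written as a DINO-style teacher/student cross-entropy between softmax outputs, which is an assumption: the abstract only states that features from different views of the same video should be consistent. All function names, clip lengths, strides, and temperatures are hypothetical. The transformer encoders, teacher centering, and EMA teacher updates are omitted, so the loss simply operates on given output logits.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_view(num_frames: int, clip_len: int, stride: int, rng: random.Random) -> List[int]:
    """Pick `clip_len` frame indices spaced `stride` frames apart, at a random start.

    A larger stride covers a longer time span at a lower effective frame rate.
    """
    span = (clip_len - 1) * stride + 1
    if span > num_frames:
        raise ValueError("video is too short for this view")
    start = rng.randint(0, num_frames - span)
    return [start + i * stride for i in range(clip_len)]


def make_views(num_frames: int, rng: random.Random,
               n_global: int = 2, n_local: int = 4) -> Tuple[List[List[int]], List[List[int]]]:
    # Hypothetical settings: global views cover a long window at a low frame rate,
    # local views cover a short window at a higher frame rate.
    global_views = [sample_view(num_frames, 8, 8, rng) for _ in range(n_global)]
    local_views = [sample_view(num_frames, 8, 2, rng) for _ in range(n_local)]
    return global_views, local_views


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))  # stable log-sum-exp
    return [x - log_z for x in scaled]


def consistency_loss(teacher_global: Sequence[Sequence[float]],
                     student_all: Sequence[Sequence[float]],
                     t_teacher: float = 0.04, t_student: float = 0.1) -> float:
    """Mean cross-entropy H(p_teacher, p_student) over pairs of different views.

    Assumes the first len(teacher_global) student outputs come from the same
    global views as the teacher outputs, so identical-view pairs are skipped.
    """
    total, pairs = 0.0, 0
    for i, t_logits in enumerate(teacher_global):
        p_t = [math.exp(v) for v in log_softmax(t_logits, t_teacher)]
        for j, s_logits in enumerate(student_all):
            if j == i:
                continue  # do not compare a view with itself
            log_p_s = log_softmax(s_logits, t_student)
            total += -sum(a * b for a, b in zip(p_t, log_p_s))
            pairs += 1
    return total / pairs
```

For example, `make_views(300, random.Random(0))` returns two global clips that each span 57 frames and four local clips that each span 15 frames, all with 8 frames. Student outputs that agree with the teacher across views give a loss near zero, while outputs that disagree are penalized heavily.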
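MCA and MPCA, two of the metrics named in the abstract, are commonly defined in the group activity recognition literature as overall multi-class classification accuracy and mean per-class accuracy. A minimal sketch, assuming one predicted label per clip. The function names and the toy labels are illustrative, and the F1-score used for JRDB-PAR is not covered here.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")


def mca(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multi-class classification accuracy: fraction of all clips labeled correctly."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mpca(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class accuracy: accuracy within each ground-truth class, averaged over classes."""
    _check(y_true, y_pred)
    hits: defaultdict = defaultdict(int)
    counts: defaultdict = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Hypothetical labels: a classifier that always predicts the majority class.
y_true = ["pass", "pass", "pass", "shot"]
y_pred = ["pass", "pass", "pass", "pass"]
print(mca(y_true, y_pred))   # 0.75
print(mpca(y_true, y_pred))  # 0.5
```

The usage example shows why the two metrics are reported together. Always predicting the majority class still scores 0.75 MCA, but MPCA drops to 0.5 because the rare class is never recognized.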