Semi-Supervised Action Recognition with Temporal Contrastive Learning
Authors
- Ankit Singh
- Omprakash Chakraborty
- Ashutosh Varshney
- Rameswar Panda
- Rogerio Feris
- Kate Saenko
- Abir Das
Published on
02/04/2021
Learning to recognize actions from only a handful of labeled videos is a challenging problem due to the scarcity of tediously collected activity labels. We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds. Specifically, we propose to maximize the similarity between encoded representations of the same video at two different speeds while minimizing the similarity between different videos played at different speeds. In this way, we leverage the rich supervisory information in terms of 'time' that is present in an otherwise unsupervised pool of videos. With this simple yet surprisingly effective strategy of manipulating the playback rates of unlabeled videos, we considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods across multiple datasets and network architectures. Interestingly, our approach also benefits from out-of-domain unlabeled videos, which demonstrates its robustness and generalizability. We also perform rigorous ablations and analysis to validate our approach.
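To make the contrastive objective described in the abstract concrete, here is a minimal sketch of one common way to write such a loss, in the InfoNCE style. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact form of the loss, and the names `cosine`, `temporal_contrastive_loss`, and the `temperature` parameter (with its default value) are hypothetical. The sketch assumes each video has already been encoded twice, once per playback speed, into plain lists of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_contrastive_loss(fast, slow, temperature=0.5):
    """InfoNCE-style loss over a batch of two-speed embeddings (assumed form).

    fast[i] and slow[i] encode the same video at two playback speeds
    (positive pair); fast[i] and slow[k] with k != i encode different
    videos at different speeds (negative pairs).
    """
    assert len(fast) == len(slow) and len(fast) > 0
    total = 0.0
    for i, f in enumerate(fast):
        # Similarity of this fast-speed embedding to every slow-speed one.
        logits = [cosine(f, s) / temperature for s in slow]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching video.
        total += log_denom - logits[i]
    return total / len(fast)


# Toy embeddings: two videos, two dimensions (made-up values).
fast = [[1.0, 0.0], [0.0, 1.0]]
slow_aligned = [[1.0, 0.0], [0.0, 1.0]]   # same video agrees across speeds
slow_swapped = [[0.0, 1.0], [1.0, 0.0]]   # each video matches the wrong one

loss_aligned = temporal_contrastive_loss(fast, slow_aligned)
loss_swapped = temporal_contrastive_loss(fast, slow_swapped)
```

In this toy setup the loss is lower when the two speeds of the same video map to similar embeddings (`loss_aligned`) than when they match a different video (`loss_swapped`), which is the behavior the abstract describes: pull same-video pairs together and push different-video pairs apart.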
Please cite our work using the BibTeX below.
@InProceedings{Singh_2021_CVPR,
author = {Singh, Ankit and Chakraborty, Omprakash and Varshney, Ashutosh and Panda, Rameswar and Feris, Rogerio and Saenko, Kate and Das, Abir},
title = {Semi-Supervised Action Recognition With Temporal Contrastive Learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {10389-10399}
}