A BoW-equivalent Recurrent Neural Network for Action Recognition

Alexander Richard and Juergen Gall

Abstract

Bag-of-words (BoW) models are widely used in the field of computer vision. A BoW model consists of a visual vocabulary that is generated by unsupervised clustering of the features of the training data, e.g., by using k-means. Clustering methods, however, struggle with large amounts of data, in particular in the context of action recognition. In this paper, we propose a transformation of the standard BoW model into a neural network, enabling discriminative training of the visual vocabulary on large action recognition datasets. We show that our model is equivalent to the original BoW model but allows for the application of supervised neural network training. Our model outperforms the conventional BoW model and sparse coding methods on recent action recognition benchmarks.
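The abstract does not spell out the network architecture, so the following is only a minimal sketch of why a BoW model can be rewritten as a network. It is not the authors' implementation. It assumes the standard identity ||x - c||^2 = ||x||^2 - 2 c·x + ||c||^2. Because ||x||^2 is the same for every centroid, nearest-centroid assignment equals the argmax of a linear layer with weights 2c and bias -||c||^2. Accumulating the one-hot assignments frame by frame as a running mean then reproduces the normalized BoW histogram as the state of a simple recurrence. The helpers `kmeans`, `bow_histogram`, `as_linear_layer` and `recurrent_bow`, the plain Lloyd's k-means, and the random 2-D "frame descriptors" are all hypothetical and chosen for illustration.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(x, centroids):
    # Index of the closest centroid (hard assignment to a visual word).
    return min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's k-means; the returned centroids form the visual vocabulary.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids

def bow_histogram(features, centroids):
    # Classic BoW: count nearest-word assignments, then L1-normalize.
    hist = [0.0] * len(centroids)
    for x in features:
        hist[nearest(x, centroids)] += 1.0
    return [h / len(features) for h in hist]

def as_linear_layer(centroids):
    # argmin_j ||x - c_j||^2 == argmax_j (2 c_j . x - ||c_j||^2)
    weights = [[2.0 * v for v in c] for c in centroids]
    biases = [-sum(v * v for v in c) for c in centroids]
    return weights, biases

def recurrent_bow(features, weights, biases):
    # Network view: linear layer + one-hot argmax per frame; the histogram
    # is carried as a recurrent state h_t = ((t - 1) * h_{t-1} + onehot_t) / t.
    state = [0.0] * len(weights)
    for t, x in enumerate(features, start=1):
        scores = [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        winner = max(range(len(scores)), key=scores.__getitem__)
        onehot = [1.0 if j == winner else 0.0 for j in range(len(scores))]
        state = [((t - 1) * h + o) / t for h, o in zip(state, onehot)]
    return state

# Hypothetical data: 2-D descriptors for training and for one "video".
rng = random.Random(1)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
video = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(120)]

vocab = kmeans(train, k=8)
h_bow = bow_histogram(video, vocab)
h_rnn = recurrent_bow(video, *as_linear_layer(vocab))
print(max(abs(a - b) for a, b in zip(h_bow, h_rnn)))  # tiny float difference
```

Both paths produce the same histogram up to floating-point rounding. The argmax in this sketch is not differentiable. Supervised training of the vocabulary, as described in the abstract, would therefore need some relaxation of the hard assignment, for example a softmax. That relaxation is our assumption and is not a detail taken from the paper.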

Session

Poster 1

Files

Extended Abstract (PDF, 128K)
Paper (PDF, 245K)

DOI

10.5244/C.29.57
https://dx.doi.org/10.5244/C.29.57

Citation

Alexander Richard and Juergen Gall. A BoW-equivalent Recurrent Neural Network for Action Recognition. In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 57.1-57.13. BMVA Press, September 2015.

Bibtex

@inproceedings{BMVC2015_57,
	title={A {BoW}-equivalent Recurrent Neural Network for Action Recognition},
	author={Alexander Richard and Juergen Gall},
	year={2015},
	month={September},
	pages={57.1--57.13},
	articleno={57},
	numpages={13},
	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
	publisher={BMVA Press},
	editor={Xianghua Xie and Mark W. Jones and Gary K. L. Tam},
	doi={10.5244/C.29.57},
	isbn={1-901725-53-7},
	url={https://dx.doi.org/10.5244/C.29.57}
}