A BoW-equivalent Recurrent Neural Network for Action Recognition
Alexander Richard and Juergen Gall
Abstract
Bag-of-words (BoW) models are widely used in the field of computer vision. A BoW model consists of a visual vocabulary that is generated by unsupervised clustering of the features of the training data, e.g., by using k-means. Clustering methods, however, struggle with large amounts of data, particularly in the context of action recognition. In this paper, we propose a transformation of the standard BoW model into a neural network, enabling discriminative training of the visual vocabulary on large action recognition datasets. We show that our model is equivalent to the original BoW model but allows for the application of supervised neural network training. Our model outperforms the conventional BoW model and sparse coding methods on recent action recognition benchmarks.
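For readers unfamiliar with the baseline described in the abstract, the following is a minimal sketch of a conventional BoW pipeline: a visual vocabulary learned with k-means, followed by a hard-assignment histogram per video. It is not the authors' implementation. The function names (kmeans, bow_histogram, linear_scores), the iteration count, and the histogram normalization are assumptions made for illustration. The last function shows a standard identity: the nearest centroid under squared Euclidean distance is also the argmax of a linear function of the feature. This is one way to see why a hard BoW assignment can in principle be expressed by a linear layer followed by a maximum; the paper itself should be consulted for the actual network construction.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids (the vocabulary)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct training features.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every centroid.
        dist = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(features, centroids):
    """Hard-assign each feature to its nearest visual word and count occurrences."""
    dist = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centroids)).astype(float)
    return hist / hist.sum()  # normalisation choice is an assumption

def linear_scores(features, centroids):
    """Scores whose argmax equals the nearest-centroid index.

    argmin_k ||x - c_k||^2 = argmax_k (c_k . x - 0.5 * ||c_k||^2),
    because the ||x||^2 term is the same for every k.
    """
    return features @ centroids.T - 0.5 * (centroids ** 2).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    video_features = rng.normal(size=(200, 5))  # synthetic stand-in for local descriptors
    vocabulary = kmeans(video_features, k=4)
    print(bow_histogram(video_features, vocabulary))
```

In this sketch the vocabulary is fixed once k-means finishes, and no class labels are used. That is the limitation the abstract refers to when it motivates discriminative training of the vocabulary.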
Session
Poster 1
Files
Extended Abstract (PDF, 128K)
Paper (PDF, 245K)
DOI
10.5244/C.29.57
https://dx.doi.org/10.5244/C.29.57
Citation
Alexander Richard and Juergen Gall. A BoW-equivalent Recurrent Neural Network for Action Recognition. In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 57.1-57.13. BMVA Press, September 2015.
Bibtex
@inproceedings{BMVC2015_57,
title={A BoW-equivalent Recurrent Neural Network for Action Recognition},
author={Alexander Richard and Juergen Gall},
year={2015},
month={September},
pages={57.1-57.13},
articleno={57},
numpages={13},
booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
publisher={BMVA Press},
editor={Xianghua Xie and Mark W. Jones and Gary K. L. Tam},
doi={10.5244/C.29.57},
isbn={1-901725-53-7},
url={https://dx.doi.org/10.5244/C.29.57}
}