Sketch-a-Net that Beats Humans

Qian Yu, Yongxin Yang, Yi-Zhe Song, Tao Xiang and Timothy Hospedales

Abstract

We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance is a result of explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes sequential ordering in the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained on photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.
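The abstract describes three components: a sketch-specific architecture, a multi-channel encoding of stroke order, and a multi-scale ensemble with joint Bayesian fusion. The minimal PyTorch sketch below is only an illustration of those ideas under stated assumptions, not the authors' released network: the layer sizes in `SketchCNN`, the cumulative-stroke channel scheme in `order_channels`, the class count, the input scales, and the plain posterior averaging in `ensemble_predict` (a stand-in for joint Bayesian fusion) are all our own choices.

```python
# Illustrative sketch only; all architectural choices here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchCNN(nn.Module):
    """Small CNN over multi-channel sketch renderings.

    Large first-layer filters reflect the intuition that sketches are sparse
    line drawings, so bigger receptive fields than typical photo CNNs are a
    reasonable choice. All layer sizes here are assumptions.
    """

    def __init__(self, in_channels=3, num_classes=250):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=15, stride=3), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))


def order_channels(strokes, size=225, n_channels=3):
    """Render cumulative subsets of strokes into separate channels so the
    network can see sketching order (early vs. late strokes).

    `strokes` is a list of strokes, each an iterable of (x, y) points already
    scaled to [0, size). The cumulative-subset scheme is an assumption.
    """
    img = torch.zeros(n_channels, size, size)
    per_stage = max(1, -(-len(strokes) // n_channels))  # ceiling division
    for c in range(n_channels):
        for stroke in strokes[: (c + 1) * per_stage]:
            for x, y in stroke:
                img[c, int(y), int(x)] = 1.0
    return img


def ensemble_predict(models, sketch_image, scales=(225, 192, 160)):
    """Average class posteriors over several input scales.

    Plain averaging is a deliberate simplification standing in for the
    paper's joint Bayesian fusion of the multi-scale ensemble.
    """
    x = sketch_image.unsqueeze(0)  # add batch dimension
    probs = []
    for model, s in zip(models, scales):
        xs = F.interpolate(x, size=(s, s), mode="bilinear", align_corners=False)
        probs.append(F.softmax(model(xs), dim=1))
    return torch.stack(probs).mean(dim=0)


if __name__ == "__main__":
    # Toy usage: one fake sketch of three strokes, three single-scale models.
    strokes = [[(10.0 * i + t, 20.0 + t) for t in range(50)] for i in range(3)]
    image = order_channels(strokes)           # shape (3, 225, 225)
    models = [SketchCNN() for _ in range(3)]  # one model per scale
    posteriors = ensemble_predict(models, image)
    print(posteriors.shape)                   # torch.Size([1, 250])
```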

Session

Deep Learning

Files

Extended Abstract (PDF, 415K)
Paper (PDF, 920K)

DOI

10.5244/C.29.7
https://dx.doi.org/10.5244/C.29.7

Citation

Qian Yu, Yongxin Yang, Yi-Zhe Song, Tao Xiang and Timothy Hospedales. Sketch-a-Net that Beats Humans. In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 7.1-7.12. BMVA Press, September 2015.

Bibtex

@inproceedings{BMVC2015_7,
	title={Sketch-a-Net that Beats Humans},
	author={Qian Yu and Yongxin Yang and Yi-Zhe Song and Tao Xiang and Timothy Hospedales},
	year={2015},
	month={September},
	pages={7.1--7.12},
	articleno={7},
	numpages={12},
	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
	publisher={BMVA Press},
	editor={Xianghua Xie and Mark W. Jones and Gary K. L. Tam},
	doi={10.5244/C.29.7},
	isbn={1-901725-53-7},
	url={https://dx.doi.org/10.5244/C.29.7}
}