Authors
Yaoyao Li, Li Wang, Huailin Zhao*, Zhen Nie
School of Electrical and Electronic Engineering, Shanghai Institute of
Technology, Shanghai, China
*Corresponding author. Email: [email protected]; www.sit.edu.cn
Corresponding Author
Huailin Zhao
Received 5 September 2019, Accepted 11 May 2020, Available Online 2 June
2020.
DOI
https://doi.org/10.2991/jrnal.k.200528.009
Keywords
Self-attention distillation; dilated convolution; crowd counting
Abstract
Context information is essential for a crowd counting network to estimate
crowd numbers accurately, especially in congested scenes. However, the shallow
layers of common crowd counting networks (e.g., the congested scene recognition
network) do not have a large receptive field, so they cannot efficiently
utilize context information from the crowd scene. To solve this problem,
we propose a crowd counting network with self-attention
distillation. Each input image is first sent to the visual geometry group
(VGG)-16 network for feature extraction. Then, the extracted features are
processed by the dilated convolutional part for the final crowd density
estimation. Specifically, we apply a self-attention distillation strategy at
different locations of the dilated convolutional part, so that the global
context information from the deeper layers guides the learning of the
shallower layers. We compare our method with other state-of-the-art works on
the UCF-QNRF dataset, and the experimental results demonstrate the superiority
of our method.
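
The idea of letting deeper layers guide shallower ones can be illustrated with a small sketch. The abstract does not give the exact loss, so everything below is an assumption: the attention map is taken as the channel-wise sum of squared activations followed by a spatial softmax (a common choice in attention distillation), the loss is a mean squared difference, and the function names attention_map and distillation_loss are hypothetical. Plain Python lists stand in for feature tensors, and both feature maps are assumed to share the same spatial size.

```python
import math


def attention_map(features):
    """Collapse a C x H x W feature map (nested lists) into a flat
    spatial attention map of length H*W that sums to 1.

    Assumed form: sum of squared activations over channels,
    then a softmax over all spatial positions.
    """
    height, width = len(features[0]), len(features[0][0])
    energy = [
        sum(channel[i][j] ** 2 for channel in features)
        for i in range(height)
        for j in range(width)
    ]
    peak = max(energy)  # subtract the maximum for numerical stability
    exps = [math.exp(e - peak) for e in energy]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(shallow_features, deep_features):
    """Mean squared difference between the attention map of a shallower
    layer (student) and that of a deeper layer (teacher).

    In an actual training framework the teacher map would be treated as
    a constant, so gradients only update the shallower layer.
    """
    student = attention_map(shallow_features)
    teacher = attention_map(deep_features)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy example: one channel, 2 x 2 spatial grid.
shallow = [[[1.0, 0.0], [0.0, 0.0]]]  # shallow layer attends to the top-left
deep = [[[0.0, 0.0], [0.0, 2.0]]]     # deep layer attends to the bottom-right
loss = distillation_loss(shallow, deep)
print(loss)
```

Under these assumptions, the loss is zero when both layers attend to the same locations and grows as their attention maps diverge, so adding it to the density estimation loss would push the shallower layer toward the context captured by the deeper one.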
Copyright
© 2020 The Authors. Published by ALife Robotics Corp. Ltd.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license
(http://creativecommons.org/licenses/by-nc/4.0/).