With the rapid development of deep learning, neural network models have become increasingly complex, requiring more storage space and slowing inference, which makes them difficult to deploy on resource-limited platforms. Network pruning, an effective model compression technique, is commonly applied to deep neural networks to alleviate this problem. However, traditional pruning methods simply set redundant weights to zero and therefore fail to deliver actual acceleration. In this paper, a channel-wise model scaling method is proposed to reduce model size and speed up inference by structurally removing redundant filters from convolutional layers. To make residual blocks sparser, we further develop a pruning method for residual cells. Experimental results on the YOLOv3 detector show that the proposed approach achieves a 70.6% parameter compression ratio without compromising accuracy.
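The sketch below illustrates the general idea of structural channel pruning referred to in the abstract: whole filters are removed from a convolutional layer, and the next layer is shrunk to match, so the pruned model is genuinely smaller and faster rather than merely sparse. The selection criterion (L1 norm of each filter) and the `prune_conv_pair` helper are illustrative assumptions; the abstract does not specify the paper's actual scaling or selection rule, nor its handling of residual cells.

```python
# Minimal sketch of channel-wise filter pruning (illustrative only).
# Assumption: filters are ranked by the L1 norm of their weights; the paper's
# actual channel-scaling criterion is not given in this abstract.
import torch
import torch.nn as nn


def prune_conv_pair(conv: nn.Conv2d, next_conv: nn.Conv2d, keep_ratio: float = 0.3):
    """Remove the lowest-scoring filters of `conv` and the matching input
    channels of `next_conv`, returning two structurally smaller layers."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))

    # One score per output channel: L1 norm over (in_channels, kH, kW).
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.argsort(scores, descending=True)[:n_keep]

    # Rebuild the pruned layer with fewer output channels.
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()

    # The following layer must accept correspondingly fewer input channels.
    pruned_next = nn.Conv2d(n_keep, next_conv.out_channels, next_conv.kernel_size,
                            next_conv.stride, next_conv.padding,
                            bias=next_conv.bias is not None)
    pruned_next.weight.data = next_conv.weight.data[:, keep].clone()
    if next_conv.bias is not None:
        pruned_next.bias.data = next_conv.bias.data.clone()
    return pruned, pruned_next
```

Because the removed filters disappear from the weight tensors entirely, both the parameter count and the per-layer FLOPs drop, which is what distinguishes structural pruning from simply zeroing weights.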