Deep neural networks have achieved significant progress in human pose estimation (HPE), but many difficulties remain in practical applications. An important reason is that current state-of-the-art models are highly complex, leading to a huge computational cost. To solve this problem, we propose a flow-based attention lightweight network (FALNet) for HPE. We first design a cheap lightweight bottleneck that reduces model size and complexity using two components: a depthwise convolution and a cheap unit. We then propose a flow-based fusion attention block that effectively generates and aggregates multi-scale features within the same layer. We demonstrate the effectiveness of our methods on two benchmark datasets, COCO and MPII. Our network FALNet-50 has only 2.2M parameters and 0.66G floating-point operations (FLOPs), achieving comparable or even better accuracy at a smaller complexity. Moreover, we show the speed advantage of our network during inference: FALNet-50 achieves 70.4 average precision (AP) on COCO val2017 at 16 FPS on a smartphone.
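To give intuition for why a depthwise convolution plus a cheap unit shrinks the model, the following is a minimal parameter-count sketch. It is an illustration only, not the paper's actual FALNet bottleneck: the function names, the channel/kernel sizes, and the Ghost-style `ratio` split into "intrinsic" and cheaply generated channels are all assumptions made for the example.

```python
# Hypothetical arithmetic sketch (not from the paper): compare the weight
# count of a standard 3x3 convolution against a bottleneck built from a
# depthwise convolution plus a cheap unit that generates part of the
# output channels with inexpensive per-channel transforms.

def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def cheap_bottleneck_params(c_in, c_out, k=3, ratio=2):
    """Assumed cheap bottleneck: depthwise k x k conv, a 1x1 conv producing
    c_out // ratio 'intrinsic' channels, then per-channel cheap transforms
    generating the remaining channels (Ghost-style split; an assumption)."""
    intrinsic = c_out // ratio
    depthwise = c_in * k * k                 # one k x k filter per input channel
    pointwise = c_in * intrinsic             # 1x1 conv to intrinsic channels
    cheap = intrinsic * k * k * (ratio - 1)  # per-channel cheap transforms
    return depthwise + pointwise + cheap

standard = conv_params(256, 256, 3)        # 589,824 weights
cheap = cheap_bottleneck_params(256, 256)  # 36,224 weights
print(standard, cheap, round(standard / cheap, 1))
```

Under these assumed sizes, the cheap bottleneck uses roughly 16x fewer weights than the standard convolution, which is the kind of saving that makes a 2.2M-parameter backbone plausible.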
Keywords: Convolution, Pose estimation, Feature fusion, Mobile devices, Performance modeling