Estimating human body pose and shape from a single-view image has been highly successful, but most existing methods rely on models with a large number of parameters, which are difficult to run on low-performance devices. Lightweight networks, in turn, struggle to extract sufficient information for human pose and shape estimation, which makes accurate prediction challenging. In this paper, we propose a lightweight model that predicts the shape and pose parameters of a parametric human body model. Our method comprises a lightweight multi-stage encoder based on Lite-HRNet and ShuffleNet and a decoder composed of cascaded MLPs that follow the human kinematic tree. It achieves performance comparable to HMR while its model size is only one-ninth that of HMR. In addition, our model runs at 19.2 inferences per second on the Qualcomm Snapdragon 888+.
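
To illustrate what a decoder of cascaded MLPs following a kinematic tree could look like, the sketch below gives each joint its own small MLP and feeds every non-root joint the prediction of its parent. It is only a toy illustration and not the proposed architecture: the six-joint tree in `PARENTS`, the dimensions `FEAT_DIM`, `POSE_DIM` and `HIDDEN_DIM`, the random weights, and all function names are assumptions made for the example.

```python
import math
import random

# Hypothetical toy kinematic tree: PARENTS[j] is the parent of joint j, -1 marks the root.
# Parents always precede their children, so one pass over the list visits them in order.
PARENTS = [-1, 0, 0, 1, 2, 3]

FEAT_DIM = 8     # assumed size of the encoder's image feature
POSE_DIM = 6     # assumed size of each per-joint pose vector
HIDDEN_DIM = 16  # assumed hidden width of each per-joint MLP


def make_linear(rng, n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def build_decoder(seed=0):
    """One two-layer MLP per joint; non-root joints also read the parent's pose."""
    rng = random.Random(seed)
    decoder = []
    for parent in PARENTS:
        n_in = FEAT_DIM + (POSE_DIM if parent >= 0 else 0)
        decoder.append((make_linear(rng, n_in, HIDDEN_DIM),
                        make_linear(rng, HIDDEN_DIM, POSE_DIM)))
    return decoder


def decode_pose(decoder, feature):
    """Predict per-joint pose vectors by cascading MLPs from the root down the tree."""
    poses = []
    for joint, parent in enumerate(PARENTS):
        x = list(feature)
        if parent >= 0:
            x += poses[parent]  # condition on the parent's prediction
        hidden = relu(linear(decoder[joint][0], x))
        poses.append(linear(decoder[joint][1], hidden))
    return poses


decoder = build_decoder()
feature = [0.1 * i for i in range(FEAT_DIM)]  # stand-in for an encoder output
poses = decode_pose(decoder, feature)
print(len(poses), len(poses[0]))  # prints "6 6"
```

In this sketch, conditioning each joint on its parent means that errors and corrections propagate along the limbs in the same order as the skeleton itself, which is one plausible reason to structure a decoder this way.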