Selective sampling and temporal positional encoding for monocular video-based 3D human pose and shape estimation
3 January 2025
Yupeng Hou, Han Wen, Guangping Zeng
Proceedings Volume 13442, Fifth International Conference on Signal Processing and Computer Science (SPCS 2024); 134422D (2025) https://doi.org/10.1117/12.3054276
Event: Fifth International Conference on Signal Processing and Computer Science (SPCS 2024), 2024, Kaifeng, China
Abstract
This study introduces a novel framework for video-based 3D human pose and shape estimation, termed Selective sampling and Temporal Positional Encoding (STPE). The method combines selective sampling with rotary positional encoding to address the temporal complexity of video data and the high cost and scarcity of annotated datasets. Inspired by the Masked Autoencoder (MAE), it adopts a selective sampling strategy that captures the essential dynamics of human motion from partial views, significantly reducing reliance on continuous frames. The framework incorporates Rotary Position Embedding (RoPE), which encodes position as rotation angles; this simplifies positional encoding, decreases model complexity, and improves learning effectiveness. Index positions are also randomized during training, which adds variability and improves generalization across datasets and motion patterns. Validated on the standard datasets 3DPW, MPI-INF-3DHP, and Human3.6M, the model captures 3D pose and shape more accurately and robustly than existing methods. The results demonstrate that strategic frame sampling and rotary positional encoding can significantly improve the accuracy and robustness of video-based pose estimation systems.
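The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names, not the authors' method. It assumes that selective sampling means keeping a random, temporally ordered subset of frame indices (in the spirit of MAE masking) and uses the standard RoPE formulation with base 10000. The names `sample_frame_indices`, `rope_rotate`, `keep_ratio`, and `offset` are hypothetical and chosen for illustration.

```python
import math
import random


def sample_frame_indices(num_frames, keep_ratio, rng):
    """Keep a random, temporally ordered subset of frame indices.

    Assumption: this mimics MAE-style masking; the paper's actual
    sampling rule is not described in the abstract.
    """
    num_keep = max(1, round(num_frames * keep_ratio))
    return sorted(rng.sample(range(num_frames), num_keep))


def rope_rotate(vec, position, base=10000.0):
    """Apply standard rotary position embedding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by the angle
    position * base ** (-i / dim), so position is encoded as a rotation.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
kept = sample_frame_indices(num_frames=16, keep_ratio=0.5, rng=rng)

# Hypothetical random shift of all index positions (an assumption about
# what "randomized index positions" could look like in practice).
offset = rng.randrange(0, 100)

q = [0.3, -1.2, 0.7, 0.5]   # toy query feature for the first kept frame
k = [1.0, 0.4, -0.6, 0.2]   # toy key feature for the last kept frame
m, n = kept[0], kept[-1]

score = dot(rope_rotate(q, m), rope_rotate(k, n))
shifted = dot(rope_rotate(q, m + offset), rope_rotate(k, n + offset))
```

Because RoPE rotates query and key by angles proportional to their positions, the score depends only on the difference between the two positions, so `score` and `shifted` agree up to floating-point error. This is a general property of RoPE and is consistent with shifting index positions during training; whether the paper randomizes positions in this way is not stated in the abstract.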
(2025) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yupeng Hou, Han Wen, and Guangping Zeng "Selective sampling and temporal positional encoding for monocular video-based 3D human pose and shape estimation", Proc. SPIE 13442, Fifth International Conference on Signal Processing and Computer Science (SPCS 2024), 134422D (3 January 2025); https://doi.org/10.1117/12.3054276
KEYWORDS
Video coding
Video
Pose estimation
3D modeling
Education and training
3D video compression
Data modeling