Human-robot interaction is a key technology of modern intelligent production: operators and robots collaborate to perform complex tasks in diverse scenarios. Understanding human behavior remains a challenge for collaborative robot systems that must operate efficiently in unstructured and dynamic environments. We propose an algorithm for multi-class human motion classification from multimodal visual data. First, visible, depth, and thermal images from the imaging sensors are fused by a multiscale approach based on the Laplacian pyramid with optimal blending weights, which prevents visual artifacts in the combined image. The fused data are then preprocessed with the two-sided Gabor quaternionic Fourier transform, and 3D local binary dense micro-block difference features are computed. In parallel, human skeleton data are extracted with a convolutional neural network; from the coordinates of the skeleton's key points and the distances between them, a descriptor is constructed that captures the person's posture in each frame and the temporal information between frames. The 3D local binary dense micro-block difference features and the skeletal descriptor are then concatenated into a single feature vector, which is classified by a neural network. We evaluate the effectiveness of the proposed action recognition algorithm through simulation in the RoboGuid environment.
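To make the fusion stage concrete, the following is a minimal Python/OpenCV sketch of multiscale image fusion with Laplacian pyramids. It assumes pre-registered, single-channel visible, depth, and thermal frames of equal size normalized to [0, 1]; the per-pixel magnitude weighting used here is an illustrative stand-in for the optimal-weight calculation described above, and the helper names (`laplacian_pyramid`, `fuse_modalities`) are hypothetical.

```python
# Sketch only: local-magnitude weights stand in for the paper's optimal weights.
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Build a Laplacian pyramid: band-pass levels plus the coarsest Gaussian residual."""
    gaussian = [img.astype(np.float32)]
    for _ in range(levels):
        gaussian.append(cv2.pyrDown(gaussian[-1]))
    pyramid = []
    for i in range(levels):
        up = cv2.pyrUp(gaussian[i + 1], dstsize=gaussian[i].shape[1::-1])
        pyramid.append(gaussian[i] - up)          # band-pass detail at this scale
    pyramid.append(gaussian[-1])                  # coarsest residual
    return pyramid

def fuse_modalities(frames, levels=4):
    """Fuse equal-sized frames by weighted per-level blending of their Laplacian pyramids."""
    pyramids = [laplacian_pyramid(f, levels) for f in frames]
    fused_levels = []
    for level in zip(*pyramids):
        # Weight each modality by its per-pixel band magnitude so strong structure
        # dominates; normalizing the weights keeps intensity consistent across seams.
        weights = [np.abs(band) + 1e-6 for band in level]
        total = sum(weights)
        fused_levels.append(sum(w / total * band for w, band in zip(weights, level)))
    # Collapse the fused pyramid back into a single image.
    fused = fused_levels[-1]
    for band in reversed(fused_levels[:-1]):
        fused = cv2.pyrUp(fused, dstsize=band.shape[1::-1]) + band
    return np.clip(fused, 0.0, 1.0)
```

Blending every pyramid level separately is what avoids the halo and seam artifacts that single-scale weighted averaging produces, since each frequency band is mixed with weights appropriate to that scale.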
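The skeleton-descriptor stage can be sketched in the same spirit. The code below assumes that a pose estimator (e.g., a CNN) has already produced per-frame 3D joint coordinates; the choice of the first joint as the reference point and the use of frame-to-frame differences as the temporal term are assumptions for illustration, not the exact construction used in the paper.

```python
# Sketch only: joint ordering, root choice, and temporal term are assumptions.
import numpy as np
from itertools import combinations

def frame_descriptor(joints):
    """Describe one frame: root-centred joint coordinates plus all pairwise distances.

    joints: (J, 3) array of 3D joint positions for a single frame.
    """
    centred = joints - joints[0]                  # translation invariance (root = joint 0)
    scale = np.linalg.norm(centred, axis=1).max() + 1e-6
    centred = centred / scale                     # scale invariance
    dists = [np.linalg.norm(centred[i] - centred[j])
             for i, j in combinations(range(len(joints)), 2)]
    return np.concatenate([centred.ravel(), dists])

def sequence_descriptor(sequence):
    """Stack per-frame descriptors with frame-to-frame differences as the temporal cue.

    sequence: (T, J, 3) array of joint positions over T frames.
    """
    per_frame = np.stack([frame_descriptor(f) for f in sequence])
    motion = np.diff(per_frame, axis=0, prepend=per_frame[:1])  # inter-frame change
    return np.concatenate([per_frame, motion], axis=1).ravel()
```

A vector like the one returned by `sequence_descriptor` would then be concatenated with the 3D local binary dense micro-block difference features before classification.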