Author name cluster

Enhuan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers

1 author row

AAAI Conference 2020 Conference Paper

Temporal Pyramid Recurrent Neural Network

Qianli Ma
Zhenxi Lin
Enhuan Chen
Garrison Cottrell

Learning long-term and multi-scale dependencies in sequential data is a challenging task for recurrent neural networks (RNNs). In this paper, a novel RNN structure called temporal pyramid RNN (TP-RNN) is proposed to achieve these two goals. TP-RNN is a pyramid-like structure and generally has multiple layers. In each layer of the network, there are several sub-pyramids connected by a shortcut path to the output, which can efficiently aggregate historical information from hidden states and provide many gradient feedback short-paths. This avoids back-propagating through many hidden states as in usual RNNs. In particular, in the multi-layer structure of TP-RNN, the input sequence of the higher layer is a large-scale aggregated state sequence produced by the sub-pyramids in the previous layer, instead of the usual sequence of hidden states. In this way, TP-RNN can explicitly learn multi-scale dependencies with multi-scale input sequences of different layers, and shorten the input sequence and gradient feedback paths of each layer. This avoids the vanishing gradient problem in deep RNNs and allows the network to efficiently learn long-term dependencies. We evaluate TP-RNN on several sequence modeling tasks, including the masked addition problem, pixel-by-pixel image classification, signal recognition and speaker identification. Experimental results demonstrate that TP-RNN consistently outperforms existing RNNs for learning long-term and multi-scale dependencies in sequential data.
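
The abstract's point that each higher layer reads a shorter, aggregated state sequence can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `aggregate_windows` and `layer_lengths`, the fixed window size, and the use of a plain average as the aggregation are all assumptions made for illustration, since the abstract does not specify how the sub-pyramids combine hidden states.

```python
def aggregate_windows(states, window):
    """Collapse each run of `window` consecutive states into one state.

    Assumption: a plain average stands in for the sub-pyramid aggregation,
    which the abstract does not specify.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    aggregated = []
    for start in range(0, len(states), window):
        chunk = states[start:start + window]
        dim = len(chunk[0])
        # Average each feature dimension over the states in this window.
        aggregated.append(
            [sum(state[d] for state in chunk) / len(chunk) for d in range(dim)]
        )
    return aggregated


def layer_lengths(seq_len, window, num_layers):
    """Input length seen by each layer when every layer shrinks it by `window`."""
    lengths = []
    for _ in range(num_layers):
        lengths.append(seq_len)
        seq_len = -(-seq_len // window)  # ceiling division
    return lengths
```

Under these assumptions, a hypothetical 784-step input with a window of 4 gives per-layer input lengths of 784, 196 and 49, so a gradient in the third layer crosses far fewer recurrent steps than it would in a flat RNN.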


IJCAI Conference 2017 Conference Paper

WALKING WALKing walking: Action Recognition from Action Echoes

Qianli Ma
Lifeng Shen
Enhuan Chen
Shuai Tian
Jiabing Wang
Garrison W. Cottrell

Recognizing human actions represented by 3D trajectories of skeleton joints is a challenging machine learning task. In this paper, the 3D skeleton sequences are regarded as multivariate time series, and their dynamics and multiscale features are efficiently learned from action echo states. Specifically, the skeleton data from the limbs and trunk are first projected into five high-dimensional nonlinear spaces that are randomly generated by five dynamic, training-free recurrent networks, i.e., the reservoirs of echo state networks (ESNs). In this way, the history of the time series is represented as nonlinear echo states of actions. We then use a single multiscale convolutional layer to extract multiscale features from the echo states, and maintain multiscale temporal invariance by a max-over-time pooling layer. We propose two multi-step fusion strategies to integrate the spatial information over the five parts of the human physical structure. Finally, we learn the label distribution using softmax. With one training-free recurrent layer and only one layer of convolution, our Convolutional Echo State Network (ConvESN) is a very efficient end-to-end model, and achieves state-of-the-art performance on four skeleton benchmark data sets.
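
The pipeline described above (fixed random reservoir, convolution over the echo states at several temporal widths, max-over-time pooling) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the paper's ConvESN: it uses a single reservoir rather than five, small uniform random weights instead of spectral-radius scaling, and caller-supplied kernels instead of learned ones. All function names are hypothetical.

```python
import math
import random


def make_reservoir(input_dim, reservoir_dim, scale=0.1, seed=0):
    """Create fixed random input and recurrent weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
            for _ in range(reservoir_dim)]
    w_res = [[rng.uniform(-scale, scale) for _ in range(reservoir_dim)]
             for _ in range(reservoir_dim)]
    return w_in, w_res


def echo_states(series, w_in, w_res):
    """Run the standard ESN update x_t = tanh(W_in u_t + W_res x_{t-1})."""
    state = [0.0] * len(w_res)
    states = []
    for u in series:
        state = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[i], u))
                + sum(w * s for w, s in zip(w_res[i], state))
            )
            for i in range(len(w_res))
        ]
        states.append(state)
    return states


def conv_max_over_time(states, kernel):
    """Slide one kernel over time and keep only its largest response.

    `kernel` is a list of `width` rows, each holding one weight per reservoir unit.
    """
    width = len(kernel)
    if len(states) < width:
        raise ValueError("series is shorter than the kernel width")
    responses = []
    for t in range(len(states) - width + 1):
        window = states[t:t + width]
        responses.append(sum(
            k * s
            for k_row, s_row in zip(kernel, window)
            for k, s in zip(k_row, s_row)
        ))
    return max(responses)


def multiscale_features(states, kernels):
    """Apply kernels of different temporal widths; return one pooled feature each."""
    return [conv_max_over_time(states, kernel) for kernel in kernels]
```

Because the reservoir weights stay fixed, only the kernels and the final softmax classifier would need training in a full model. Max-over-time pooling also makes the feature vector length depend on the number of kernels rather than on the sequence length.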
