Minghai Chen Papers

AAAI Conference 2024 Short Paper

Flow-Event Autoencoder: Event Stream Object Recognition Dataset Generation with Arbitrary High Temporal Resolution

Minghai Chen

Event cameras offer unique advantages in temporal resolution and dynamic range and have shown potential in several computer vision tasks. However, because the hardware is new, large benchmark DVS event-stream datasets are lacking, including datasets for object recognition. In this work, we propose an encoder-decoder method that augments event-stream datasets from images and optical flow at arbitrary temporal resolution for the object recognition task. We believe the proposed method can generalize well to augmenting event-stream vision data for object recognition and will help advance the event vision paradigm.


AAAI Conference 2017 Conference Paper

Reference Based LSTM for Image Captioning

Minghai Chen
Guiguang Ding
Sicheng Zhao
Hui Chen
Qiang Liu
Jungong Han

Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. There are two main problems in existing methods: in the training phase, it is difficult to find which parts of the captions are more essential to the image; in the caption generation phase, the objects or the scenes are sometimes misrecognized. In this paper, we consider the training images as references and propose a Reference based Long Short Term Memory (R-LSTM) model, aiming to solve both problems at once. When training the model, we assign different weights to different words, which enables the network to better learn the key information of the captions. When generating a caption, a consensus score is used to exploit the reference information of neighbor images, which might fix the misrecognition and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms the state-of-the-art approaches on the benchmark dataset MS COCO and ranks in the top 2 on 11 of the 14 metrics on the online test server.
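The idea of assigning different weights to different words during training can be illustrated with a weighted negative log-likelihood. This is only a minimal sketch, not the paper's formulation: the function name `weighted_caption_nll`, the toy probabilities, and the weight values are all assumptions made for illustration, and the abstract does not say how the weights are computed.

```python
import math


def weighted_caption_nll(word_probs, word_weights):
    """Weighted negative log-likelihood of one caption.

    word_probs:   probability the model assigns to each ground-truth word.
    word_weights: hypothetical per-word importance weights (same length).
    """
    if len(word_probs) != len(word_weights):
        raise ValueError("word_probs and word_weights must have the same length")
    return -sum(w * math.log(p) for p, w in zip(word_probs, word_weights))


# Toy caption "a dog runs" with made-up model probabilities per word.
probs = [0.9, 0.5, 0.4]

# Uniform weights reduce to the ordinary negative log-likelihood.
uniform = weighted_caption_nll(probs, [1.0, 1.0, 1.0])

# Made-up weights that emphasize the content words "dog" and "runs".
weighted = weighted_caption_nll(probs, [0.5, 2.0, 1.5])
```

With weights like these, errors on the content words contribute more to the loss than errors on the article "a", which is the general effect the abstract describes.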

