Author name cluster

Jingmin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

IROS Conference 2025 Conference Paper

Stimulating Imagination: Towards General-purpose "Something Something Placement"

Jianyang Wu
Jie Gu
Xiaokang Ma
Fangzhou Qiu
Chu Tang
Jingmin Chen

General-purpose object placement is a fundamental capability of an intelligent generalist robot: being capable of rearranging objects following precise human instructions even in novel environments. This work is dedicated to achieving general-purpose object placement with "something something" instructions. Specifically, we break the entire process down into three parts, including object localization, goal imagination and robot control, and propose a method named SPORT. SPORT leverages a pre-trained large vision model for broad semantic reasoning about objects, and learns a diffusion-based pose estimator to ensure physically-realistic results in 3D space. Only object types (movable or reference) are communicated between these two parts, which brings two benefits. One is that we can fully leverage the powerful ability of open-set object recognition and localization since no specific fine-tuning is needed for the robotic scenario. Moreover, the diffusion-based estimator only need to "imagine" the object poses after the placement, while no necessity for their semantic information. Thus the training burden is greatly reduced and no massive training is required. The training data for the goal pose estimation is collected in simulation and annotated by using GPT-4. Experimental results demonstrate the effectiveness of our approach. SPORT can not only generate promising 3D goal poses for unseen simulated objects, but also be seamlessly applied to real-world settings.

Details

NeurIPS Conference 2024 Conference Paper

VideoTetris: Towards Compositional Text-to-Video Generation

Ye Tian
Ling Yang
Haotian Yang
Yuan Gao
Yufan Deng
Jingmin Chen
Xintao Wang
Zhaochen Yu

Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio-temporal compositional diffusion to precisely follow complex textual semantics by manipulating and composing the attention maps of denoising networks spatially and temporally. Moreover, we propose a new dynamic-aware data processing pipeline and a consistency regularization method to enhance the consistency of auto-regressive video generation. Extensive experiments demonstrate that our VideoTetris achieves impressive qualitative and quantitative results in compositional T2V generation. Code is available at: https: //github. com/YangLing0818/VideoTetris

PDF Details DOI

AAAI Conference 2021 Conference Paper

Exploiting Behavioral Consistence for Universal User Representation

Jie Gu
Feng Wang
Qinghui Sun
Zhiquan Ye
Xiaoxiao Xu
Jingmin Chen
Jun Zhang

User modeling is critical for developing personalized services in industry. A common way for user modeling is to learn user representations that can be distinguished by their interests or preferences. In this work, we focus on developing universal user representation model. The obtained universal representations are expected to contain rich information, and be applicable to various downstream applications without further modifications (e. g. , user preference prediction and user profiling). Accordingly, we can be free from the heavy work of training task-specific models for every downstream task as in previous works. In specific, we propose Self-supervised User Modeling Network (SUMN) to encode behavior data into the universal representation. It includes two key components. The first one is a new learning objective, which guides the model to fully identify and preserve valuable user information under a self-supervised learning framework. The other one is a multi-hop aggregation layer, which benefits the model capacity in aggregating diverse behaviors. Extensive experiments on benchmark datasets show that our approach can outperform state-of-the-art unsupervised representation methods, and even compete with supervised ones.

PDF Details

TIST Journal 2021 Journal Article

MetaStore: A Task-adaptative Meta-learning Model for Optimal Store Placement with Multi-city Knowledge Transfer

Yan Liu
Bin Guo
Daqing Zhang
Djamal Zeghlache
Jingmin Chen
Sizhe Zhang
Dan Zhou
Xinlei Shi

Optimal store placement aims to identify the optimal location for a new brick-and-mortar store that can maximize its sale by analyzing and mining users’ preferences from large-scale urban data. In recent years, the expansion of chain enterprises in new cities brings some challenges because of two aspects: (1) data scarcity in new cities, so most existing models tend to not work (i.e., overfitting), because the superior performance of these works is conditioned on large-scale training samples; (2) data distribution discrepancy among different cities, so knowledge learned from other cities cannot be utilized directly in new cities. In this article, we propose a task-adaptative model-agnostic meta-learning framework, namely, MetaStore, to tackle these two challenges and improve the prediction performance in new cities with insufficient data for optimal store placement, by transferring prior knowledge learned from multiple data-rich cities. Specifically, we develop a task-adaptative meta-learning algorithm to learn city-specific prior initializations from multiple cities, which is capable of handling the multimodal data distribution and accelerating the adaptation in new cities compared to other methods. In addition, we design an effective learning strategy for MetaStore to promote faster convergence and optimization by sampling high-quality data for each training batch in view of noisy data in practical applications. The extensive experimental results demonstrate that our proposed method leads to state-of-the-art performance compared with various baselines.

Details DOI