Arrow Research search

Author name cluster

Chang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

AAAI Conference 2026 Conference Paper

Extendable Planning via Multiscale Diffusion

  • Chang Chen
  • Hany Hamed
  • Doojin Baek
  • Taegu Kang
  • Samyeul Noh
  • Yoshua Bengio
  • Sungjin Ahn

Long-horizon planning is crucial in complex environments, but diffusion-based planners like Diffuser are limited by the trajectory lengths observed during training. This creates a dilemma: long trajectories are needed for effective planning, yet they degrade model performance. In this paper, we introduce the extendable long-horizon planning challenge and propose a two-phase solution. First, Progressive Trajectory Extension incrementally constructs longer trajectories through multi-round compositional stitching. Second, the Hierarchical Multiscale Diffuser enables efficient training and inference over long horizons by reasoning across temporal scales. To avoid the need for multiple separate models, we propose Adaptive Plan Pondering and the Recursive HM-Diffuser, which unify hierarchical planning within a single model. Experiments show our approach yields strong performance gains, advancing scalable and efficient decision-making over long horizons.
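
A minimal sketch of the compositional-stitching idea behind Progressive Trajectory Extension, assuming trajectories are plain (T, state_dim) NumPy arrays; the function names and the endpoint-distance test are illustrative, not from the paper's code:

```python
import numpy as np

def stitch(traj_a: np.ndarray, traj_b: np.ndarray, tol: float = 0.1):
    """Append traj_b to traj_a when traj_b starts near traj_a's endpoint."""
    if np.linalg.norm(traj_a[-1] - traj_b[0]) > tol:
        return None  # endpoints too far apart to stitch
    return np.concatenate([traj_a, traj_b[1:]], axis=0)  # drop the duplicate state

def extension_round(trajs: list) -> list:
    """One round of extension: keep every successful pairwise stitch."""
    extended = [s for a in trajs for b in trajs
                if (s := stitch(a, b)) is not None]
    return trajs + extended
```

Repeating such rounds yields progressively longer trajectories than any single training sequence, which is the dataset the hierarchical model is then trained on.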

IROS Conference 2025 Conference Paper

DRTT: A Diffusion-based Framework for 4DCT Generation, Robust Thoracic Registration and Tumor Deformation Tracking

  • Dongyuan Li
  • Yixin Shan
  • Yuxuan Mao
  • Haochen Shi
  • Shenghao Huang
  • Weiyan Sun
  • Chang Chen
  • Xiaojun Chen

In minimally invasive robotic thoracic surgery, the unavoidable respiratory motion of the patient causes lung lesions to move and deform, making precise tumor localization a significant challenge for surgeons. To address this, we introduce an RDDM (Recursive Deformable Diffusion Model)-based framework designed for real-time intraoperative tumor tracking, which can be used for registration and navigation in robot-assisted thoracic surgery. The RDDM reduces training complexity and enhances dataset utilization by employing a simplified DDM (Diffusion Deformable Model) iteratively, significantly lowering computational demands while maximizing the extraction of valuable information from limited 4D-CT (four-dimensional computed tomography) datasets. Considering the robustness required for intraoperative registration and navigation, we incorporate an ICP (Iterative Closest Point)-based point cloud registration method into the framework and validate our approach using publicly available datasets and volunteer trials. This innovation has the potential to reduce radiation exposure, trauma, and the risk of complications for patients undergoing minimally invasive thoracic surgery, and enables downstream tasks such as RAPNB (robot-assisted percutaneous needle biopsy) and radiation therapy.
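
Since the framework leans on ICP for robust intraoperative registration, here is a minimal point-to-point ICP sketch in NumPy/SciPy; this is the textbook algorithm the abstract names, not the authors' implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src: np.ndarray, dst: np.ndarray, iters: int = 30) -> np.ndarray:
    """Point-to-point ICP: a 4x4 rigid transform aligning src (N, 3) to dst (M, 3)."""
    T, pts = np.eye(4), src.copy()
    tree = cKDTree(dst)
    for _ in range(iters):
        _, idx = tree.query(pts)              # nearest-neighbor correspondences
        matched = dst[idx]
        mu_s, mu_d = pts.mean(0), matched.mean(0)
        H = (pts - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T                        # Kabsch rotation estimate
        if np.linalg.det(R) < 0:              # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_d - R @ mu_s
        pts = pts @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                          # accumulate the incremental transform
    return T
```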

ICML Conference 2025 Conference Paper

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers

  • Xuanlei Zhao
  • Shenggan Cheng
  • Chang Chen
  • Zangwei Zheng
  • Ziming Liu
  • Zheming Yang
  • Yang You 0001

Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under the category of embedded sequence parallelism and are limited to sharding along a single sequence dimension, thereby introducing significant communication overhead. Yet the nature of multi-dimensional transformers involves independent calculations across multiple sequence dimensions. To this end, we propose Dynamic Sequence Parallelism (DSP) as a novel abstraction of sequence parallelism. DSP dynamically switches the parallel dimension among all sequences according to the computation stage, with an efficient resharding strategy. DSP offers significant reductions in communication costs, adaptability across modules, and ease of implementation with minimal constraints. Experimental evaluations demonstrate DSP's superiority over state-of-the-art embedded sequence parallelism methods, with throughput improvements ranging from 32.2% to 10x and less than 25% of the communication volume.
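
To make the "switch the parallel dimension per stage" idea concrete, a single-process toy in NumPy; a real DSP run would perform the switch with an all-to-all collective, which this sketch only simulates by reassembling and re-partitioning:

```python
import numpy as np

def shard(x: np.ndarray, dim: int, rank: int, world: int) -> np.ndarray:
    """Return this rank's slice of x along one sequence dimension."""
    return np.array_split(x, world, axis=dim)[rank]

world = 2
x = np.arange(4 * 6 * 2).reshape(4, 6, 2).astype(np.float32)  # (seq0, seq1, hidden)

# Stage 1 computes over sequence dim 0, so shard dim 1: dim 0 stays fully local.
stage1 = [shard(x, dim=1, rank=r, world=world) for r in range(world)]

# Stage switch: reassemble and re-partition along dim 0 so the next stage can
# compute over dim 1 locally. Only the partition axis changes, never the data.
full = np.concatenate(stage1, axis=1)
stage2 = [shard(full, dim=0, rank=r, world=world) for r in range(world)]

assert np.allclose(np.concatenate(stage2, axis=0), x)  # layout moved, data intact
```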

NeurIPS Conference 2025 Conference Paper

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

  • Ye Liu
  • Zongyang Ma
  • Junfu Pu
  • Zhongang Qi
  • Yang Wu
  • Ying Shan
  • Chang Chen

Recent advances in Large Multi-modal Models (LMMs) have demonstrated their remarkable success as general-purpose multi-modal assistants, with a particular focus on holistic image- and video-language understanding. In contrast, less attention has been given to scaling fine-grained pixel-level understanding capabilities, where the models are expected to achieve pixel-level alignment between visual signals and language semantics. Some previous studies have applied LMMs to related tasks such as region-level captioning and referring expression segmentation. However, these models are limited to performing either referring or segmentation tasks independently and fail to integrate these fine-grained perception capabilities into visual reasoning. To bridge this gap, we propose UniPixel, a large multi-modal model capable of flexibly comprehending visual prompt inputs and generating mask-grounded responses. Our model distinguishes itself by seamlessly integrating pixel-level perception with general visual understanding capabilities. Specifically, UniPixel processes visual prompts and generates relevant masks on demand, then performs subsequent reasoning conditioned on these intermediate pointers during inference, thereby enabling fine-grained pixel-level reasoning. The effectiveness of our approach has been verified on 10 benchmarks across a diverse set of tasks, including pixel-level referring/segmentation and object-centric understanding in images/videos. A novel PixelQA task that jointly requires referring, segmentation, and question answering is also designed to verify the flexibility of our method.
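
One simple way an intermediate mask can become a "pointer" that later reasoning conditions on is mask pooling; a hypothetical sketch for illustration only, not UniPixel's actual mechanism:

```python
import numpy as np

def mask_pooled_token(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average (H, W, C) visual features inside a binary (H, W) mask.

    The resulting C-dim vector can serve as one object token that a
    language model conditions on during subsequent reasoning.
    """
    w = mask.astype(np.float32)[..., None]
    return (feat * w).sum(axis=(0, 1)) / max(w.sum(), 1.0)
```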

ICML Conference 2024 Conference Paper

PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

  • Chang Chen
  • Junyeob Baek
  • Fei Deng 0001
  • Kenji Kawaguchi
  • Çaglar Gülçehre
  • Sungjin Ahn

Despite recent advancements in offline RL, no unified algorithm has achieved superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of credit assignment and extrapolation errors that accumulate as the task horizon grows. On the other hand, models that perform well in long-horizon tasks are designed specifically for goal-conditioned tasks and commonly perform worse than value function learning methods on short-horizon, dense-reward scenarios. To bridge this gap, we propose a hierarchical planner for offline RL called PlanDQ. PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals. At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals. Our experimental results suggest that PlanDQ achieves superior or competitive performance on D4RL continuous-control benchmark tasks as well as on the long-horizon AntMaze, Kitchen, and Calvin tasks.
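
The two-level control loop can be sketched as follows, assuming gym-style environment semantics and treating the D-Conductor and Q-Performer as opaque callables; the names and the per-sub-goal step budget are illustrative:

```python
def plan_dq_episode(env, d_conductor, q_performer,
                    steps_per_goal: int = 15, horizon: int = 500):
    """d_conductor(state) -> list of sub-goal states (high-level diffusion plan);
    q_performer(state, goal) -> action (goal-conditioned Q-learning policy)."""
    state, t = env.reset(), 0
    while t < horizon:
        for goal in d_conductor(state):        # high level: propose sub-goals
            for _ in range(steps_per_goal):    # low level: act toward each one
                action = q_performer(state, goal)
                state, reward, done, _ = env.step(action)
                t += 1
                if done or t >= horizon:
                    return
```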

ICRA Conference 2024 Conference Paper

SCALE: Self-Correcting Visual Navigation for Mobile Robots via Anti-Novelty Estimation

  • Chang Chen
  • Yuecheng Liu
  • Yuzheng Zhuang
  • Sitong Mao
  • Kaiwen Xue
  • Shunbo Zhou

Although visual navigation has been extensively studied using deep reinforcement learning, online learning for real-world robots remains a challenging task. Recent work learns directly from offline datasets to achieve broader generalization in real-world tasks, but faces the out-of-distribution (OOD) issue and potential robot localization failures in a given map for unseen observations. This significantly reduces success rates and can even induce collisions. In this paper, we present a self-correcting visual navigation method, SCALE, that can autonomously keep the robot out of OOD situations without human intervention. Specifically, we develop an image-goal-conditioned offline reinforcement learning method based on implicit Q-learning (IQL). When facing an OOD observation, our novel localization recovery method generates potential future trajectories by learning from the navigation affordance and estimates their future novelty via random network distillation (RND). A tailored cost function searches for the candidate with the least novelty that can lead the robot back to familiar places. We collect offline data and conduct evaluation experiments in three real-world urban scenarios. The results show that SCALE outperforms previous state-of-the-art methods for open-world navigation with a unique capability of localization recovery, significantly reducing the need for human intervention. Code is available at https://github.com/KubeEdge4Robotics/ScaleNav.
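
Random network distillation, which SCALE uses to score candidate trajectories by novelty, is easy to sketch in PyTorch; the network sizes here are arbitrary:

```python
import torch
import torch.nn as nn

class RND(nn.Module):
    """Random Network Distillation: predictor error acts as a novelty score."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():  # target stays fixed and random
            p.requires_grad_(False)

    def novelty(self, obs: torch.Tensor) -> torch.Tensor:
        # High prediction error = observation unlike the training distribution.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```

The predictor is trained to match the frozen target on in-distribution data, so candidate future trajectories can be ranked by summed novelty and the least novel one followed back toward familiar places.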

ICLR Conference 2024 Conference Paper

Simple Hierarchical Planning with Diffusion

  • Chang Chen
  • Fei Deng 0001
  • Kenji Kawaguchi
  • Çaglar Gülçehre
  • Sungjin Ahn

Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet effective planning method combining the advantages of hierarchical and diffusion-based planning. Our model adopts a “jumpy” planning strategy at the high level, which allows it to have a larger receptive field at a lower computational cost, a crucial factor for diffusion-based planning methods, as we have empirically verified. Additionally, the jumpy sub-goals guide our low-level planner, facilitating a fine-tuning stage and further improving our approach's effectiveness. We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods. Moreover, we explore our model's generalization capability, particularly on compositional out-of-distribution tasks.
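
The "jumpy" two-level structure can be sketched as below, treating both diffusion samplers as assumed callables; only the hierarchy is mirrored, not the actual guided diffusion sampling:

```python
import numpy as np

def hierarchical_plan(high_diffuser, low_diffuser, state, goal):
    """Sparse sub-goals first ('jumpy' high level), then dense segments.

    high_diffuser(state, goal) -> (N, state_dim) widely spaced sub-goals;
    low_diffuser(a, b) -> (K, state_dim) dense states from a to b.
    """
    plan, prev = [], state
    for sub_goal in high_diffuser(state, goal):
        plan.append(low_diffuser(prev, sub_goal))  # fill in each segment
        prev = sub_goal
    return np.concatenate(plan, axis=0)
```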

AAAI Conference 2022 Conference Paper

Learning to Model Pixel-Embedded Affinity for Homogeneous Instance Segmentation

  • Wei Huang
  • Shiyu Deng
  • Chang Chen
  • Xueyang Fu
  • Zhiwei Xiong

Homogeneous instance segmentation aims to identify each instance in an image where all instances of interest belong to the same category, such as plant leaves and microscopic cells. Recently, proposal-free methods, which straightforwardly generate instance-aware information to group pixels into different instances, have received increasing attention due to their efficient pipeline. However, they often fail to distinguish adjacent instances due to similar appearances, dense distribution, and ambiguous boundaries of instances in homogeneous images. In this paper, we propose a pixel-embedded affinity modeling method for homogeneous instance segmentation, which is able to preserve the semantic information of instances and improve the distinguishability of adjacent instances. Instead of predicting affinity directly, we propose a self-correlation module to explicitly model the pairwise relationships between pixels, by estimating the similarity between embeddings generated from the input image through CNNs. Based on the self-correlation module, we further design a cross-correlation module to maintain the semantic consistency between instances. Specifically, we map the transformed input images with different views and appearances into the same embedding space, and then mutually estimate the pairwise relationships of embeddings generated from the original input and its transformed variants. In addition, to integrate the global instance information, we introduce an embedding pyramid module to model affinity on different scales. Extensive experiments demonstrate the versatile and superior performance of our method on three representative datasets. Code and models are available at https://github.com/weih527/Pixel-Embedded-Affinity.
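
The self-correlation idea, affinity computed as similarity between pixel embeddings rather than predicted directly, reduces to something like the following for a single neighbor offset; a hedged sketch with border handling omitted:

```python
import numpy as np

def affinity_map(emb: np.ndarray, dy: int = 1, dx: int = 0) -> np.ndarray:
    """Cosine similarity between each pixel embedding and its (dy, dx) neighbor.

    emb: (H, W, C) pixel embeddings from a CNN. np.roll wraps at the border,
    so a real implementation would mask the wrapped positions.
    """
    n = emb / (np.linalg.norm(emb, axis=-1, keepdims=True) + 1e-8)
    return (n * np.roll(n, shift=(dy, dx), axis=(0, 1))).sum(axis=-1)
```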

NeurIPS Conference 2022 Conference Paper

TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training

  • Chang Chen
  • Min Li
  • Zhihua Wu
  • Dianhai Yu
  • Chao Yang

Sparsely-gated Mixture-of-Experts (MoE) has demonstrated its effectiveness in scaling deep neural networks to an extreme scale. Although numerous efforts have been made to improve the performance of MoE from the model design or system optimization perspective, existing MoE dispatch patterns are still unable to fully exploit the underlying heterogeneous network environments. In this paper, we propose TA-MoE, a topology-aware routing strategy for large-scale MoE training, designed from a model-system co-design perspective, which can dynamically adjust the MoE dispatch pattern according to the network topology. Based on communication modeling, we abstract the dispatch problem into an optimization objective and obtain the approximate dispatch pattern under different topologies. On top of that, we design a topology-aware auxiliary loss, which can adaptively route the data to fit the underlying topology without sacrificing model accuracy. Experiments show that TA-MoE can substantially outperform its counterparts on various hardware and model configurations, with roughly 1.01x-1.61x, 1.01x-4.77x, and 1.25x-1.54x improvements over the popular DeepSpeed-MoE, FastMoE, and FasterMoE systems.
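
A toy illustration of the topology-aware intuition (send more tokens over cheap intra-node links); this mirrors the motivation only, not the paper's communication model, optimization objective, or auxiliary loss:

```python
import numpy as np

def topology_capacities(n_experts: int, local_experts: np.ndarray,
                        token_budget: int,
                        intra_bw: float = 4.0, inter_bw: float = 1.0) -> np.ndarray:
    """Split one rank's token budget across experts in proportion to link bandwidth.

    local_experts: indices of experts on the same node as this rank; the
    bandwidth ratios here are made-up placeholders.
    """
    bw = np.full(n_experts, inter_bw)
    bw[local_experts] = intra_bw          # intra-node links are cheaper
    return np.floor(token_budget * bw / bw.sum()).astype(int)
```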

JMLR Journal 2021 Journal Article

ROOTS: Object-Centric Representation and Rendering of 3D Scenes

  • Chang Chen
  • Fei Deng
  • Sungjin Ahn

A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations. Recent works either achieve object-centric generation but without the ability to infer the representation, or achieve 3D scene representation learning without object-centric compositionality. Therefore, learning to both represent and render 3D scenes with object-centric compositionality remains elusive. In this paper, we propose a probabilistic generative model for learning to build modular and compositional 3D object models from partial observations of a multi-object scene. The proposed model can (i) infer the 3D object representations by learning to search and group object areas, and (ii) render from an arbitrary viewpoint not only individual objects but also the full scene by compositing the objects. The entire learning process is unsupervised and end-to-end. In experiments, in addition to generation quality, we demonstrate that the learned representation permits object-wise manipulation and novel scene generation, and generalizes to various settings. Results can be found on our project website: https://sites.google.com/view/roots3d
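
The "render individual objects, then composite the full scene" step can be illustrated with plain back-to-front alpha compositing; a hypothetical sketch, since ROOTS composites inside a probabilistic generative model rather than over fixed RGBA layers:

```python
import numpy as np

def composite(layers: list) -> np.ndarray:
    """Alpha-composite per-object RGBA renders (H, W, 4), back to front."""
    out = np.zeros_like(layers[0][..., :3])
    for layer in layers:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out  # the 'over' operator
    return out
```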