Arrow Research search

Author name cluster

Chi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers

15

AAAI Conference 2026 Conference Paper

FashionMAC: Deformation-Free Fashion Image Generation with Fine-Grained Model Appearance Customization

  • Rong Zhang
  • Jinxiao Li
  • Jingnan Wang
  • Zhiwen Zuo
  • Jianfeng Dong
  • Wei Li
  • Chi Wang
  • Weiwei Xu

Garment-centric fashion image generation aims to synthesize realistic and controllable human models wearing a given garment, which has attracted growing interest due to its practical applications in e-commerce. The key challenges of the task lie in two aspects: (1) faithfully preserving the garment details, and (2) gaining fine-grained controllability over the model's appearance. Existing methods typically require performing garment deformation in the generation process, which often leads to garment texture distortions. Also, they fail to control the fine-grained attributes of the generated models, due to the lack of specifically designed mechanisms. To address these issues, we propose FashionMAC, a novel diffusion-based deformation-free framework that achieves high-quality and controllable fashion showcase image generation. The core idea of our framework is to eliminate the need for performing garment deformation and directly outpaint the garment segmented from a dressed person, which enables faithful preservation of the intricate garment details. Moreover, we propose a novel region-adaptive decoupled attention (RADA) mechanism along with a chained mask injection strategy to achieve fine-grained appearance controllability over the synthesized human models. Specifically, RADA adaptively predicts the generated regions for each fine-grained text attribute and enforces the text attribute to focus on the predicted regions via the chained mask injection strategy, significantly enhancing visual fidelity and controllability. Extensive experiments validate the superior performance of our framework compared to existing state-of-the-art methods.

ICLR Conference 2025 Conference Paper

Steering Large Language Models between Code Execution and Textual Reasoning

  • Yongchao Chen
  • Harsh Jhamtani
  • Srinagesh Sharma
  • Chuchu Fan
  • Chi Wang

While a lot of recent research focuses on enhancing the textual reasoning capabilities of Large Language Models (LLMs) by optimizing the multi-agent framework or reasoning chains, several benchmark tasks can be solved with 100% success through direct coding, which is more scalable and avoids the computational overhead associated with textual iterating and searching. Textual reasoning has inherent limitations in solving tasks that pose challenges in math, logic, optimization, and search, limitations that are unlikely to be overcome by simply scaling up model and data size. The recently released OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency in integrating code generation and execution to solve complex tasks using LLMs. However, based on our experiments on 7 existing popular methods for steering code/text generation in both single- and multi-turn settings with 14 tasks and 6 types of LLMs (including the new O1-preview), there is currently no optimal method to correctly steer LLMs to write code when needed. We discover some interesting patterns in when models use code vs. textual reasoning as task complexity and model size vary, which even result in an astonishing inverse scaling behavior. We also discover that results from LLM-written code are not always better than those from textual reasoning, even when the task could be solved through code. To mitigate the above issues, we propose three methods to better steer LLM code/text generation and achieve notable improvements. The token-length and runtime costs of all methods are thoroughly discussed. We believe the problem of steering LLM code/text generation is critical for future research and leaves much room for further improvement. The project page, datasets, and code are available at https://yongchao98.github.io/CodeSteer/.

NeurIPS Conference 2025 Conference Paper

UniTransfer: Video Concept Transfer via Progressive Spatio-Temporal Decomposition

  • Guojun Lei
  • Rong Zhang
  • Tianhang Liu
  • Hong Li
  • Zhiyuan Ma
  • Chi Wang
  • Weiwei Xu

Recent advancements in video generation models have enabled the creation of diverse and realistic videos, with promising applications in advertising and film production. However, as one of the essential tasks of video generation models, video concept transfer remains significantly challenging. Existing methods generally model video as an entirety, leading to limited flexibility and precision when solely editing specific regions or concepts. To mitigate this dilemma, we propose a novel architecture UniTransfer, which introduces both spatial and diffusion timestep decomposition in a progressive paradigm, achieving precise and controllable video concept transfer. Specifically, in terms of spatial decomposition, we decouple videos into three key components: the foreground subject, the background, and the motion flow. Building upon this decomposed formulation, we further introduce a dual-to-single-stream DiT-based architecture for supporting fine-grained control over different components in the videos. We also introduce a self-supervised pretraining strategy based on random masking to enhance the decomposed representation learning from large-scale unlabeled video data. Inspired by the Chain-of-Thought reasoning paradigm, we further revisit the denoising diffusion process and propose a Chain-of-Prompt (CoP) mechanism to achieve the timestep decomposition. We decompose the denoising process into three stages of different granularity and leverage large language models (LLMs) for stage-specific instructions to guide the generation progressively. We also curate an animal-centric video dataset called OpenAnimal to facilitate the advancement and benchmarking of research in video concept transfer. Extensive experiments demonstrate that our method achieves high-quality and controllable video concept transfer across diverse reference images and scenes, surpassing existing baselines in both visual fidelity and editability.

IROS Conference 2024 Conference Paper

Rethinking 3D Geometric Object Features for Enhancing Skeleton-based Action Recognition

  • Yuankai Wu
  • Chi Wang
  • Driton Salihu
  • Constantin Patsch
  • Marsil Zakour
  • Eckehard G. Steinbach

Human action recognition is crucial for intelligent robots, especially in the realm of human-robot collaboration research. Recent advancements in human pose estimation algorithms have shifted the focus of action recognition towards skeleton-based models, which exhibit robustness to changes in background and illumination. However, many state-of-the-art action recognition models rely on 2D skeleton data, neglecting object features. This limitation becomes obvious in complex scenarios where human interactions with objects are crucial, potentially compromising the reliability of assistive robots in understanding human behavior in their environment. To address this issue, we propose a method that effectively integrates 3D geometric object features into skeleton data using graph convolutional neural networks (GCNs). In addition to analyzing the effectiveness of information from different dimensions such as object center position, category, translation, and rotation, we explore various adjacency matrix designs for graph networks. Our model performance is evaluated on two challenging datasets: IKEA ASM and Bimanual Actions. The results demonstrate a significant improvement in action recognition by integrating object features into skeleton-based models. Specifically, on the IKEA-ASM dataset, our approach achieves a frame-wise Top-1 score improvement of 10.8% and an average F1@k improvement of 13.3%, while on the Bimanual Actions dataset, it achieves a frame-wise Top-1 score improvement of 11.4% and an average F1@k improvement of 5.3%, with negligible increases in model complexity.

IROS Conference 2023 Conference Paper

Model-Based Planning and Control for Terrestrial-Aerial Bimodal Vehicles with Passive Wheels

  • Ruibin Zhang
  • Junxiao Lin
  • Yuze Wu
  • Yuman Gao
  • Chi Wang
  • Chao Xu 0001
  • Yanjun Cao
  • Fei Gao 0011

Terrestrial and aerial bimodal vehicles have gained widespread attention due to their cross-domain maneuverability. Nevertheless, their bimodal dynamics significantly increase the complexity of motion planning and control, thus hindering robust and efficient autonomous navigation in unknown environments. To resolve this issue, we develop a model-based planning and control framework for terrestrial-aerial bimodal vehicles. This work begins by deriving a unified dynamic model and the corresponding differential flatness. Leveraging differential flatness, an optimization-based trajectory planner is proposed, which takes into account both solution quality and computational efficiency. Moreover, we design a tracking controller using nonlinear model predictive control based on the proposed unified dynamic model to achieve accurate trajectory tracking and smooth mode transition. We validate our framework through extensive benchmark comparisons and experiments, demonstrating its effectiveness in terms of planning quality and control performance.

AAAI Conference 2022 Conference Paper

Active Boundary Loss for Semantic Segmentation

  • Chi Wang
  • Yunke Zhang
  • Miaomiao Cui
  • Peiran Ren
  • Yin Yang
  • Xuansong Xie
  • Xian-Sheng Hua
  • Hujun Bao

This paper proposes a novel active boundary loss for semantic segmentation. It can progressively encourage the alignment between predicted boundaries and ground-truth boundaries during end-to-end training, which is not explicitly enforced by the commonly used cross-entropy loss. Based on the predicted boundaries detected from the segmentation results using current network parameters, we formulate the boundary alignment problem as a differentiable direction vector prediction problem to guide the movement of predicted boundaries in each iteration. Our loss is model-agnostic and can be plugged into the training of segmentation networks to improve the boundary details. Experimental results show that training with the active boundary loss can effectively improve the boundary F-score and mean Intersection-over-Union on challenging image and video object segmentation datasets.

AAAI Conference 2021 Conference Paper

Frugal Optimization for Cost-related Hyperparameters

  • Qingyun Wu
  • Chi Wang
  • Silu Huang

The increasing demand for democratizing machine learning algorithms calls for hyperparameter optimization (HPO) solutions at low cost. Many machine learning algorithms have hyperparameters that can cause a large variation in the training cost. But this effect is largely ignored in existing HPO methods, which are incapable of properly controlling cost during the optimization process. To address this problem, we develop a new cost-frugal HPO solution. The core of our solution is a simple but new randomized direct-search method, for which we provide theoretical guarantees on the convergence rate and the total cost incurred to achieve convergence. We provide strong empirical results in comparison with state-of-the-art HPO methods on large AutoML benchmarks.
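The core loop of a randomized direct-search method can be illustrated in a few lines. This is only a generic sketch of the technique the abstract names, not the paper's actual algorithm or its cost-control rule; the mirrored-step acceptance test, the step-shrinking schedule, and the quadratic test objective are all invented for illustration:

```python
import random

def randomized_direct_search(loss, x0, step=1.0, iters=100, seed=0):
    """Generic randomized direct search: sample a random direction,
    try a step in it and its mirror, move only on improvement, and
    shrink the step size when neither direction helps."""
    rng = random.Random(seed)
    x, best = list(x0), loss(x0)
    for _ in range(iters):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        improved = False
        for sign in (1.0, -1.0):
            cand = [xi + sign * step * di for xi, di in zip(x, direction)]
            val = loss(cand)
            if val < best:
                x, best, improved = cand, val, True
                break
        if not improved:
            step *= 0.9  # no progress in either direction: refine the step
    return x, best

# Toy usage: minimize a quadratic starting from (3, -2)
x_opt, best = randomized_direct_search(lambda v: sum(t * t for t in v), [3.0, -2.0])
```

In the paper's cost-frugal setting, the objective would also account for training cost; here it is a plain function to keep the sketch self-contained.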

NeurIPS Conference 2020 Conference Paper

A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices

  • Jiezhong Qiu
  • Chi Wang
  • Ben Liao
  • Richard Peng
  • Jie Tang

We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via a regular (aperiodic and irreducible) finite Markov chain. Specifically, consider a random walk on a regular Markov chain and a Hermitian matrix-valued function on its state space. Our result gives exponentially decreasing bounds on the tail distributions of the extreme eigenvalues of the sample mean matrix. Our proof is based on the matrix expander (regular undirected graph) Chernoff bound [Garg et al. STOC '18] and scalar Chernoff-Hoeffding bounds for Markov chains [Chung et al. STACS '12]. Our matrix Chernoff bound for Markov chains can be applied to analyze the behavior of co-occurrence statistics for sequential data, which have been common and important data signals in machine learning. We show that given a regular Markov chain with n states and mixing time t, we need a trajectory of length O(t(log(n) + log(t))/e^2) to achieve an estimator of the co-occurrence matrix with error bound e. We conduct several experiments, and the experimental results are consistent with the exponentially fast convergence rate from the theoretical analysis. Our result gives the first bound on the convergence rate of the co-occurrence matrix and the first sample complexity analysis in graph representation learning.
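To make the object of study concrete, a window-limited co-occurrence matrix can be estimated from a single trajectory of a regular Markov chain. This is a minimal illustration of the estimator setting only, not the paper's construction or proof machinery; the 2-state chain, window size, and normalization are made-up choices:

```python
import random

def cooccurrence_from_trajectory(traj, n, window=2):
    """Count how often state j follows state i within `window` steps,
    normalized so all entries sum to 1."""
    C = [[0.0] * n for _ in range(n)]
    pairs = 0
    for t, i in enumerate(traj):
        for r in range(1, window + 1):
            if t + r < len(traj):
                C[i][traj[t + r]] += 1.0
                pairs += 1
    if pairs:
        for row in C:
            for j in range(n):
                row[j] /= pairs
    return C

# Simulate a 2-state regular (aperiodic, irreducible) Markov chain
P = [[0.9, 0.1], [0.2, 0.8]]  # row-stochastic transition matrix
rng = random.Random(0)
state, traj = 0, []
for _ in range(10000):
    traj.append(state)
    state = 0 if rng.random() < P[state][0] else 1
C = cooccurrence_from_trajectory(traj, 2)
```

The paper's result bounds how long such a trajectory must be for the estimate to concentrate; the simulation above only produces the empirical matrix itself.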

NeurIPS Conference 2020 Conference Paper

AdaTune: Adaptive Tensor Program Compilation Made Efficient

  • Menghao Li
  • Minjia Zhang
  • Chi Wang
  • Mingqin Li

Deep learning models are computationally intensive, and implementations often have to be highly optimized by experts or hardware vendors to be usable in practice. DL compilers, together with learning-to-compile techniques, have proven to be powerful tools for optimizing tensor programs. However, a limitation of this approach is that it still suffers from unbearably long overall optimization time. In this paper, we present a new method, called AdaTune, that significantly reduces the optimization time of tensor programs for high-performance deep learning inference. In particular, we propose an adaptive evaluation method that statistically terminates a costly hardware measurement early without losing much accuracy. We further devise a surrogate model with uncertainty quantification that allows the optimization to better adapt to hardware and model heterogeneity. Finally, we introduce a contextual optimizer that provides adaptive control of exploration and exploitation to improve the effectiveness of transformation-space searching. We evaluate and compare the levels of optimization obtained by a state-of-the-art DL compiler and AdaTune. The experiment results show that AdaTune obtains up to 115% higher GFLOPS than the baseline under the same optimization time budget. Furthermore, AdaTune provides a 1.3-3.9X speedup in optimization time over the state of the art to reach the same optimization quality for a range of models across different hardware architectures.
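The adaptive-evaluation idea (stop repeating a costly measurement once its mean is statistically stable) can be sketched with a simple standard-error stopping rule. This is a hedged stand-in for illustration only, not AdaTune's actual statistical test or interface; `rel_tol`, the trial counts, and the simulated measurement are invented:

```python
import random
import statistics

def measure_with_early_stop(sample, max_trials=50, min_trials=5, rel_tol=0.05):
    """Repeat a noisy measurement, stopping early once the standard
    error of the mean is small relative to the mean itself."""
    vals = []
    for _ in range(max_trials):
        vals.append(sample())
        if len(vals) >= min_trials:
            mean = statistics.fmean(vals)
            sem = statistics.stdev(vals) / len(vals) ** 0.5
            if mean and sem / abs(mean) < rel_tol:
                break  # estimate is stable enough; skip remaining trials
    return statistics.fmean(vals), len(vals)

# Toy usage: a "hardware measurement" that is 10.0 plus Gaussian noise
rng = random.Random(0)
mean, trials = measure_with_early_stop(lambda: 10.0 + rng.gauss(0.0, 0.5))
```

The payoff is that low-noise candidates finish in a handful of trials instead of the full budget, which is the spirit of the early-termination component described above.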

AAAI Conference 2020 Conference Paper

Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks

  • Lu Chen
  • Boer Lv
  • Chi Wang
  • Su Zhu
  • Bowen Tan
  • Kai Yu

Dialogue state tracking (DST) aims at estimating the current dialogue state given all the preceding conversation. For multi-domain DST, the data sparsity problem is also a major obstacle due to the increased number of state candidates. Existing approaches generally predict the value for each slot independently and do not consider slot relations, which may aggravate the data sparsity problem. In this paper, we propose a Schema-guided multi-domain dialogue State Tracker with graph attention networks (SST) that predicts dialogue states from dialogue utterances and schema graphs which contain slot relations in edges. We also introduce a graph attention matching network to fuse information from utterances and graphs, and a recurrent graph attention network to control state updating. Experiment results show that our approach obtains new state-of-the-art performance on both MultiWOZ 2.0 and MultiWOZ 2.1 benchmarks.

AAAI Conference 2019 Conference Paper

Efficient Identification of Approximate Best Configuration of Training in Large Datasets

  • Silu Huang
  • Chi Wang
  • Bolin Ding
  • Surajit Chaudhuri

A configuration of training refers to the combinations of feature engineering, learner, and its associated hyperparameters. Given a set of configurations and a large dataset randomly split into training and testing set, we study how to efficiently identify the best configuration with approximately the highest testing accuracy when trained from the training set. To guarantee small accuracy loss, we develop a solution using confidence interval (CI)-based progressive sampling and pruning strategy. Compared to using full data to find the exact best configuration, our solution achieves more than two orders of magnitude speedup, while the returned top configuration has identical or close test accuracy.
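A confidence-interval-based progressive sampling and pruning loop can be sketched as follows. This is a simplified Hoeffding-style illustration, not the paper's exact bounds, sampling schedule, or pruning rule; the configuration names, true accuracies, and `delta` are all invented:

```python
import math
import random

def select_best_config(configs, evaluate, batch=50, rounds=10, delta=0.01):
    """Progressively sample each surviving configuration and prune any
    whose confidence upper bound falls below the best lower bound."""
    stats = {c: [0.0, 0] for c in configs}  # [score sum, sample count]
    alive = set(configs)
    for _ in range(rounds):
        for c in alive:
            s = stats[c]
            for _ in range(batch):
                s[0] += evaluate(c)
                s[1] += 1

        def interval(c):
            total, n = stats[c]
            half = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # Hoeffding half-width
            return total / n - half, total / n + half

        best_lower = max(interval(c)[0] for c in alive)
        alive = {c for c in alive if interval(c)[1] >= best_lower}
        if len(alive) == 1:
            break
    return max(alive, key=lambda c: stats[c][0] / stats[c][1])

# Toy usage: three hypothetical configurations with Bernoulli accuracy
rng = random.Random(0)
true_acc = {"cfg_a": 0.80, "cfg_b": 0.70, "cfg_c": 0.55}
winner = select_best_config(
    list(true_acc), lambda c: 1.0 if rng.random() < true_acc[c] else 0.0)
```

Pruned configurations stop consuming samples, which is the source of the speedup the abstract reports: most of the sampling budget concentrates on the few contenders whose intervals still overlap the leader's.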

NeurIPS Conference 2017 Conference Paper

Identifying Outlier Arms in Multi-Armed Bandit

  • Honglei Zhuang
  • Chi Wang
  • Yifan Wang

We study a novel problem lying at the intersection of two areas: multi-armed bandit and outlier detection. Multi-armed bandit is a useful tool to model the process of incrementally collecting data for multiple objects in a decision space. Outlier detection is a powerful method to narrow down the attention to a few objects after the data for them are collected. However, no one has studied how to detect outlier objects while incrementally collecting data for them, which is necessary when data collection is expensive. We formalize this problem as identifying outlier arms in a multi-armed bandit. We propose two sampling strategies with theoretical guarantees, and analyze their sampling efficiency. Our experimental results on both synthetic and real data show that our solution saves 70-99% of the data collection cost relative to the baseline while achieving nearly perfect accuracy.
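The problem setup can be illustrated with a naive round-robin baseline: pull every arm equally, then flag arms whose empirical mean deviates from the across-arm mean by more than k standard deviations. This is a hypothetical stand-in, not either of the paper's two adaptive strategies; the threshold `k`, the pull budget, and the arm setup are invented:

```python
import random
import statistics

def find_outlier_arms(arms, rounds=200, k=2.0, seed=0):
    """Naive baseline: sample each arm `rounds` times and flag arms
    whose mean reward exceeds the across-arm mean by k std deviations."""
    rng = random.Random(seed)
    means = [statistics.fmean(pull(rng) for _ in range(rounds)) for pull in arms]
    mu = statistics.fmean(means)
    sigma = statistics.stdev(means)
    return [i for i, m in enumerate(means) if m > mu + k * sigma]

# Nine ordinary Bernoulli arms near 0.5 and one outlier near 0.95
arms = [lambda r, p=p: 1.0 if r.random() < p else 0.0
        for p in [0.5] * 9 + [0.95]]
outliers = find_outlier_arms(arms)
```

The paper's contribution is deciding which arm to pull next adaptively so that far fewer total pulls are needed; the uniform sampling above is only the non-adaptive reference point.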

IJCAI Conference 2015 Conference Paper

Constrained Information-Theoretic Tripartite Graph Clustering to Identify Semantically Similar Relations

  • Chenguang Wang
  • Yangqiu Song
  • Dan Roth
  • Chi Wang
  • Jiawei Han
  • Heng Ji
  • Ming Zhang

In knowledge bases or information extraction results, differently expressed relations can be semantically similar (e.g., (X, wrote, Y) and (X, 's written work, Y)). Therefore, grouping semantically similar relations into clusters would facilitate and improve many applications, including knowledge base completion, information extraction, information retrieval, and more. This paper formulates relation clustering as a constrained tripartite graph clustering problem, presents an efficient clustering algorithm, and exhibits the advantage of the constrained framework. We introduce several ways to provide side information via must-link and cannot-link constraints to improve the clustering results. Different from traditional semi-supervised learning approaches, we propose to use the similarity of relation expressions and the knowledge of entity types to automatically construct the constraints for the algorithm. We show improved relation clustering results on two datasets extracted from a human-annotated knowledge base (i.e., Freebase) and open information extraction results (i.e., ReVerb data).

IJCAI Conference 2013 Conference Paper

Large-Scale Spectral Clustering on Graphs

  • Jialu Liu
  • Chi Wang
  • Marina Danilevsky
  • Jiawei Han

Graph clustering has received growing attention in recent years as an important analytical technique, both due to the prevalence of graph data and the usefulness of graph structures for exploiting intrinsic data characteristics. However, as graph data grows in scale, it becomes increasingly more challenging to identify clusters. In this paper we propose an efficient clustering algorithm for large-scale graph data using spectral methods. The key idea is to repeatedly generate a small number of “supernodes” connected to the regular nodes, in order to compress the original graph into a sparse bipartite graph. By clustering the bipartite graph using spectral methods, we are able to greatly improve efficiency without a considerable loss of clustering power. Extensive experiments show the effectiveness and efficiency of our approach.