Author name cluster

Changhu Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers

2 author rows

ICML Conference 2025 Conference Paper

A Generic Family of Graphical Models: Diversity, Efficiency, and Heterogeneity

Yufei Huang 0020
Changhu Wang
Junjie Tang
Weichi Wu
Ruibin Xi

Traditional network inference methods, such as Gaussian Graphical Models, which are built on continuity and homogeneity, face challenges when modeling discrete data and heterogeneous frameworks. Furthermore, under high dimensionality, parameter estimation for such models can be hindered by the notorious intractability of high-dimensional integrals. In this paper, we introduce a new and flexible device for graphical models that accommodates diverse data types, including Gaussian, Poisson log-normal, and latent Gaussian copula models. The new device is driven by a new marginally recoverable parametric family which, thanks to marginal recoverability, can be estimated effectively in high-dimensional settings without evaluating high-dimensional integrals. We further introduce a mixture of marginally recoverable models to capture ubiquitous heterogeneous structures. We show that the models have the desired properties and that the estimation methods are effective, and demonstrate their advantages over state-of-the-art network inference methods via extensive simulation studies and a gene regulatory network analysis of real single-cell RNA sequencing data.
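The abstract starts from classical Gaussian Graphical Models. In that model, an edge between variables i and j exists exactly when the (i, j) entry of the precision (inverse covariance) matrix is nonzero. The minimal sketch below reads a network and its partial correlations off a given precision matrix. It illustrates only this classical baseline, not the marginally recoverable family proposed in the paper, and the function names are hypothetical.

```python
import math

def partial_correlations(precision):
    """Partial correlation rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)."""
    p = len(precision)
    return [
        [1.0 if i == j else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
         for j in range(p)]
        for i in range(p)
    ]

def gaussian_graph_edges(precision, tol=1e-8):
    """Undirected edges (i, j) with i < j where the precision entry is nonzero."""
    p = len(precision)
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(precision[i][j]) > tol]

# A chain 0 - 1 - 2: variables 0 and 2 are conditionally independent given 1.
theta = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]
print(gaussian_graph_edges(theta))        # [(0, 1), (1, 2)]
print(partial_correlations(theta)[0][1])  # 0.5
```

Estimating the precision matrix from data is a separate step and is not shown here.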

AAAI Conference 2025 Conference Paper

DREAM: Decoupled Discriminative Learning with Bigraph-aware Alignment for Semi-supervised 2D-3D Cross-modal Retrieval

Fan Zhang
Changhu Wang
Zebang Cheng
Xiaojiang Peng
Dongjie Wang
Yijia Xiao
Chong Chen
Xian-Sheng Hua

With the explosion of big data, 2D-3D cross-modal retrieval, which aims to retrieve relevant data from one modality given a query from the other, has received increasing attention. In this paper, we study an underexplored yet practical problem, semi-supervised 2D-3D cross-modal retrieval, which can suffer from severe label scarcity in real-world applications. Moreover, the large heterogeneous gap between modalities can hinder learning from unlabeled data. In this work, we propose a novel approach named Decoupled Discriminative Learning with Bigraph-aware Alignment (DREAM) for semi-supervised 2D-3D cross-modal retrieval. The core of DREAM is to decouple label prediction from reliability measurement to reduce overconfident samples in discriminative learning. In particular, we enhance a label prediction module with label propagation from labeled samples and additionally introduce a reliability measurement module to learn scores for the predicted labels. To reduce class-related bias, we compare reliability scores with class-specific adaptive thresholds to identify samples for additional learning. In addition, negative labels are estimated for unselected samples, which guides soft semantic learning to make the best use of all available information. To further reduce the heterogeneous gap, we build a bigraph that connects similar cross-modal examples and then conduct learning to cluster with most edges kept for alignment. Extensive experiments on several benchmark datasets validate the superiority of the proposed DREAM.
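The step of comparing reliability scores with class-specific adaptive thresholds can be illustrated with a small sketch. The rule below scales a base threshold by each class's mean reliability relative to the most reliable class. It is an assumption chosen for illustration, similar in spirit to curriculum-style pseudo-labeling, and is not the formula used by DREAM. `base_threshold` is a hypothetical parameter.

```python
from collections import defaultdict

def select_reliable(pred_labels, reliability, base_threshold=0.9):
    """Return indices of samples whose reliability score passes the threshold of
    their predicted class.

    Assumed rule: threshold_c = base_threshold * mean_c / max_k mean_k, so classes
    the model is less sure about get a lower bar and are not starved of samples."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, s in zip(pred_labels, reliability):
        sums[c] += s
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    top = max(means.values())
    thresholds = {c: base_threshold * means[c] / top for c in means}
    return [i for i, (c, s) in enumerate(zip(pred_labels, reliability)) if s >= thresholds[c]]

labels = [0, 0, 1, 1]
scores = [0.95, 0.65, 0.5, 0.3]
print(select_reliable(labels, scores))  # [0, 2]
```

Class 1 has a lower mean reliability, so its threshold drops to about 0.45 and sample 2 is still selected.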

AAAI Conference 2025 Conference Paper

TRACI: A Data-centric Approach for Multi-Domain Generalization on Graphs

Yusheng Zhao
Changhu Wang
Xiao Luo
Junyu Luo
Wei Ju
Zhiping Xiao
Ming Zhang

Graph neural networks (GNNs) have achieved superior performance in graph-based prediction tasks with a variety of applications such as social analysis and drug discovery. Despite the remarkable progress, their performance often degrades on test graphs with distribution shifts. Existing domain adaptation methods rely on unlabeled test graphs during optimization, limiting their applicability to graphs in the wild. To this end, this paper studies the problem of multi-domain generalization on graphs, which utilizes multiple source graphs to learn a GNN with high performance on unseen target graphs. We propose a new approach named Topological Adversarial Learning with Prototypical Mixup (TRACI) to solve the problem. The fundamental principle behind TRACI is to produce virtual adversarial and mixed graph samples from a data-centric view. In particular, TRACI enhances GNN generalization by employing a gradient-ascent strategy that considers both label prediction entropy and graph topology to craft challenging adversarial samples. Additionally, it generates domain-agnostic node representations by characterizing class-graph pair prototypes through latent distributions and applying multi-sample prototypical Mixup for distribution alignment across graphs. We further provide theoretical analysis showing that TRACI reduces the model's excess risk. Extensive experiments on various benchmark datasets demonstrate that TRACI outperforms state-of-the-art baselines, validating its effectiveness.
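Multi-sample prototypical Mixup mixes class prototypes drawn from different source graphs. A minimal sketch follows. It assumes each prototype is a plain vector and that the mixing weights come from a symmetric Dirichlet distribution, sampled by normalizing Gamma draws. The Dirichlet choice and the `alpha` parameter are assumptions, not details taken from the paper.

```python
import random

def prototypical_mixup(prototypes, alpha=1.0, rng=random):
    """Mix same-class prototypes from several source graphs into one virtual prototype.

    Mixing weights are a Dirichlet(alpha, ..., alpha) sample, drawn by normalizing
    independent Gamma(alpha, 1) variates (an assumed choice)."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in prototypes]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    dim = len(prototypes[0])
    mixed = [sum(w * p[d] for w, p in zip(weights, prototypes)) for d in range(dim)]
    return mixed, weights

# Prototypes of one class computed on three hypothetical source graphs.
protos = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
mixed, weights = prototypical_mixup(protos, rng=random.Random(0))
print(weights, mixed)
```

Because the weights are convex, the mixed prototype always lies inside the convex hull of the source prototypes.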

AAAI Conference 2025 Conference Paper

TrackGo: A Flexible and Efficient Method for Controllable Video Generation

Haitao Zhou
Chuang Wang
Rui Nie
Jinlin Liu
Dongdong Yu
Qian Yu
Changhu Wang

Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce *TrackGo*, a novel approach that leverages free-form masks and arrows for conditional video generation. This method offers users a flexible and precise mechanism for manipulating video content. We also propose the *TrackAdapter* for control implementation, an efficient and lightweight adapter designed to be seamlessly integrated into the temporal self-attention layers of a pretrained video generation model. This design leverages our observation that the attention maps of these layers can accurately activate regions corresponding to motion in videos. Our experimental results demonstrate that our new approach, enhanced by the TrackAdapter, achieves state-of-the-art performance on key metrics such as FVD, FID, and ObjMC scores.

NeurIPS Conference 2024 Conference Paper

PURE: Prompt Evolution with Graph ODE for Out-of-distribution Fluid Dynamics Modeling

Hao Wu
Changhu Wang
Fan Xu
Jinbao Xue
Chong Chen
Xian-Sheng Hua
Xiao Luo

This work studies the problem of out-of-distribution fluid dynamics modeling. Previous works usually design effective neural operators to learn from mesh-based data structures. However, in real-world applications, they can suffer from distribution shifts caused by variations in system parameters and the temporal evolution of the dynamical system. In this paper, we propose a novel approach named Prompt Evolution with Graph ODE (PURE) for out-of-distribution fluid dynamics modeling. The core of PURE is to learn time-evolving prompts using a graph ODE to adapt spatio-temporal forecasting models to different scenarios. In particular, PURE first learns from historical observations and system parameters in the frequency domain to explore multi-view context information, which can effectively initialize prompt embeddings. More importantly, we incorporate the interpolation of observation sequences into a graph ODE, which can capture the temporal evolution of prompt embeddings for model adaptation. These time-evolving prompt embeddings are then incorporated into basic forecasting models to overcome temporal distribution shifts. We also minimize the mutual information between prompt embeddings and observation embeddings to enhance the robustness of our model to different distributions. Extensive experiments on various benchmark datasets validate the superiority of the proposed PURE in comparison to various baselines.

AAAI Conference 2022 Conference Paper

Content-Variant Reference Image Quality Assessment via Knowledge Distillation

Guanghao Yin
Wei Wang
Zehuan Yuan
Chuchu Han
Wei Ji
Shouqian Sun
Changhu Wang

Generally, humans are more skilled at perceiving differences between high-quality (HQ) and low-quality (LQ) images than at directly judging the quality of a single LQ image. This also applies to image quality assessment (IQA). Recent no-reference (NR-IQA) methods have made great progress in predicting image quality without a reference image. However, they still have room for improvement because HQ image information is not fully exploited. In contrast, full-reference (FR-IQA) methods tend to provide more reliable quality evaluation, but their practicability is limited by the requirement for pixel-level aligned reference images. To address this, we propose the content-variant reference method via knowledge distillation (CVRKD-IQA). Specifically, we use non-aligned reference (NAR) images to introduce various prior distributions of high-quality images. Comparing the distribution differences between HQ and LQ images helps our model better assess image quality. Further, knowledge distillation transfers more HQ-LQ distribution-difference information from the FR-teacher to the NAR-student and stabilizes CVRKD-IQA performance. Moreover, to fully mine combined local-global information while achieving faster inference, our model directly processes multiple image patches from the input with the MLP-mixer. Cross-dataset experiments verify that our model outperforms all NAR/NR-IQA state-of-the-art methods and even reaches performance comparable to FR-IQA methods on some occasions. Content-variant, non-aligned reference HQ images are easy to obtain. Combined with the model's relative robustness to content variations, this lets it support more IQA applications. Our code and supplementary material are available at https://github.com/guanghaoyin/CVRKD-IQA.
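One common way to let a student learn from a full-reference teacher is to combine two terms. The first is a task loss on the predicted quality score. The second is a feature-matching term that pulls the student toward the frozen teacher. The sketch below assumes an L1 score loss and a mean-squared feature loss weighted by a hypothetical `distill_weight`. The actual CVRKD-IQA objective may differ.

```python
def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def student_loss(student_feat, teacher_feat, predicted_score, target_score, distill_weight=1.0):
    """Hypothetical NAR-student objective: L1 quality-score loss plus a term that
    pulls the student's features toward those of the frozen FR-teacher."""
    task_loss = abs(predicted_score - target_score)
    distill_loss = mean_squared_error(student_feat, teacher_feat)
    return task_loss + distill_weight * distill_loss

print(student_loss([0.0, 0.0], [1.0, 1.0], predicted_score=2.0, target_score=3.0))  # 2.0
```

Only the student's parameters would be updated. The teacher's features act as fixed targets.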

ICLR Conference 2022 Conference Paper

Objects in Semantic Topology

Shuo Yang 0006
Peize Sun
Yi Jiang 0009
Xiaobo Xia
Ruiheng Zhang 0001
Zehuan Yuan
Changhu Wang
Ping Luo 0002

A more realistic object detection paradigm, Open-World Object Detection, has attracted increasing research interest in the community recently. A qualified open-world object detector can not only identify objects of known categories but also discover unknown objects and incrementally learn to categorize them as their annotations progressively arrive. Previous works rely on independent modules to recognize unknown categories and to perform incremental learning, respectively. In this paper, we provide a unified perspective: Semantic Topology. During the life-long learning of an open-world object detector, all object instances from the same category are assigned to their corresponding pre-defined node in the semantic topology, including the 'unknown' category. This constraint builds up discriminative feature representations and consistent relationships among objects. As a result, the detector can distinguish unknown objects from the known categories, and the learned features of known objects stay undistorted when new categories are learned incrementally. Extensive experiments demonstrate that semantic topology, either randomly generated or derived from a well-trained language model, can outperform the current state-of-the-art open-world object detectors by a large margin. For example, the absolute open-set error (the number of unknown instances that are wrongly labeled as known) is reduced from 7832 to 2546, exhibiting the inherent superiority of semantic topology for open-world object detection.
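The idea of assigning every instance of a category, including 'unknown', to a pre-defined node can be sketched as follows. The sketch assumes the nodes are fixed, randomly generated vectors, which is one of the two options the abstract mentions. It also assumes features are pulled toward their node with a squared-distance loss. The loss form and the nearest-node decision rule are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random

def make_topology(categories, dim, seed=0):
    """One fixed, randomly generated node per category, including 'unknown'."""
    rng = random.Random(seed)
    return {c: [rng.gauss(0.0, 1.0) for _ in range(dim)] for c in categories}

def topology_loss(feature, category, topology):
    """Squared distance between an instance feature and its category's node."""
    return sum((f - n) ** 2 for f, n in zip(feature, topology[category]))

def nearest_category(feature, topology):
    """Assign a feature to the category whose node is closest."""
    return min(topology, key=lambda c: math.dist(feature, topology[c]))

topology = make_topology(["person", "car", "unknown"], dim=16)
feature = [x + 0.01 for x in topology["unknown"]]  # a feature close to the 'unknown' node
print(nearest_category(feature, topology))          # unknown
```

Because the nodes stay fixed, adding a new category only adds a node. Existing nodes, and the features trained toward them, are left in place.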

AAAI Conference 2022 Conference Paper

TransFG: A Transformer Architecture for Fine-Grained Recognition

Ju He
Jie-Neng Chen
Shuai Liu
Adam Kortylewski
Cheng Yang
Yutong Bai
Changhu Wang

Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences. Most existing works tackle this problem by reusing the backbone network to extract features of detected discriminative regions. However, this strategy inevitably complicates the pipeline and pushes the proposed regions to contain most parts of the objects, so it fails to locate the truly important parts. Recently, the vision transformer (ViT) has shown strong performance on the traditional classification task. The self-attention mechanism of the transformer links every patch token to the classification token. In this work, we first evaluate the effectiveness of the ViT framework in the fine-grained recognition setting. The strength of an attention link can be intuitively considered an indicator of token importance. Motivated by this, we further propose a novel Part Selection Module that can be applied to most transformer architectures. It integrates all raw attention weights of the transformer into an attention map, which guides the network to effectively and accurately select discriminative image patches and compute their relations. A contrastive loss is applied to enlarge the distance between feature representations of confusing classes. We name the augmented transformer-based model TransFG and demonstrate its value through experiments on five popular fine-grained benchmarks, where we achieve state-of-the-art performance. Qualitative results are presented for a better understanding of our model.
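The Part Selection Module integrates raw attention weights across layers. Below is a simplified single-head sketch. It assumes the integration is a product of per-layer attention matrices, with later layers multiplied on the left, and that token 0 is the classification token. Real implementations work per head on batched tensors, and the function names here are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def select_part_token(attention_layers):
    """attention_layers: per-layer (tokens x tokens) attention matrices, first layer first;
    token 0 is the classification token. Returns the patch index that the
    classification token attends to most after integrating all layers."""
    joint = attention_layers[0]
    for layer in attention_layers[1:]:
        joint = matmul(layer, joint)  # later layers act on the output of earlier ones
    cls_row = joint[0]
    return max(range(1, len(cls_row)), key=lambda j: cls_row[j])

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layer0 = [[0.2, 0.7, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(select_part_token([layer0, identity]))  # 1
```

Integrating across layers matters because a patch that looks unimportant in the last layer may still carry most of the signal that flowed into the classification token earlier.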

NeurIPS Conference 2021 Conference Paper

Adaptive Data Augmentation on Temporal Graphs

Yiwei Wang
Yujun Cai
Yuxuan Liang
Henghui Ding
Changhu Wang
Siddharth Bhatia
Bryan Hooi

Temporal Graph Networks (TGNs) are powerful in modeling temporal graph data owing to their increased complexity. However, higher complexity carries a higher risk of overfitting, which makes TGNs capture random noise instead of essential semantic information. To address this issue, our idea is to transform the temporal graphs using data augmentation (DA) with adaptive magnitudes, so as to effectively augment the input features and preserve the essential semantic information. Based on this idea, we present the MeTA (Memory Tower Augmentation) module: a multi-level module that processes the augmented graphs of different magnitudes on separate levels, and performs message passing across levels to provide adaptively augmented inputs for every prediction. MeTA can be flexibly applied to the training of popular TGNs to improve their effectiveness without increasing their time complexity. To complement MeTA, we propose three DA strategies to realistically model noise by modifying both the temporal and topological features. Empirical results on standard datasets show that MeTA yields significant gains for the popular TGN models on edge prediction and node classification in an efficient manner.

NeurIPS Conference 2021 Conference Paper

Directed Graph Contrastive Learning

Zekun Tong
Yuxuan Liang
Henghui Ding
Yongxing Dai
Xinke Li
Changhu Wang

Graph Contrastive Learning (GCL) has emerged to learn generalizable representations from contrastive views. However, it is still in its infancy, with two concerns:

1) Changing the graph structure through data augmentation to generate contrastive views may mislead the message passing scheme, because such changes discard intrinsic graph structural information, especially the directional structure in directed graphs.
2) GCL usually uses predefined contrastive views with hand-picked parameters. It therefore does not take full advantage of the contrastive information provided by data augmentation, resulting in incomplete structural information for model learning.

In this paper, we design a directed graph data augmentation method called Laplacian perturbation and theoretically analyze how it provides contrastive information without changing the directed graph structure. Moreover, we present a directed graph contrastive learning framework, which dynamically learns from all possible contrastive views generated by Laplacian perturbation. We then train it using multi-task curriculum learning to progressively learn from multiple easy-to-difficult contrastive views. We empirically show that our model retains more structural features of directed graphs than other GCL models because of its ability to provide complete contrastive information. Experiments on various benchmarks show that our approach outperforms state-of-the-art approaches.

AAAI Conference 2021 Conference Paper

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

Daizong Liu
Dongdong Yu
Changhu Wang
Pan Zhou

Although deep learning based methods have achieved great progress in unsupervised video object segmentation, difficult scenarios (e.g., visual similarity, occlusions, and appearance changes) are still not well handled. To alleviate these issues, we propose a novel Focus on Foreground Network (F2Net), which delves into intra- and inter-frame details of the foreground objects and thus effectively improves segmentation performance. Specifically, our proposed network consists of three main parts: a Siamese Encoder Module, a Center Guiding Appearance Diffusion Module, and a Dynamic Information Fusion Module. First, we use a siamese encoder to extract the feature representations of paired frames (reference frame and current frame). Then, a Center Guiding Appearance Diffusion Module is designed to capture three features:

- the inter-frame feature (dense correspondences between the reference frame and the current frame);
- the intra-frame feature (dense correspondences within the current frame);
- the original semantic feature of the current frame.

Specifically, we establish a Center Prediction Branch to predict the center location of the foreground object in the current frame. The center point information serves as a spatial guidance prior that enhances inter-frame and intra-frame feature extraction, so the feature representation focuses considerably on the foreground objects. Finally, we propose a Dynamic Information Fusion Module to automatically select relatively important features from the three aforementioned levels of features. Extensive experiments on the DAVIS2016, Youtube-object, and FBMS datasets show that our proposed F2Net achieves state-of-the-art performance with significant improvement.

AAAI Conference 2021 Conference Paper

Slimmable Generative Adversarial Networks

Liang Hou
Zehuan Yuan
Lei Huang
Huawei Shen
Xueqi Cheng
Changhu Wang

Generative adversarial networks (GANs) have achieved remarkable progress in recent years, but the continuously growing scale of models makes them challenging to deploy widely in practical applications. In particular, for real-time generation tasks, different devices require generators of different sizes due to varying computing power. In this paper, we introduce slimmable GANs (SlimGANs), which can flexibly switch the width of the generator to accommodate various quality-efficiency trade-offs at runtime. Specifically, we leverage multiple discriminators that share partial parameters to train the slimmable generator. To facilitate the consistency between generators of different widths, we present a stepwise inplace distillation technique that encourages narrow generators to learn from wide ones. As for class-conditional generation, we propose a sliceable conditional batch normalization that incorporates the label information into different widths. Our methods are validated, both quantitatively and qualitatively, by extensive experiments and a detailed ablation study.
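Switching generator width at runtime relies on narrower configurations reusing a prefix of the wider configuration's units. The sketch below shows one fully connected layer whose active output width is chosen by a hypothetical `width_mult` argument. It omits the shared discriminators, stepwise in-place distillation, and sliceable conditional batch normalization described in the abstract.

```python
class SlimmableLinear:
    """Fully connected layer whose active width can be switched at runtime.
    Narrower widths use a prefix of the full weight matrix, so all widths share parameters."""

    def __init__(self, weights, bias):
        self.weights = weights  # out_features rows, each of in_features values
        self.bias = bias

    def forward(self, x, width_mult=1.0):
        active = max(1, int(len(self.weights) * width_mult))
        # zip() stops at len(x), so a narrower input automatically uses only the
        # matching prefix of each weight row (the previous layer was slimmed too).
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights[:active], self.bias[:active])]

layer = SlimmableLinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], [0.0] * 4)
print(layer.forward([1.0, 2.0]))                  # [1.0, 2.0, 3.0, 2.0]
print(layer.forward([1.0, 2.0], width_mult=0.5))  # [1.0, 2.0]
```

The half-width output is exactly the first half of the full-width output. This parameter sharing is what lets one set of weights serve several quality-efficiency trade-offs.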

ICML Conference 2021 Conference Paper

What Makes for End-to-End Object Detection?

Peize Sun
Yi Jiang 0009
Enze Xie
Wenqi Shao
Zehuan Yuan
Changhu Wang
Ping Luo 0002

Object detection has recently achieved a breakthrough by removing the last non-differentiable component in the pipeline, Non-Maximum Suppression (NMS), and building an end-to-end system. However, what enables its one-to-one prediction has not been well understood. In this paper, we first point out that one-to-one positive sample assignment is the key factor, while one-to-many assignment in previous detectors causes redundant predictions at inference. Second, we surprisingly find that even when trained with one-to-one assignment, previous detectors still produce redundant predictions. We identify classification cost in the matching cost as the main ingredient: (1) previous detectors only consider location cost; (2) by additionally introducing classification cost, previous detectors immediately produce one-to-one predictions during inference. We introduce the concept of score gap to explore the effect of matching cost. Classification cost enlarges the score gap in two ways: it chooses the highest-scoring samples in each training iteration as positives, and it reduces the noisy positive samples brought by location cost alone. Finally, we demonstrate the advantages of end-to-end object detection on crowded scenes.

NeurIPS Conference 2020 Conference Paper

Is normalization indispensable for training deep neural network?

Jie Shao
Kai Hu
Changhu Wang
Xiangyang Xue
Bhiksha Raj

Normalization operations are widely used to train deep neural networks, and they can improve both convergence and generalization in most tasks. The theories behind normalization's effectiveness and new forms of normalization have long been hot research topics. To better understand normalization, one may ask whether normalization is indispensable for training deep neural networks. In this paper, we study what happens when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation. Our proposed method achieves the same or even slightly better performance on a variety of tasks, including image classification on ImageNet, object detection and segmentation on MS-COCO, video classification on Kinetics, and machine translation on WMT English-German. Our study may help better understand the role of normalization layers, and our method can be a competitive alternative to them. Code is available.

IJCAI Conference 2020 Conference Paper

Trajectory Similarity Learning with Auxiliary Supervision and Optimal Matching

Hanyuan Zhang
Xinyu Zhang
Qize Jiang
Baihua Zheng
Zhenbang Sun
Weiwei Sun
Changhu Wang

Trajectory similarity computation is a core problem in the field of trajectory data queries. However, the high time complexity of calculating trajectory similarity has always been a bottleneck in real-world applications. Learning-based methods can map trajectories into a uniform embedding space, so that the similarity of two trajectories can be computed from their embeddings in constant time. In this paper, we propose a novel trajectory representation learning framework, Traj2SimVec, that performs scalable and robust trajectory similarity computation. We use a simple and fast trajectory simplification and indexing approach to obtain triplet training samples efficiently. We make the framework more robust by making full use of sub-trajectory similarity information as auxiliary supervision. Furthermore, the framework supports point matching queries by modeling the optimal matching relationship of trajectory points under different distance metrics. Comprehensive experiments on real-world datasets demonstrate that our model substantially outperforms all existing approaches.
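To see why exact trajectory similarity is a bottleneck, consider dynamic time warping (DTW). It is used here only as a common example of a pairwise trajectory measure; the abstract does not name the metrics involved. The DTW dynamic program costs O(m·n) per pair of trajectories. Comparing two fixed-length learned embeddings, by contrast, costs time independent of trajectory length. A minimal sketch:

```python
import math

def dtw(a, b):
    """Dynamic time warping distance between two trajectories of (x, y) points.
    Fills an (len(a) + 1) x (len(b) + 1) table, so the cost is O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

print(dtw([(0, 0), (1, 0), (2, 0)], [(0, 0), (2, 0)]))  # 1.0
```

A learned encoder that maps each trajectory to a vector once lets every later comparison skip this table entirely. That encoder is not shown here.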

IJCAI Conference 2017 Conference Paper

MAT: A Multimodal Attentive Translator for Image Captioning

Chang Liu
Fuchun Sun
Changhu Wang
Feng Wang
Alan Yuille

In this work, we formulate image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural network (RNN) model for image caption generation. Most existing work represents the whole image by a convolutional neural network (CNN) feature. Instead, we propose to represent the input image as a sequence of detected objects, which serves as the source sequence of the RNN model. In this way, the sequential representation of an image can be naturally translated into a sequence of words, the target sequence of the RNN model. To represent the image sequentially, we extract object features from the image and arrange them in an order using convolutional neural networks. To further leverage the visual information from the encoded objects, a sequential attention layer is introduced to selectively attend to the objects relevant to generating the corresponding words in the sentences. Extensive experiments are conducted to validate the proposed approach on the popular benchmark dataset MS COCO. Following the dataset splits of previous work, the proposed model surpasses the state-of-the-art methods in all metrics. The proposed approach is also evaluated by the evaluation server of the MS COCO captioning challenge and achieves very competitive results, e.g., a CIDEr of 1.029 (c5) and 1.064 (c40).

ICML Conference 2016 Conference Paper

Network Morphism

Tao Wei
Changhu Wang
Yong Rui
Chang Wen Chen

We present a systematic study on how to morph a well-trained neural network into a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge of its parent network and to have the potential to continue growing into a more powerful one with much shorter training time. The first requirement for network morphism is the ability to handle diverse morphing types, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is the ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.
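The function-preserving requirement can be checked on the simplest width change. The sketch below widens the hidden layer of a bias-free two-layer ReLU network. The new units get random incoming weights and zero outgoing weights, so the output is unchanged. This zero-initialization trick is a simplified illustration of the idea, not the network morphism equations derived in the paper, and the helper names are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def two_layer_net(w1, w2, x):
    """f(x) = W2 relu(W1 x), biases omitted for brevity."""
    return linear(w2, relu(linear(w1, x)))

def widen_hidden(w1, w2, extra, seed=0):
    """Add `extra` hidden units with random incoming weights and zero outgoing
    weights; the new units cannot change the output, so f is preserved."""
    rng = random.Random(seed)
    n_in = len(w1[0])
    new_w1 = w1 + [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(extra)]
    new_w2 = [row + [0.0] * extra for row in w2]
    return new_w1, new_w2

w1 = [[1.0, -1.0], [0.5, 2.0]]
w2 = [[1.0, 1.0]]
x = [1.0, 2.0]
child_w1, child_w2 = widen_hidden(w1, w2, extra=3)
print(two_layer_net(w1, w2, x), two_layer_net(child_w1, child_w2, x))  # [4.5] [4.5]
```

The child starts from exactly the parent's function. Training can then move the new outgoing weights away from zero to use the added capacity.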

AAAI Conference 2015 Conference Paper

Building Effective Representations for Sketch Recognition

Jun Guo
Changhu Wang
Hongyang Chao

With the popularity of touch-screen devices, understanding a user’s hand-drawn sketch has become an increasingly important research topic in artificial intelligence and computer vision. However, unlike natural images, hand-drawn sketches are often highly abstract, with sparse visual information and large intra-class variance, making the problem more challenging. In this work, we study how to build effective representations for sketch recognition. First, to capture saliency patterns of different scales and spatial arrangements, a Gabor-based low-level representation is proposed. Then, based on this representation, to discover more complex patterns in a sketch, a Hybrid Multilayer Sparse Coding (HMSC) model is proposed to learn mid-level representations. An improved dictionary learning algorithm is also leveraged in HMSC to reduce overfitting to common but trivial patterns. Extensive experiments show that the proposed representations are highly discriminative and lead to large improvements over the state of the art.

IJCAI Conference 2015 Conference Paper

Offline Sketch Parsing via Shapeness Estimation

Jie Wu
Changhu Wang
Liqing Zhang
Yong Rui

In this work, we target the problem of offline sketch parsing, in which the temporal order of strokes is unavailable. This setting is more challenging than that of most existing work, which usually leverages temporal information to reduce the search space. Traditional approaches select thousands of candidate groups for recognition; we instead propose the idea of shapeness estimation to greatly reduce this number very quickly. Based on the observation that most hand-drawn shapes with well-defined closed boundaries can be clearly differentiated from non-shapes if normalized to a very small size, we propose an efficient shapeness estimation method. A compact feature representation and an efficient method for extracting it are also proposed to speed up this process. Based on the proposed shapeness estimation, we present a three-stage cascade framework for offline sketch parsing. The shapeness estimation technique in this framework greatly reduces the number of false positives. It achieves a 96.2% detection rate with only 32 candidate group proposals, two orders of magnitude fewer than existing methods. Extensive experiments show the superiority of the proposed framework over state-of-the-art works on sketch parsing in both effectiveness and efficiency, even though those works leverage the temporal information of strokes.
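The observation that candidate shapes become separable after normalization to a very small size suggests a cheap preprocessing step. The sketch below scales a candidate stroke group's bounding box while preserving aspect ratio. It then rasterizes the group's points into a `size x size` binary grid. The grid size, the point-level rasterization, and the function name are assumptions, and the paper's compact feature representation is not reproduced.

```python
def normalize_to_grid(points, size=8):
    """Rasterize a candidate stroke group's (x, y) points into a size x size binary
    grid after scaling its bounding box uniformly (aspect ratio preserved)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1.0  # avoid dividing by zero for a single point
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(size - 1, int((x - x0) / span * size))
        row = min(size - 1, int((y - y0) / span * size))
        grid[row][col] = 1
    return grid

square = [(0, 0), (10, 0), (0, 10), (10, 10)]
for row in normalize_to_grid(square, size=4):
    print(row)
```

A tiny grid like this can be scored very quickly for every candidate group, which is the kind of cheap filtering that keeps the number of proposals small.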

AAAI Conference 2014 Conference Paper

Sketch Recognition with Natural Correction and Editing

Jie Wu
Changhu Wang
Liqing Zhang
Yong Rui

In this paper, we target the problem of sketch recognition. We systematically study how to incorporate users’ correction and editing into isolated and full sketch recognition. This is a natural and necessary interaction in real systems such as Visio, where very similar shapes exist. First, a novel algorithm is proposed to mine prior shape knowledge for three editing modes. Second, to differentiate visually similar shapes, a novel symbol recognition algorithm is introduced that leverages the learnt shape knowledge. Then, a novel editing detection algorithm is proposed to facilitate symbol recognition. Furthermore, both the symbol recognizer and the editing detector are systematically incorporated into full sketch recognition. Finally, based on the proposed algorithms, a real-time sketch recognition system is built to recognize hand-drawn flowcharts and diagrams with flexible interactions. Extensive experiments show the effectiveness of the proposed algorithms.
