EAAI Journal, 2026 (journal article)
Beyond reconstruction: Enhancing masked autoencoders with contrastive learning for video representation learning
- Yawei Feng
- Lijun Guo
- Guitao Yu
- Rong Zhang
- Jiangbo Qian
- Chong Wang
- Shangce Gao
Self-supervised video representation learning primarily employs two methods: contrastive learning and masked video modeling, each with distinct advantages. Some studies have attempted to combine these two approaches to fully exploit their respective strengths. However, the intrinsic heterogeneity of the two methods poses challenges for existing models that integrate them, including complex model architectures, unstable training, and limited performance gains. To address these issues, this study proposes a novel video pre-training framework called Beyond Reconstruction (BR), which introduces a dual-track heterogeneous learning strategy. This strategy lets contrastive learning and masked video modeling play complementary roles at different layers of Vision Transformers (ViTs), seamlessly integrating them into a unified framework to improve the quality of video representations. Additionally, BR incorporates a motion-aware progressive masking strategy to strengthen spatiotemporal saliency modeling and stabilize training. By leveraging contrastive learning's strength in capturing globally salient moving objects, this strategy overcomes the limitations of previous masking methods. Experiments on multiple benchmarks, including action recognition and video object segmentation, show that BR achieves performance comparable to or better than existing approaches under both fine-tuning and linear probing settings. These results demonstrate BR's strong adaptability and efficiency in practical deployment: its stable fine-tuning performance enables effective adaptation to complex scenarios with limited annotations, while its strong linear probing capability allows the backbone to remain frozen, facilitating shared usage across multiple tasks and reducing overall computational cost without compromising performance.
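The abstract does not give the paper's exact formulation, but the dual-track idea (a contrastive objective on global embeddings plus a reconstruction objective on masked patches) and a progressive masking schedule can be sketched in a minimal, framework-agnostic way. Everything below is an assumption for illustration: the linear schedule, the InfoNCE form of the contrastive term, and the weighting factor `lam` are not specified in the abstract.

```python
import numpy as np

def progressive_mask_ratio(step, total_steps, start=0.5, end=0.9):
    # Assumed linear schedule: ramp the masking ratio up over training.
    # The paper's motion-aware schedule is not described in the abstract.
    t = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * t

def info_nce(z1, z2, temperature=0.1):
    # Contrastive InfoNCE loss between two batches of view embeddings;
    # matching rows of z1 and z2 are treated as positive pairs.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def dual_track_loss(z1, z2, pred, target, mask, lam=1.0):
    # Combined objective: contrastive term on global embeddings plus
    # mean-squared reconstruction error over masked patches only.
    recon = np.mean(((pred - target) ** 2)[mask])
    return info_nce(z1, z2) + lam * recon
```

In a real pipeline, `z1`/`z2` would come from two augmented clips passed through the ViT, while `pred`/`target` would be the decoder's patch predictions and the ground-truth patch values at masked positions; here they are just arrays to show how the two tracks combine into one scalar loss.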