Arrow Research search

Author name cluster

Wei Chu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

NeurIPS Conference 2025 Conference Paper

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

  • Haozhe Wang
  • Chao Qu
  • Zuming Huang
  • Wei Chu
  • Fangzhen Lin
  • Wenhu Chen

Recently, slow-thinking systems like GPT-o1 and DeepSeek-R1 have demonstrated great potential in solving challenging problems through explicit reflection. They significantly outperform the best fast-thinking models, such as GPT-4o, on various math and science benchmarks. However, their multimodal reasoning capabilities remain on par with fast-thinking models. For instance, GPT-o1's performance on benchmarks like MathVista, MathVerse, and MathVision is similar to fast-thinking models. In this paper, we aim to enhance the slow-thinking capabilities of vision-language models using reinforcement learning (without relying on distillation) to advance the state of the art. First, we adapt the GRPO algorithm with a novel technique called Selective Sample Replay (SSR) to address the vanishing advantages problem. While this approach yields strong performance, the resulting RL-trained models exhibit limited self-reflection or self-verification. To further encourage slow-thinking, we introduce Forced Rethinking, which appends a rethinking trigger token to the end of rollouts in RL training, explicitly enforcing a self-reflection reasoning step. By combining these two techniques, our model, VL-Rethinker, advances the state-of-the-art scores on MathVista and MathVerse to 80.4% and 63.5%, respectively. VL-Rethinker also achieves open-source SoTA on multi-disciplinary benchmarks such as MathVision, MMMU-Pro, EMMA, and MEGA-Bench, narrowing the gap with OpenAI-o1. We conduct comprehensive ablations and analysis to provide insights into the effectiveness of our approach.
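
As a rough illustration of the "vanishing advantages" problem the abstract mentions (the sketch and all names below are illustrative, not the paper's implementation): in GRPO-style training, advantages are computed by normalizing rewards within a rollout group, so a group whose rollouts all receive the same reward contributes zero gradient. One simple way selective replay can counter this is by retaining prompts whose reward groups were informative:

```python
# Illustrative sketch only: group-normalized advantages (as in GRPO-style RL)
# vanish when every rollout in a group gets the same reward. A replay buffer
# that keeps prompts with non-degenerate groups is one way to keep gradients
# flowing; function and class names here are hypothetical.

def group_advantages(rewards):
    """Normalize rewards within one rollout group."""
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    if var == 0.0:  # all rollouts scored alike -> zero advantage everywhere
        return [0.0] * len(rewards)
    sd = var ** 0.5
    return [(r - mu) / sd for r in rewards]

class SelectiveReplay:
    """Keep prompts whose reward groups were informative (non-constant)."""
    def __init__(self, capacity=1024):
        self.buffer, self.capacity = [], capacity

    def offer(self, prompt, rewards):
        if any(a != 0.0 for a in group_advantages(rewards)):
            self.buffer.append(prompt)
            self.buffer = self.buffer[-self.capacity:]

    def sample(self):
        return self.buffer[-1] if self.buffer else None
```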

JBHI Journal 2024 Journal Article

Advancing the Boundary of Pre-Trained Models for Drug Discovery: Interpretable Fine-Tuning Empowered by Molecular Physicochemical Properties

  • Xiaoqing Lian
  • Jie Zhu
  • Tianxu Lv
  • Xiaoyan Hong
  • Longzhen Ding
  • Wei Chu
  • Jianming Ni
  • Xiang Pan

In the field of drug discovery, a proliferation of pre-trained models has surfaced, exhibiting exceptional performance across a variety of tasks. However, the extensive size of these models, coupled with the limited interpretative capabilities of current fine-tuning methods, impedes the integration of pre-trained models into the drug discovery process. This paper pushes the boundaries of pre-trained models in drug discovery by designing a novel fine-tuning paradigm known as the Head Feature Parallel Adapter (HFPA), which is highly interpretable, high-performing, and has fewer parameters than other widely used methods. Specifically, this approach enables the model to consider diverse information across representation subspaces concurrently by strategically using Adapters, which can operate directly within the model's feature space. Our tactic freezes the backbone model and forces various small-size Adapters' corresponding subspaces to focus on exploring different atomic and chemical bond knowledge, thus maintaining a small number of trainable parameters and enhancing the interpretability of the model. Moreover, we furnish a comprehensive interpretability analysis, imparting valuable insights into the chemical area. HFPA outperforms competing methods on seven physiology and toxicity tasks and achieves state-of-the-art results on three physical chemistry tasks. We also test ten additional molecular datasets, demonstrating the robustness and broad applicability of HFPA.

ICLR Conference 2024 Conference Paper

LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints

  • Weidi Xu
  • Jingwei Wang
  • Lele Xie
  • Jianshan He
  • Hongting Zhou
  • Taifeng Wang
  • Xiaopei Wan
  • Jingdong Chen

Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem since it involves modeling intricate correlations to satisfy the constraints. This paper proposes a novel neural layer, LogicMP, which performs mean-field variational inference over a Markov Logic Network (MLN). It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modularity and efficiency. By exploiting the structure and symmetries in MLNs, we theoretically demonstrate that our well-designed, efficient mean-field iterations greatly mitigate the difficulty of MLN inference, reducing the inference from sequential calculation to a series of parallel tensor operations. Empirical results in three kinds of tasks over images, graphs, and text show that LogicMP outperforms advanced competitors in both performance and efficiency.

ECAI Conference 2024 Conference Paper

Robust Deep Hawkes Process Under Label Noise of Both Event and Occurrence

  • Xiaoyu Tan
  • Bin Li 0091
  • Xihe Qiu
  • Jingjing Huang
  • Yinghui Xu 0001
  • Wei Chu

Integrating deep neural networks with the Hawkes process has significantly improved predictive capabilities in finance, health informatics, and information technology. Nevertheless, these models often face challenges in real-world settings, particularly due to substantial label noise. This issue is of significant concern in the medical field, where label noise can arise from delayed updates in electronic medical records or misdiagnoses, leading to increased prediction risks. Our research indicates that deep Hawkes process models exhibit reduced robustness when dealing with label noise, particularly when it affects both event types and timing. To address these challenges, we first investigate the influence of label noise in approximated intensity functions and present a novel framework, the Robust Deep Hawkes Process (RDHP), to overcome the impact of label noise on the intensity function of Hawkes models, considering both the events and their occurrences. We tested RDHP using multiple open-source benchmarks with synthetic noise and conducted a case study on obstructive sleep apnea-hypopnea syndrome (OSAHS) in a real-world setting with inherent label noise. The results demonstrate that RDHP can effectively perform classification and regression tasks, even in the presence of noise related to events and their timing. To the best of our knowledge, this is the first study to successfully address both event and time label noise in deep Hawkes process models, offering a promising solution for medical applications, specifically in diagnosing OSAHS.

AAAI Conference 2023 Conference Paper

DC-Former: Diverse and Compact Transformer for Person Re-identification

  • Wen Li
  • Cheng Zou
  • Meng Wang
  • Furong Xu
  • Jianan Zhao
  • Ruobing Zheng
  • Yuan Cheng
  • Wei Chu

In the person re-identification (ReID) task, it is still challenging to learn discriminative representations with deep learning due to limited data. Generally speaking, the model performs better as the amount of data increases. The addition of similar classes strengthens the ability of the classifier to identify similar identities, thereby improving the discrimination of the representation. In this paper, we propose a Diverse and Compact Transformer (DC-Former) that can achieve a similar effect by splitting the embedding space into multiple diverse and compact subspaces. A compact embedding subspace helps the model learn more robust and discriminative embeddings to identify similar classes. And the fusion of these diverse embeddings, which contain more fine-grained information, can further improve the effect of ReID. Specifically, multiple class tokens are used in a vision transformer to represent multiple embedding spaces. Then, a self-diverse constraint (SDC) is applied to these spaces to push them away from each other, which makes each embedding space diverse and compact. Further, a dynamic weight controller (DWC) is designed to balance the relative importance among them during training. The experimental results of our method are promising, surpassing previous state-of-the-art methods on several commonly used person ReID benchmarks. Our code is available at https://github.com/ant-research/Diverse-and-Compact-Transformer.

AAAI Conference 2023 Conference Paper

DRGCN: Dynamic Evolving Initial Residual for Deep Graph Convolutional Networks

  • Lei Zhang
  • Xiaodong Yan
  • Jianshan He
  • Ruopeng Li
  • Wei Chu

Graph convolutional networks (GCNs) have proved very practical for handling various graph-related tasks. Deep GCNs have attracted considerable research interest due to their potential superior performance compared with shallow ones. However, simply increasing network depth will, on the contrary, hurt performance due to the over-smoothing problem. While adding residual connections is proved effective for learning deep convolutional neural networks (deep CNNs), it is not trivial when applied to deep GCNs. Recent works proposed an initial residual mechanism that did alleviate the over-smoothing problem in deep GCNs. However, according to our study, their algorithms are quite sensitive to different datasets. In their setting, the personalization (dynamic) and correlation (evolving) of how the residual applies are ignored. To this end, we propose a novel model called Dynamic evolving initial Residual Graph Convolutional Network (DRGCN). Firstly, we use a dynamic block for each node to adaptively fetch information from the initial representation. Secondly, we use an evolving block to model the residual evolving pattern between layers. Our experimental results show that our model effectively relieves the problem of over-smoothing in deep GCNs and outperforms the state-of-the-art (SOTA) methods on various benchmark datasets. Moreover, we develop a mini-batch version of DRGCN which can be applied to large-scale data. Coupling with several fair training techniques, our model reaches new SOTA results on the large-scale ogbn-arxiv dataset of Open Graph Benchmark (OGB). Our reproducible code is available on GitHub.
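
A hedged sketch of the per-node "dynamic initial residual" idea described above: each node adaptively mixes its propagated features with its initial representation H0 via a learned gate. The gating form and all names here are my illustration, not DRGCN's actual architecture.

```python
import numpy as np

# Sketch: one GCN layer where each node mixes propagated features P with its
# initial representation H0 through a per-node sigmoid gate. The concrete
# gate parameterization is a guess for illustration, not the paper's.

def dynamic_initial_residual_layer(H, H0, A_hat, W, gate_w):
    P = A_hat @ H                                     # graph propagation
    gate_in = np.concatenate([P, H0], axis=1)         # gate sees both views
    alpha = 1.0 / (1.0 + np.exp(-(gate_in @ gate_w))) # per-node gate, (n, 1)
    mixed = (1.0 - alpha) * P + alpha * H0            # node-wise residual mix
    return np.maximum(mixed @ W, 0.0)                 # linear transform + ReLU
```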

AAAI Conference 2022 Conference Paper

CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes

  • Hao Huang
  • Yongtao Wang
  • Zhaoyu Chen
  • Yuze Zhang
  • Yuheng Li
  • Zhi Tang
  • Wei Chu
  • Jingdong Chen

Malicious applications of deepfakes (i.e., technologies generating target facial attributes or entire faces from facial images) have posed a huge threat to individuals’ reputation and security. To mitigate these threats, recent studies have proposed adversarial watermarks to combat deepfake models, leading them to generate distorted outputs. Despite achieving impressive results, these adversarial watermarks have low image-level and model-level transferability, meaning that they can protect only one facial image from one specific deepfake model. To address these issues, we propose a novel solution that can generate a Cross-Model Universal Adversarial Watermark (CMUA-Watermark), protecting a large number of facial images from multiple deepfake models. Specifically, we begin by proposing a cross-model universal attack pipeline that attacks multiple deepfake models iteratively. Then, we design a two-level perturbation fusion strategy to alleviate the conflict between the adversarial watermarks generated by different facial images and models. Moreover, we address the key problem in cross-model optimization with a heuristic approach to automatically find the suitable attack step sizes for different models, further weakening the model-level conflict. Finally, we introduce a more reasonable and comprehensive evaluation method to fully test the proposed method and compare it with existing ones. Extensive experimental results demonstrate that the proposed CMUA-Watermark can effectively distort the fake facial images generated by multiple deepfake models while achieving a better performance than existing methods. Our code is available at https://github.com/VDIGPKU/CMUA-Watermark.
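
As a loose illustration of the cross-model universal attack loop described above (not the paper's two-level fusion strategy): a single shared perturbation is updated iteratively against several models over several images, with a generic sign-gradient step and clipping. The gradient oracles, step sizes, and update rule below are stand-ins.

```python
# Sketch of a cross-model universal perturbation loop in the spirit of the
# abstract. `grad_fns` are hypothetical per-deepfake-model gradient oracles;
# the sign-update and eps-ball clipping are generic PGD-style ingredients.

def universal_watermark(images, grad_fns, step_sizes, eps=0.1, iters=5):
    delta = [0.0] * len(images[0])                 # one shared perturbation
    for _ in range(iters):
        for grad, lr in zip(grad_fns, step_sizes):  # iterate over models...
            for x in images:                        # ...and over face images
                g = grad([xi + di for xi, di in zip(x, delta)])
                # ascend the sign of the gradient, stay inside the eps-ball
                delta = [max(-eps, min(eps, di + lr * (1.0 if gi > 0 else -1.0)))
                         for di, gi in zip(delta, g)]
    return delta
```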

AAAI Conference 2022 Conference Paper

Width & Depth Pruning for Vision Transformers

  • Fang Yu
  • Kun Huang
  • Meng Wang
  • Yuan Cheng
  • Wei Chu
  • Li Cui

Transformer models have demonstrated their promising potential and achieved excellent performance on a series of computer vision tasks. However, the huge computational cost of vision transformers hinders their deployment and application to edge devices. Recent works have proposed to find and remove the unimportant units of vision transformers. Despite achieving remarkable results, these methods consider only network width and ignore network depth, which is another important dimension for pruning vision transformers. Therefore, we propose a Width & Depth Pruning (WDPruning) framework that reduces both width and depth dimensions simultaneously. Specifically, for width pruning, a set of learnable pruning-related parameters is used to adaptively adjust the width of the transformer. For depth pruning, we introduce several shallow classifiers by using the intermediate information of the transformer blocks, which allows images to be classified by shallow classifiers instead of deeper ones. At inference time, all of the blocks after a shallow classifier can be dropped, so they bring no additional parameters or computation. Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational costs of mainstream vision transformers such as DeiT and Swin Transformer with a minor accuracy drop. In particular, on ILSVRC-12, we achieve a FLOPs pruning ratio of over 22% by compressing DeiT-Base, even with an increase of 0.14% in Top-1 accuracy.

AAAI Conference 2020 System Paper

Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors

  • Wei Zhang
  • Yuan Cheng
  • Xin Guo
  • Qingpei Guo
  • Jian Wang
  • Qing Wang
  • Chen Jiang
  • Meng Wang

We demonstrate a car damage assessment system for the car insurance field based on artificial intelligence techniques, which can exempt insurance inspectors from checking cars on site and help people without professional knowledge to evaluate car damage when accidents happen. Unlike existing approaches, we utilize videos instead of photos to interact with users to make the whole procedure as simple as possible. We adopt object and video detection and segmentation techniques from computer vision, and take advantage of multiple frames extracted from videos to achieve high damage recognition accuracy. The system uploads video streams captured by mobile devices, recognizes car damage on the cloud asynchronously, and then returns damaged components and repair costs to users. The system evaluates car damage and returns results automatically and effectively in seconds, which reduces labor costs and decreases insurance claim time significantly.

AAAI Conference 2019 Conference Paper

Latent Dirichlet Allocation for Internet Price War

  • Chenchen Li
  • Xiang Yan
  • Xiaotie Deng
  • Yuan Qi
  • Wei Chu
  • Le Song
  • Junlong Qiao
  • Jianshan He

Current Internet market makers are facing an intense competitive environment, where personalized price reductions or discounted coupons are provided by their peers to attract more customers. Much investment is spent to catch up with each other’s competitors but participants in such a price cut war are often incapable of winning due to their lack of information about others’ strategies or customers’ preference. We formalize the problem as a stochastic game with imperfect and incomplete information and develop a variant of Latent Dirichlet Allocation (LDA) to infer latent variables under the current market environment, which represents preferences of customers and strategies of competitors. Tests on simulated experiments and an open dataset for real data show that, by subsuming all available market information of the market maker’s competitors, our model exhibits a significant improvement for understanding the market environment and finding the best response strategies in the Internet price war. Our work marks the first successful learning method to infer latent information in the environment of price war by the LDA modeling, and sets an example for related competitive applications to follow.

IJCAI Conference 2007 Conference Paper

Semi-Supervised Gaussian Process Classifiers

  • Vikas Sindhwani
  • Wei Chu
  • Sathiya Keerthi

In this paper, we propose a graph-based construction of semi-supervised Gaussian process classifiers. Our method is based on recently proposed techniques for incorporating the geometric properties of unlabeled data within globally defined kernel functions. The full machinery for standard supervised Gaussian process inference is brought to bear on the problem of learning from labeled and unlabeled data. This approach provides a natural probabilistic extension to unseen test examples. We employ Expectation Propagation procedures for evidence-based model selection. In the presence of few labeled examples, this approach is found to significantly outperform cross-validation techniques. We present empirical results demonstrating the strengths of our approach.

NeurIPS Conference 2007 Conference Paper

Gaussian Process Models for Link Analysis and Transfer Learning

  • Kai Yu
  • Wei Chu

In this paper we develop a Gaussian process (GP) framework to model a collection of reciprocal random variables defined on the edges of a network. We show how to construct GP priors, i.e., covariance functions, on the edges of directed, undirected, and bipartite graphs. The model suggests an intimate connection between link prediction and transfer learning, which were traditionally considered two separate research topics. Though a straightforward GP inference has a very high complexity, we develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity.

NeurIPS Conference 2007 Conference Paper

Hidden Common Cause Relations in Relational Learning

  • Ricardo Silva
  • Wei Chu
  • Zoubin Ghahramani

When predicting class labels for objects within a relational database, it is often helpful to consider a model for relationships: this allows for information between class labels to be shared and to improve prediction performance. However, there are different ways by which objects can be related within a relational database. One traditional way corresponds to a Markov network structure: each existing relation is represented by an undirected edge. This encodes that, conditioned on input features, each object label is independent of other object labels given its neighbors in the graph. However, there is no reason why Markov networks should be the only representation of choice for symmetric dependence structures. Here we discuss the case when relationships are postulated to exist due to hidden common causes. We discuss how the resulting graphical model differs from Markov networks, and how it describes different types of real-world relational processes. A Bayesian nonparametric classification model is built upon this graphical representation and evaluated with several empirical studies.

NeurIPS Conference 2006 Conference Paper

Relational Learning with Gaussian Processes

  • Wei Chu
  • Vikas Sindhwani
  • Zoubin Ghahramani
  • S. Keerthi

Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent covariance function for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm.

NeurIPS Conference 2006 Conference Paper

Stochastic Relational Models for Discriminative Link Prediction

  • Kai Yu
  • Wei Chu
  • Shipeng Yu
  • Volker Tresp
  • Zhao Xu

We introduce a Gaussian process (GP) framework, stochastic relational models (SRM), for learning social, physical, and other relational phenomena where interactions between entities are observed. The key idea is to model the stochastic structure of entity relationships (i.e., links) via a tensor interaction of multiple GPs, each defined on one type of entities. These models in fact define a set of nonparametric priors on infinite dimensional tensor matrices, where each element represents a relationship between a tuple of entities. By maximizing the marginalized likelihood, information is exchanged between the participating GPs through the entire relational network, so that the dependency structure of links is messaged to the dependency of entities, reflected by the adapted GP kernels. The framework offers a discriminative approach to link prediction, namely, predicting the existences, strengths, or types of relationships based on the partially observed linkage network as well as the attributes of entities (if given). We discuss properties and variants of SRM and derive an efficient learning algorithm. Very encouraging experimental results are achieved on a toy problem and a user-movie preference link prediction task. In the end we discuss extensions of SRM to general relational learning tasks.
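
A minimal sketch of the tensor-interaction idea in the abstract: the covariance between two links (i, j) and (i', j') factorizes as the product of one GP kernel per entity type, which over the full set of links is a Kronecker product. This is the textbook construction, not SRM's full learning algorithm; names are illustrative.

```python
import numpy as np

# Sketch: covariance over all (u, v) links as the Kronecker product of one
# kernel matrix per entity type. With n_v entities of the second type,
# link (i, j) maps to flat index i * n_v + j, so
#   Cov[(i, j), (i', j')] = Ku[i, i'] * Kv[j, j'].

def link_covariance(Ku, Kv):
    """Covariance matrix over all links between two entity sets."""
    return np.kron(Ku, Kv)
```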

NeurIPS Conference 2005 Conference Paper

A matching pursuit approach to sparse Gaussian process regression

  • Sathiya Keerthi
  • Wei Chu

In this paper we propose a new basis selection criterion for building sparse GP regression models that provides promising gains in accuracy as well as efficiency over previous methods. Our algorithm is much faster than that of Smola and Bartlett, while, in generalization, it greatly outperforms the information gain approach proposed by Seeger et al., especially on the quality of predictive distributions.

JMLR Journal 2005 Journal Article

Gaussian Processes for Ordinal Regression

  • Wei Chu
  • Zoubin Ghahramani

We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
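
A short sketch of the threshold likelihood described above (ordered-probit style, with illustrative variable names): the real line is cut into K ordered intervals by thresholds, and the probability of rank y is the Gaussian mass of the latent function value falling into interval y.

```python
import math

# Sketch of a threshold likelihood for ordinal regression: given latent
# value f and thresholds b_1 < ... < b_{K-1} (with b_0 = -inf, b_K = +inf),
#   P(rank = y | f) = Phi((b_y - f)/sigma) - Phi((b_{y-1} - f)/sigma).
# The noise scale sigma and names are illustrative.

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_likelihood(f, y, thresholds, sigma=1.0):
    """P(rank = y | latent f), for y in 1 .. len(thresholds) + 1."""
    b = [-math.inf] + list(thresholds) + [math.inf]
    def cdf(x):
        if x == -math.inf:
            return 0.0
        if x == math.inf:
            return 1.0
        return Phi((x - f) / sigma)
    return cdf(b[y]) - cdf(b[y - 1])
```

Because the intervals partition the real line, the likelihoods over all ranks sum to one for any latent value.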

ICML Conference 2005 Conference Paper

Preference learning with Gaussian processes

  • Wei Chu
  • Zoubin Ghahramani

In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to tackle many multiclass problems. The overall approach has the advantages of Bayesian methods for model selection and probabilistic prediction. Experimental results compared against the constraint classification approach on several benchmark datasets verify the usefulness of this algorithm.