Arrow Research search

Author name cluster

Ming Gu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

Towards Scalable Web Accessibility Audit with MLLMs as Copilots

  • Ming Gu
  • Ziwei Wang
  • Sicen Lai
  • Zirui Gao
  • Sheng Zhou
  • Jiajun Bu

Ensuring web accessibility is crucial for advancing social welfare, justice, and equality in digital spaces, yet the vast majority of website user interfaces remain non-compliant, due in part to the resource-intensive and unscalable nature of current auditing practices. While WCAG-EM offers a structured methodology for site-wise conformance evaluation, it involves great human efforts and lacks practical support for execution at scale. In this work, we present an auditing framework, AAA, which operationalizes WCAG-EM through a human-AI partnership model. AAA is anchored by two key innovations: GRASP, a graph-based multimodal sampling method that ensures representative page coverage via learned embeddings of visual, textual, and relational cues; and MaC, a multimodal large language model-based copilot strategy that supports auditors through cross-modal reasoning and intelligent assistance in high-effort tasks. Together, these components enable scalable, end-to-end web accessibility auditing, empowering human auditors with AI-enhanced assistance for real-world impact. We further contribute four novel datasets designed for benchmarking core stages of the audit pipeline. Extensive experiments demonstrate the effectiveness of our methods, providing insights that small-scale language models can serve as capable experts when fine-tuned.

AAAI Conference 2025 Conference Paper

FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency

  • Han Huang
  • Yulun Wu
  • Chao Deng
  • Ge Gao
  • Ming Gu
  • Yu-Shen Liu

Recently, Gaussian Splatting has sparked a new trend in the field of computer vision. Apart from novel view synthesis, it has also been extended to the area of multi-view reconstruction. The latest methods facilitate complete, detailed surface reconstruction while ensuring fast training speed. However, these methods still require dense input views, and their output quality significantly degrades with sparse views. We observed that the Gaussian primitives tend to overfit the few training views, leading to noisy floaters and incomplete reconstruction surfaces. In this paper, we present an innovative sparse-view reconstruction framework that leverages intra-view depth and multi-view feature consistency to achieve remarkably accurate surface reconstruction. Specifically, we utilize monocular depth ranking information to supervise the consistency of depth distribution within patches and employ a smoothness loss to enhance the continuity of the distribution. To achieve finer surface reconstruction, we optimize the absolute position of depth through multi-view projection features. Extensive experiments on DTU and BlendedMVS demonstrate that our method outperforms state-of-the-art methods with a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction without the need for costly pre-training.

NeurIPS Conference 2025 Conference Paper

Making Classic GNNs Strong Baselines Across Varying Homophily: A Smoothness–Generalization Perspective

  • Ming Gu
  • Zhuonan Zheng
  • Sheng Zhou
  • Meihan Liu
  • Jiawei Chen
  • Qiaoyu Tan
  • Liangcheng Li
  • Jiajun Bu

Graph Neural Networks (GNNs) have achieved great success but are often considered to be challenged by varying levels of homophily in graphs. Recent empirical studies have surprisingly shown that homophilic GNNs can perform well across datasets of different homophily levels with proper hyperparameter tuning, but the underlying theory and effective architectures remain unclear. To advance GNN universality across varying homophily, we theoretically revisit GNN message passing and uncover a novel smoothness-generalization dilemma, where increasing hops inevitably enhances smoothness at the cost of generalization. This dilemma hinders learning in high-order homophilic neighborhoods and all heterophilic ones, where generalization is critical due to complex neighborhood class distributions that are sensitive to shifts induced by noise or sparsity. To address this, we introduce the Inceptive Graph Neural Network (IGNN) built on three simple yet effective design principles, which alleviate the dilemma by enabling distinct hop-wise generalization alongside improved overall generalization with adaptive smoothness. Benchmarking against 30 baselines demonstrates IGNN's superiority and reveals notable universality in certain homophilic GNN variants. Our code and datasets are available at https://github.com/galogm/IGNN.

AAAI Conference 2025 Conference Paper

Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

  • Yulun Wu
  • Han Huang
  • Wenyuan Zhang
  • Chao Deng
  • Ge Gao
  • Ming Gu
  • Yu-Shen Liu

In recent years, reconstructing indoor scene geometry from multi-view images has achieved encouraging accomplishments. Current methods incorporate monocular priors into neural implicit surface models to achieve high-quality reconstructions. However, these methods require hundreds of images for scene reconstruction. When only a limited number of views are available as input, the performance of monocular priors deteriorates due to scale ambiguity, leading to the collapse of the reconstructed scene geometry. In this paper, we propose a new method, named Sparis, for indoor surface reconstruction from sparse views. Specifically, we investigate the impact of monocular priors on sparse scene reconstruction, introducing a novel prior based on inter-image matching information. Our prior offers more accurate depth information while ensuring cross-view matching consistency. Additionally, we employ an angular filter strategy and an epipolar matching weight function, aiming to reduce errors due to view matching inaccuracies, thereby refining the inter-image prior for improved reconstruction accuracy. The experiments conducted on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.

NeurIPS Conference 2025 Conference Paper

Understanding and Enhancing Message Passing on Heterophilic Graphs via Compatibility Matrix

  • Zhuonan Zheng
  • Yuanchen Bei
  • Zhiyao Zhou
  • Sheng Zhou
  • Yao Ma
  • Ming Gu
  • Hongjia Xu
  • Jiawei Chen

Graph Neural Networks (GNNs) excel in graph mining tasks thanks to their message-passing mechanism, which aligns with the homophily assumption. However, connected nodes can also exhibit inconsistent behaviors, termed heterophilic patterns, sparking interest in heterophilic GNNs (HTGNNs). Although the message-passing mechanism seems unsuitable for heterophilic graphs owing to the propagation of dissimilar messages, it is still popular in HTGNNs and consistently achieves notable success. Some efforts have investigated such an interesting phenomenon, but are limited in the data perspective. The model-perspective understanding remains largely unexplored, which is conducive to guiding the designs of HTGNNs. To fill this gap, we build the connection between node discriminability and the compatibility matrix (CM). We reveal that the effectiveness of the message passing in HTGNNs may be credited to increasing the proposed Compatibility Matrix Discriminability (CMD). However, the issues of sparsity and noise pose great challenges to leveraging CM. Thus, we propose CMGNN, a novel approach to alleviate these issues while enhancing the CM and node embeddings explicitly. A thorough evaluation involving 13 datasets and comparison against 20 well-established baselines highlights the superiority of CMGNN.

AAAI Conference 2024 Conference Paper

GridFormer: Point-Grid Transformer for Surface Reconstruction

  • Shengtao Li
  • Ge Gao
  • Yudong Liu
  • Yu-Shen Liu
  • Ming Gu

Implicit neural networks have emerged as a crucial technology in 3D surface reconstruction. To reconstruct continuous surfaces from discrete point clouds, encoding the input points into regular grid features (plane or volume) has been commonly employed in existing approaches. However, these methods typically use the grid as an index for uniformly scattering point features. Compared with the irregular point features, the regular grid features may sacrifice some reconstruction details but improve efficiency. To take full advantage of these two types of features, we introduce a novel and high-efficiency attention mechanism between the grid and point features named Point-Grid Transformer (GridFormer). This mechanism treats the grid as a transfer point connecting the space and point cloud. Our method maximizes the spatial expressiveness of grid features and maintains computational efficiency. Furthermore, optimizing predictions over the entire space could potentially result in blurred boundaries. To address this issue, we further propose a boundary optimization strategy incorporating margin binary cross-entropy loss and boundary sampling. This approach enables us to achieve a more precise representation of the object structure. Our experiments validate that our method is effective and outperforms the state-of-the-art approaches under widely used benchmarks by producing more precise geometry reconstructions. The code is available at https://github.com/list17/GridFormer.

AAAI Conference 2024 Conference Paper

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

  • Han Huang
  • Yulun Wu
  • Junsheng Zhou
  • Ge Gao
  • Ming Gu
  • Yu-Shen Liu

Recently, neural implicit functions have demonstrated remarkable results in the field of multi-view reconstruction. However, most existing methods are tailored for dense views and exhibit unsatisfactory performance when dealing with sparse views. Several latest methods have been proposed for generalizing implicit reconstruction to address the sparse view reconstruction task, but they still suffer from high training costs and are merely valid under carefully selected perspectives. In this paper, we propose a novel sparse view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. To achieve this, we train a neural network to learn a global implicit field from the on-surface points obtained from SfM and then leverage it as a coarse geometric constraint. To exploit local geometric consistency, we project on-surface points onto seen and unseen views, treating the consistent loss of projected features as a fine geometric constraint. The experimental results with DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over the state-of-the-art methods.

AAAI Conference 2024 Conference Paper

Rethinking Propagation for Unsupervised Graph Domain Adaptation

  • Meihan Liu
  • Zeyu Fang
  • Zhen Zhang
  • Ming Gu
  • Sheng Zhou
  • Xin Wang
  • Jiajun Bu

Unsupervised Graph Domain Adaptation (UGDA) aims to transfer knowledge from a labelled source graph to an unlabelled target graph in order to address the distribution shifts between graph domains. Previous works have primarily focused on aligning data from the source and target graph in the representation space learned by graph neural networks (GNNs). However, the inherent generalization capability of GNNs has been largely overlooked. Motivated by our empirical analysis, we reevaluate the role of GNNs in graph domain adaptation and uncover the pivotal role of the propagation process in GNNs for adapting to different graph domains. We provide a comprehensive theoretical analysis of UGDA and derive a generalization bound for multi-layer GNNs. By formulating GNN Lipschitz for k-layer GNNs, we show that the target risk bound can be tighter by removing propagation layers in source graph and stacking multiple propagation layers in target graph. Based on the empirical and theoretical analysis mentioned above, we propose a simple yet effective approach called A2GNN for graph domain adaptation. Through extensive experiments on real-world datasets, we demonstrate the effectiveness of our proposed A2GNN framework.

NeurIPS Conference 2022 Conference Paper

A2: Efficient Automated Attacker for Boosting Adversarial Training

  • Zhuoer Xu
  • Guanghui Zhu
  • Changhua Meng
  • Shiwen Cui
  • Zhenzhe Ying
  • Weiqiang Wang
  • Ming Gu
  • Yihua Huang

Based on the significant improvement of model robustness by AT (Adversarial Training), various variants have been proposed to further boost the performance. Well-recognized methods have focused on different components of AT (e.g., designing loss functions and leveraging additional unlabeled data). It is generally accepted that stronger perturbations yield more robust models. However, how to generate stronger perturbations efficiently remains unaddressed. In this paper, we propose an efficient automated attacker called A2 to boost AT by generating the optimal perturbations on-the-fly during training. A2 is a parameterized automated attacker that searches the attacker space for the best attacker against the defense model and examples. Extensive experiments across different datasets demonstrate that A2 generates stronger perturbations with low extra cost and reliably improves the robustness of various AT methods against different attacks.

NeurIPS Conference 2022 Conference Paper

Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models

  • Biru Zhu
  • Yujia Qin
  • Ganqu Cui
  • Yangyi Chen
  • Weilin Zhao
  • Chong Fu
  • Yangdong Deng
  • Zhiyuan Liu

Despite the great success of pre-trained language models (PLMs) in a large set of natural language processing (NLP) tasks, there has been a growing concern about their security in real-world applications. Backdoor attack, which poisons a small number of training samples by inserting backdoor triggers, is a typical threat to security. Trained on the poisoned dataset, a victim model would perform normally on benign samples but predict the attacker-chosen label on samples containing pre-defined triggers. The vulnerability of PLMs under backdoor attacks has been proved with increasing evidence in the literature. In this paper, we present several simple yet effective training strategies that could effectively defend against such attacks. To the best of our knowledge, this is the first work to explore the possibility of backdoor-free adaptation for PLMs. Our motivation is based on the observation that, when trained on the poisoned dataset, the PLM's adaptation follows a strict order of two stages: (1) a moderate-fitting stage, where the model mainly learns the major features corresponding to the original task instead of subsidiary features of backdoor triggers, and (2) an overfitting stage, where both features are learned adequately. Therefore, if we could properly restrict the PLM's adaptation to the moderate-fitting stage, the model would neglect the backdoor triggers but still achieve satisfying performance on the original task. To this end, we design three methods to defend against backdoor attacks by reducing the model capacity, training epochs, and learning rate, respectively. Experimental results demonstrate the effectiveness of our methods in defending against several representative NLP backdoor attacks. We also perform visualization-based analysis to attain a deeper understanding of how the model learns different features, and explore the effect of the poisoning ratio. Finally, we explore whether our methods could defend against backdoor attacks for the pre-trained CV model. The codes are publicly available at https://github.com/thunlp/Moderate-fitting.

NeurIPS Conference 2019 Conference Paper

Fast Low-rank Metric Learning for Large-scale and High-dimensional Data

  • Han Liu
  • Zhizhong Han
  • Yu-Shen Liu
  • Ming Gu

Low-rank metric learning aims to learn better discrimination of data subject to low-rank constraints. It keeps the intrinsic low-rank structure of datasets and reduces the time cost and memory usage in metric learning. However, it is still a challenge for current methods to handle datasets with both high dimensions and large numbers of samples. To address this issue, we present a novel fast low-rank metric learning (FLRML) method. FLRML casts the low-rank metric learning problem into an unconstrained optimization on the Stiefel manifold, which can be efficiently solved by searching along the descent curves of the manifold. FLRML significantly reduces the complexity and memory usage in optimization, which makes the method scalable to both high dimensions and large numbers of samples. Furthermore, we introduce a mini-batch version of FLRML to make the method scalable to larger datasets that are hard to load and decompose in limited memory. Experimental results show that our method achieves high accuracy and is much faster than the state-of-the-art methods under several benchmarks with large numbers of high-dimensional data. Code has been made available at https://github.com/highan911/FLRML.

AAAI Conference 2018 Conference Paper

Energy-Efficient Automatic Train Driving by Learning Driving Patterns

  • Jin Huang
  • Yue Gao
  • Sha Lu
  • Xibin Zhao
  • Yangdong Deng
  • Ming Gu

Railway is regarded as the most sustainable means of modern transportation. With the rapid growth of fleet size and railway mileage, the energy consumption of trains is becoming a serious concern globally. The nature of railway offers a unique opportunity to optimize the energy efficiency of locomotives by taking advantage of the undulating terrains along a route. The derivation of an energy-optimal train driving solution, however, proves to be a significant challenge due to the high dimension, nonlinearity, complex constraints, and time-varying characteristic of the problem. An optimized solution can only be attained by considering both the complex environmental conditions of a given route and the inherent characteristics of a locomotive. To tackle the problem, this paper employs a high-order correlation learning method for online generation of the energy-optimized train driving solutions. Based on the driving data of experienced human drivers, a hypergraph model is used to learn the optimal embedding from the specified features for the decision of a driving operation. First, we design a feature set capturing the driving status. Next, all the training data are formulated as a hypergraph and an inductive learning process is conducted to obtain the embedding matrix. The hypergraph model can be used for real-time generation of driving operation. We also propose a reinforcement updating scheme, which offers the capability of sustainable enhancement on the hypergraph model in industrial applications. The learned model can be used to determine an optimized driving operation in real time, as tested on the Hardware-in-Loop platform. Validation experiments proved that the energy consumption of the proposed solution is around 10% lower than that of average human drivers.

IJCAI Conference 2017 Conference Paper

Vertex-Weighted Hypergraph Learning for Multi-View Object Classification

  • Lifan Su
  • Yue Gao
  • Xibin Zhao
  • Hai Wan
  • Ming Gu
  • Jiaguang Sun

3D object classification with multi-view representation has become very popular, thanks to the progress on computer techniques and graphic hardware, and attracted much research attention in recent years. Regarding this task, there are mainly two challenging issues, i.e., the complex correlation among multiple views and the possible imbalanced data issue. In this work, we propose to employ the hypergraph structure to formulate the relationship among 3D objects, taking advantage of the hypergraph for high-order correlation modelling. However, traditional hypergraph learning methods may suffer from the imbalanced data issue. To this end, we propose a vertex-weighted hypergraph learning algorithm for multi-view 3D object classification, introducing an updated hypergraph structure. In our method, the correlation among different objects is formulated in a hypergraph structure and each object (vertex) is associated with a corresponding weight, weighting the importance of each sample in the learning process. The learning process is conducted on the vertex-weighted hypergraph and the estimated object relevance is employed for object classification. The proposed method has been evaluated on two public benchmarks, i.e., the NTU and the PSB datasets. Experimental results and comparison with the state-of-the-art methods and a recent deep learning method demonstrate the effectiveness of our proposed method.

NeurIPS Conference 2001 Conference Paper

Spectral Relaxation for K-means Clustering

  • Hongyuan Zha
  • Xiaofeng He
  • Chris Ding
  • Ming Gu
  • Horst Simon

The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descent method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function.
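
The trace reformulation mentioned in this abstract can be checked numerically on a toy example. The sketch below is illustrative only and is not the authors' code: the function names, the five sample points, and the fixed labelling are all assumptions made for the example. It compares the usual sum-of-squares cost with trace(G) - trace(Y^T G Y), where G is the Gram matrix of the data vectors and Y is the cluster indicator matrix with each column scaled by 1/sqrt(cluster size). It does not perform the relaxation, the eigendecomposition, or the pivoted QR step, and it assumes no cluster is empty.

```python
from math import sqrt, isclose

def sum_of_squares(points, labels, k):
    """K-means cost: squared distance of each point to its cluster mean."""
    dim = len(points[0])
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum((p[d] - mean[d]) ** 2 for p in members for d in range(dim))
    return cost

def trace_form(points, labels, k):
    """Same cost written as trace(G) - trace(Y^T G Y)."""
    n = len(points)
    # Gram matrix of the data vectors: G[i][j] = <x_i, x_j>
    gram = [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]
    sizes = [labels.count(c) for c in range(k)]
    # Normalized indicator matrix: Y[i][c] = 1/sqrt(n_c) if point i is in cluster c
    y = [[1.0 / sqrt(sizes[c]) if labels[i] == c else 0.0 for c in range(k)]
         for i in range(n)]
    trace_g = sum(gram[i][i] for i in range(n))
    trace_ygy = sum(y[i][c] * gram[i][j] * y[j][c]
                    for c in range(k) for i in range(n) for j in range(n))
    return trace_g - trace_ygy

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1, 1]
direct = sum_of_squares(points, labels, 2)
via_trace = trace_form(points, labels, 2)
print(isclose(direct, via_trace, abs_tol=1e-9))
```

Because trace(G) does not depend on the partition, minimizing the cost over partitions is equivalent to maximizing trace(Y^T G Y) over indicator matrices of this form, which is the quantity the abstract describes relaxing.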