Arrow Research search

Author name cluster

Ming Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

119 papers
2 author rows

Possible papers

119

EAAI Journal 2026 Journal Article

A semantic segmentation model for early-stage fire detection from aerial remote sensing

  • Zhe Liu
  • Yu Sun
  • Xiangyuan Jiang
  • Pei Duan
  • Ming Li

Forest fire disasters threaten the ecological environment and human life, yet current research focuses solely on detecting either flame or smoke, which often leads to missed or false detections. In this paper, we propose a semantic segmentation model that aims to accurately segment flame and smoke simultaneously. A Compact Atrous Spatial Pyramid Pooling module is developed to capture multi-scale contextual information efficiently, addressing the significant scale disparities between flame and smoke. Additionally, a Bottom-up Detail-informed Feature Fusion Module is proposed, which leverages shallow features to guide cross-layer feature fusion, thereby enhancing the detection accuracy of small targets. Lastly, a Foreground Emphasis Module is proposed to mitigate the foreground sparsity that commonly exists in remote sensing images of early forest fires; this module uses foreground classification results to guide segmentation, making the model focus more on foreground identification. Experimental results suggest that our method markedly surpasses other methods in early-stage fire scenarios and achieves accurate disaster area segmentation in various settings such as urban fires. In addition, a processing speed of 41.83 frames per second is attainable on TITAN Xp devices, demonstrating both strong segmentation performance and efficient real-time processing capability.

AAAI Conference 2026 Conference Paper

Amplifying Discrepancies: Exploiting Macro and Micro Inconsistencies for Image Manipulation Localization

  • Shenghao Chen
  • Yibo Zhao
  • Tianyi Wang
  • Chunjie Ma
  • Weili Guan
  • Ming Li
  • Zan Gao

The rapid development of image manipulation technologies poses significant challenges to multimedia forensics, especially in accurate localization of manipulated regions. Existing methods often fail to fully explore the intrinsic discrepancies between manipulated and authentic regions, resulting in sub-optimal performance. To address this limitation, we propose the Focus Region Discrepancy Network (FRD-Net), a novel and efficient framework that significantly enhances manipulation localization by amplifying discrepancies at both macro- and micro-levels. Specifically, our proposed Iterative Clustering Module (ICM) groups features into two discriminative clusters and refines representations via backward propagation from cluster centers, improving the distinction between tampered and authentic regions at the macro level. Thereafter, our Differential Progressive Module (DPM) is constructed to capture fine-grained structural inconsistencies within local neighborhoods and integrate them into a Central Difference Convolution, increasing sensitivity to subtle manipulation details at the micro level. Finally, these complementary modules are seamlessly integrated into a compact architecture that achieves a favorable balance between accuracy and efficiency. Extensive experiments on multiple benchmarks demonstrate that FRD-Net consistently surpasses state-of-the-art methods in terms of manipulation localization performance while maintaining a lower computational cost.
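The Central Difference Convolution mentioned above is a known operator that mixes a vanilla convolution with a term computed on differences from the window's center value, making the response sensitive to local structure rather than absolute intensity. A minimal 1D numpy sketch of that operator (our own illustration in the published CDC style, not FRD-Net's code):

```python
import numpy as np

def central_diff_conv1d(x, w, theta=0.7):
    """1D Central Difference Convolution: blends a vanilla convolution
    with a central-difference term that responds to local intensity
    changes rather than absolute values. theta=1 is pure difference."""
    k = len(w)
    pad = k // 2
    xp = np.pad(x, pad, mode='edge')          # replicate borders
    out = np.empty(len(x), dtype=float)
    for i in range(len(x)):
        window = xp[i:i + k]
        vanilla = np.dot(w, window)           # ordinary convolution term
        diff = np.dot(w, window - x[i])       # subtract the center value
        out[i] = theta * diff + (1 - theta) * vanilla
    return out

x_flat = np.full(8, 5.0)                      # constant signal: no local structure
x_edge = np.array([5.0] * 4 + [9.0] * 4)      # signal with a step edge
w = np.array([1.0, 1.0, 1.0])
# Pure central difference (theta=1) gives zero response on constant regions
# but a nonzero response at the edge, i.e., it highlights fine structure.
assert np.allclose(central_diff_conv1d(x_flat, w, theta=1.0), 0.0)
assert np.abs(central_diff_conv1d(x_edge, w, theta=1.0)).max() > 0
```

The constant-region test illustrates why such an operator increases sensitivity to subtle manipulation traces: uniform content contributes nothing, so only local inconsistencies produce a signal.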

EAAI Journal 2026 Journal Article

An intelligent approach for reconstructing feature-based models from boundary representations via organic adaptation of two-level networks

  • Wanbin Pan
  • Hongdu Zhu
  • Weijuan Cao
  • Shuming Gao
  • Ming Li

Reconstructing editable feature-based computer-aided design (CAD) models from boundary representation (B-Rep) data is valuable for accelerating design iteration and improving model reuse, but it is hindered by (1) feature identification, especially under feature interactions, and (2) non-unique feature sequence determination. This paper proposes a two-level learning pipeline that combines a multi-task classification network and a link prediction network. The multi-task network jointly predicts per-face feature types and per-edge precedence relations, while the link prediction network infers missing connections to merge fragmented regions into complete feature instances. On the Fusion 360 Segmentation dataset (35,680 user-created models), 85,511 feature instances are manually annotated for training and evaluation. The proposed classifier achieves face/edge accuracies of 94.51%/87.21% and intersection-over-union (IoU) scores of 78.13%/56.41% (all reported as means), improving face IoU by 4.52 and 1.03 percentage points over UV-Net (a typical B-Rep learning baseline) and BRepNet (a typical topological message-passing baseline for B-Rep models), respectively. The link predictor reaches an accuracy of 83.87% with an IoU of 72.22% (means), enabling feature instance identification with a mean instance identification accuracy (mInsIdenAcc) of 74.11% and a mean instance identification IoU (mInsIdenIoU) of 87.83%. Based on the resulting feature precedence graph, valid feature sequences are automatically generated and can be replayed to reconstruct interpretable and editable feature histories for downstream CAD applications.

AAAI Conference 2026 Conference Paper

AnomalyPainter: Vision-Language-Diffusion Synergy for Realistic and Diverse Unseen Industrial Anomaly Synthesis

  • Zhangyu Lai
  • Yilin Lu
  • Xinyang Li
  • Jianghang Lin
  • Yansong Qu
  • Ming Li
  • Liujuan Cao

Visual anomaly detection is limited by the lack of sufficient anomaly data. While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle. To address this, we propose AnomalyPainter, a novel framework that breaks the diversity-realism trade-off by synergizing a Vision Language Large Model (VLLM), a Latent Diffusion Model (LDM), and our newly introduced texture library Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging the VLLM's general knowledge, reasonable anomaly text descriptions are generated for each industrial object and matched with relevant diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint on normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the natural-image-trained ControlNet for industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.

AAAI Conference 2026 Conference Paper

ARBench: Algorithmic Reasoner or API Alchemist? Evaluating LLMs Beyond API Calls

  • Ren-Biao Liu
  • Chao-Zeng Ma
  • Anqi Li
  • Hui Sun
  • Xin-Ye Li
  • Ming Li

Large Language Models (LLMs) have demonstrated impressive capabilities in code generation. Like human programmers, LLMs tend to call high-level APIs and libraries to program efficiently. However, this shortcut may hinder LLMs from learning essential algorithmic reasoning, leading instead to rote memorization of API usage. As a result, LLMs often struggle to generalize to new or domain-specific algorithms that lack ready-made library support. In this work, we propose ARBench, a novel benchmark for evaluating LLMs' ability to generate machine learning algorithms from scratch, beyond merely invoking high-level APIs. It emphasizes algorithmic reasoning and implementation, distinguishing genuine understanding from superficial API usage. It covers fundamental and advanced machine learning tasks, rigorously assessing current LLMs' capacity to implement these algorithms from scratch. Our evaluation reveals the strengths and weaknesses of state-of-the-art LLMs in algorithmic reasoning and generalization, offering valuable insights to guide future research and development.
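To illustrate the kind of from-scratch implementation the benchmark described above targets (this toy k-means is our own sketch, not a task drawn from ARBench), a solution must reproduce the algorithm's assignment/update loop using only the standard library, with no call to a package such as scikit-learn:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means written from scratch: stdlib only, no ML library.
    points: list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)         # initialize from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: centroid = mean of its cluster (keep old if empty).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labs = kmeans(pts, k=2)
# The two well-separated point groups end up with distinct labels.
assert labs[0] == labs[1] == labs[2]
assert labs[3] == labs[4] == labs[5]
assert labs[0] != labs[3]
```

A model that has memorized only `sklearn.cluster.KMeans` usage would struggle here, which is exactly the gap between API alchemy and algorithmic reasoning that the benchmark probes.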

AAAI Conference 2026 Conference Paper

CP-FREEZER: Latency Attacks Against Vehicular Cooperative Perception

  • Chenyi Wang
  • Ruoyu Song
  • Raymond Muller
  • Jean-Philippe Monteuuis
  • Z. Berkay Celik
  • Jonathan Petit
  • Ryan Gerdes
  • Ming Li

Cooperative perception (CP) enhances situational awareness of connected and autonomous vehicles by exchanging and combining messages from multiple agents. While prior work has explored adversarial integrity attacks that degrade detection accuracy, little is known about CP's robustness against attacks on timeliness (or availability), a safety-critical requirement for autonomous driving. In this paper, we present CP-FREEZER, the first latency attack that maximizes the computation delay of CP algorithms by injecting adversarial perturbations via V2V messages. Our attack resolves several unique challenges, including the non-differentiability of point cloud preprocessing and asynchronous knowledge of the victim's input due to transmission delays, and it uses a novel loss function that effectively maximizes the execution time of the CP pipeline. Extensive experiments show that CP-FREEZER increases end-to-end CP latency by over 90×, pushing per-frame processing time beyond 3 seconds with a 100% success rate on our real-world vehicle testbed. Our findings reveal a critical threat to the availability of CP systems, highlighting the urgent need for robust defenses.

AAAI Conference 2026 Conference Paper

CyC3D: Fine-grained Controllable 3D Generation via Cycle Consistency Regularization

  • Hongbin Xu
  • Chaohui Yu
  • Feng Xiao
  • Jiazheng Xing
  • Hai Ci
  • Weitao Chen
  • Fan Wang
  • Ming Li

Despite the remarkable progress of 3D generation, achieving controllability, i.e., ensuring consistency between generated 3D content and input conditions like edge and depth, remains a significant challenge. Existing methods often struggle to maintain accurate alignment, leading to noticeable discrepancies. To address this issue, we propose CyC3D, a new framework that enhances controllable 3D generation by explicitly encouraging cyclic consistency between the second-order 3D content, generated based on extracted signals from the first-order generation, and its original input controls. Specifically, we employ an efficient feed-forward backbone that can generate a 3D object from an input condition and a text prompt. Given an initial viewpoint and a control signal, a novel view is rendered from the generated 3D content, from which the extracted condition is used to regenerate the 3D content. This re-generated output is then rendered back to the initial viewpoint, followed by another round of control signal extraction, forming a cyclic process with two consistency constraints. View consistency ensures coherence between the two generated 3D objects, measured by semantic similarity to accommodate generative diversity. Condition consistency aligns the final extracted signal with the original input control, preserving structural or geometric details throughout the process. Extensive experiments on popular benchmarks demonstrate that CyC3D significantly improves controllability, especially for fine-grained details, outperforming existing methods across various conditions (e.g., +14.17% PSNR for edge, +6.26% PSNR for sketch).

AAAI Conference 2026 Conference Paper

Dynamic-Static Synergistic Selection Method for Candidate Code Solutions with Generated Test Cases

  • Ren-Biao Liu
  • Jiang-Tian Xue
  • Chao-Zeng Ma
  • Hui Sun
  • Xin-Ye Li
  • Ming Li

Large language models (LLMs) have shown significant improvements in code generation. A common practice is to sample multiple candidate codes to increase the likelihood of producing an accurate solution. However, effectively identifying the best candidate from the pool is a significant challenge. Although existing code consensus methods attempt to solve this issue, they suffer from a critical problem: they rely on test cases generated by LLMs, which can be flawed or provide incomplete coverage. This can result in erroneous validations, causing correct code to fail flawed tests and preventing the detection of functional differences among candidate code solutions. To address these issues, we present the Dynamic-Static Synergistic Selection Method, a novel framework that combines two complementary analytical approaches. First, it uses the abstract syntax tree (AST) to detect and filter candidate solutions and test cases. Second, the method statically analyzes the quality of the solutions and then dynamically validates functional consistency based on the execution results of the extracted inputs, thereby neutralizing the impact of faulty tests. Extensive experiments demonstrate that this synergistic approach significantly outperforms existing methods, substantially enhancing the correctness of the selected code.
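The AST-based filtering step described above can be sketched with Python's standard `ast` module. This minimal filter is our own illustration of the idea, not the paper's implementation: it drops candidates that fail to parse or that define no function at all:

```python
import ast

def filter_candidates(candidates):
    """Keep only candidate solutions that parse and define a function."""
    kept = []
    for src in candidates:
        try:
            tree = ast.parse(src)
        except SyntaxError:
            continue  # unparseable candidate: discard it outright
        defines_func = any(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
                           for node in ast.walk(tree))
        if defines_func:
            kept.append(src)
    return kept

candidates = [
    "def add(a, b):\n    return a + b",   # valid candidate solution
    "def add(a, b)\n    return a + b",    # syntax error: missing colon
    "x = 1 + 1",                          # parses, but defines no function
]
print(len(filter_candidates(candidates)))  # → 1
```

The same parse-then-inspect pattern extends naturally to LLM-generated test cases, e.g. discarding tests whose assertions reference names that no candidate defines.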

EAAI Journal 2026 Journal Article

Green smart home design based on quality function deployment integrating customer requirements and life-cycle sustainability in Z-numbers environment

  • Yue Xiao
  • Ming Li
  • Yuejia Li
  • Yingcheng Xu
  • Hongde Liu

Smart appliance design is transitioning from purely customer-centric models to integrated approaches that balance customer requirements (CRs) with full life-cycle sustainability. However, current methodologies are limited by incomplete customer requirement extraction, low reliability in Quality Function Deployment (QFD) evaluations, and disconnected customer-environment optimization. To address these gaps, we propose an integrated scheme combining QFD and Multi-Objective Linear Programming based on online reviews and fuzzy Z-numbers. Specifically, we first extract CRs from online reviews and calculate their importance via an integration of subjective and objective methodologies. This approach not only addresses the subjectivity inherent in expert evaluations but also mitigates the lack of long-term feedback typically found in online reviews. Secondly, by constructing a QFD model integrating the fuzzy Z-number-based Best-Worst Method and fuzzy-weighted MULTIMOORA, the House of Quality is employed to translate CRs into a priority ranking of engineering characteristics (ECs). Finally, a Real Number and Fuzzy Z-number Based Multi-objective Optimization Model is constructed with the objectives of CR satisfaction and life-cycle environmental sustainability. By integrating multiple membership functions, the optimal set of design solutions is derived through model solving. A case study on smart refrigerators demonstrates the framework's effectiveness, identifying "Energy Efficiency Class" and "Overall Power Consumption" as the most critical ECs and deriving optimal design configurations that balance user satisfaction with life-cycle sustainability.

JBHI Journal 2026 Journal Article

HDPL: Hypergraph-based Dynamic Prompting Learning for Incomplete Multimodal Medical Learning

  • Xiaomin Zhou
  • Guoheng Huang
  • Qin Zhao
  • Jianbin He
  • Xiaochen Yuan
  • Ming Li
  • Chi-Man Pun
  • Ling Guo

Multimodal learning has garnered significant attention in the medical field because it provides a more comprehensive perspective by utilizing various types of data, which supports more accurate decision-making. However, the complexity of medical data, coupled with missing modalities, severely hinders predictive accuracy. Existing methods for multimodal learning with missing modalities still face considerable challenges. For instance, approaches that construct multimodal shared feature spaces often incur high computational costs, while methods that infer missing modalities from complete ones may overly rely on the complete modalities, potentially skewing results. Pre-trained transformer methods address these issues but still have limitations, such as being able to handle only one missing modality at test time. This is partly because structured data, unlike sequential data, lacks inherent minimum semantic units or a natural order. Additionally, the positional encodings generated by such methods may introduce information interference when applied to structured data, leading to poor alignment with sequential data during modality fusion in transformer models. To tackle these challenges, we introduce HDPL: Hypergraph-based Dynamic Prompt Learning for Incomplete Multimodal Medical Learning, comprising three modules. The High-Order Hypergraph Embedding module identifies the minimal semantic units within structured data and utilizes hypergraph structures to extract high-dimensional features from clinical data. The Multimodal Medical Data Integrator module reduces the distance between corresponding embedding vectors in the shared modality-feature space, facilitating the integration of modalities in the transformer. The Dynamic Network Structure Optimization module dynamically adjusts the width and depth of the network, improving overall model performance and partially alleviating the shortcomings caused by incomplete modalities. Through comprehensive experimentation, we demonstrate the efficiency and robustness of our model in dealing with missing modalities and reducing training burdens. Our code and dataset are available at https://github.com/colorful823/HDPL.

AAAI Conference 2026 Conference Paper

Heterophily-aware Contrastive Learning for Heterophilic Hypergraphs

  • Ming Li
  • Yongqi Li
  • Yuting Chen
  • Feilong Cao
  • Ke Lv

Hypergraph neural networks (HNNs) have emerged as powerful tools for modeling high-order relationships in complex systems. However, most existing HNNs are designed under the assumption of homophily, which does not hold in many real-world scenarios where connected nodes often exhibit diverse semantics, i.e., heterophily. This inconsistency leads to suboptimal aggregation and degraded performance, especially in low-label regimes. While a few recent methods have attempted to enhance heterophilic hypergraph learning, they often rely heavily on label supervision and overlook the potential of self-supervised techniques. In this paper, we propose HeroCL, a heterophily-aware contrastive learning framework that improves hypergraph representation under both structural heterogeneity and label scarcity. Specifically, HeroCL integrates a multi-hop neighbor encoding module to capture informative higher-order context and incorporates two complementary contrastive objectives, label-aware and structure-aware, to guide representation learning from both semantic and relational perspectives. A multi-granularity contrastive strategy is introduced to exploit latent signals across multiple neighborhood levels. Extensive experiments on several benchmark datasets against 11 existing baselines demonstrate that HeroCL achieves consistent and significant performance gains, particularly under strong heterophily and limited supervision, validating its robustness and effectiveness.

AAAI Conference 2026 Conference Paper

High-Pass Matters: Theoretical Insights and Sheaflet-Based Design for Hypergraph Neural Networks

  • Ming Li
  • Yujie Fang
  • Dongrui Shen
  • Han Feng
  • Xiaosheng Zhuang
  • Kelin Xia
  • Pietro Lio

Hypergraph neural networks (HGNNs) have shown great potential in modeling higher-order relationships among multiple entities. However, most existing HGNNs primarily emphasize low-pass filtering while neglecting the role of high-frequency information. In this work, we present a theoretical investigation into the spectral behavior of HGNNs and prove that combining both low-pass and high-pass components leads to more expressive and effective models. Notably, our analysis highlights that high-pass signals play a crucial role in capturing local discriminative structures within hypergraphs. Guided by these insights, we propose a novel sheaflet-based HGNN that integrates cellular sheaf theory and framelet transforms to preserve higher-order dependencies while enabling multi-scale spectral decomposition. This framework explicitly emphasizes high-pass components, aligning with our theoretical findings. Extensive experiments on benchmark datasets demonstrate the superiority of our approach over existing methods, validating the importance of high-frequency information in hypergraph learning.
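The low-pass/high-pass split at the center of that analysis can be made concrete with ordinary graph Laplacian filters; the numpy sketch below is a plain-graph analogue of the hypergraph case (our own illustration, not the paper's sheaflet construction). A low-pass filter I − L/λmax passes constant (smooth) signals untouched, while the complementary high-pass filter L/λmax annihilates them and retains oscillating, locally discriminative components:

```python
import numpy as np

# Path graph on 5 nodes as a plain-graph stand-in for a hypergraph.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
D = np.diag(A.sum(axis=1))
L = D - A                                   # combinatorial graph Laplacian
lmax = np.linalg.eigvalsh(L).max()

low_pass = np.eye(5) - L / lmax             # attenuates high frequencies
high_pass = L / lmax                        # attenuates low frequencies

smooth = np.ones(5)                         # constant = lowest-frequency signal
x = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # rapidly oscillating signal

# Constant signals pass the low-pass filter unchanged and are killed by high-pass.
assert np.allclose(low_pass @ smooth, smooth)
assert np.allclose(high_pass @ smooth, 0.0)
# Oscillating (high-frequency) content is strictly damped by the low-pass filter,
# which is exactly the information a low-pass-only model throws away.
assert np.linalg.norm(low_pass @ x) < np.linalg.norm(x)
```

The last assertion shows the core point of the abstract in miniature: a purely low-pass architecture systematically shrinks high-frequency structure, so a high-pass branch is needed to preserve it.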

AAAI Conference 2026 Conference Paper

HyperAim: Hypergraph Contrastive Learning with Adaptive Multi-frequency Filters

  • Ming Li
  • Ruiting Zhao
  • Zihao Yan
  • Lu Bai
  • Lixin Cui
  • Feilong Cao

Unsupervised hypergraph representation learning has recently gained traction for its ability to model complex high-order interactions without requiring labeled data. However, existing contrastive learning methods typically overlook the frequency diversity inherent in hypergraph signals. To address this issue, we propose HyperAim, a contrastive learning framework that integrates adaptive multi-frequency filtering through both decoupled and coupled designs. Specifically, HyperAim employs two decoupled channels with polynomial low-pass and high-pass filters to separately capture distinct frequency components, and a third channel based on framelet decomposition that adaptively fuses multi-frequency signals in a coupled manner. A frequency-aware contrastive learning strategy is introduced to align representations across views using a combination of InfoNCE loss and pseudo-label-guided supervision. Extensive experiments across 12 benchmark datasets, covering both homophilic and heterophilic hypergraphs, demonstrate the consistent superiority of HyperAim over 17 baselines. Ablation studies further confirm the benefits of explicitly modeling and aligning frequency-specific representations.
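The InfoNCE objective mentioned above has a standard form; the numpy sketch below is our own generic formulation (not HyperAim's code), treating row i of each view's embedding matrix as a positive pair and every other row as a negative:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """Generic InfoNCE loss between two views of N node embeddings.
    z1, z2: (N, d) arrays; row i of z1 and row i of z2 are a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                     # cosine similarities / temperature
    sim -= sim.max(axis=1, keepdims=True)     # numerical stability for softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))        # positives sit on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
noise = 0.01 * rng.normal(size=(8, 16))
aligned = info_nce(z, z + noise)                   # matched views: low loss
mismatched = info_nce(z, np.roll(z + noise, 1, axis=0))  # every pair broken
assert aligned < mismatched
```

Minimizing this loss pulls the two filtered views of the same node together while pushing different nodes apart, which is the alignment role it plays in the framework above.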

AAAI Conference 2026 Conference Paper

HyperGOOD: Towards Out-of-Distribution Detection in Hypergraphs

  • Tingyi Cai
  • Yunliang Jiang
  • Ming Li
  • Changqin Huang
  • Yujie Fang
  • Chengling Gao
  • Zhonglong Zheng

Out-of-distribution (OOD) detection plays a critical role in ensuring the robustness of machine learning models in open-world settings. While extensive efforts have been made in vision, language, and graph domains, the challenge of OOD detection in hypergraph-structured data remains unexplored. In this work, we formalize the problem of hypergraph out-of-distribution (HOOD) detection, which aims to identify nodes or hyperedges whose high-order relational contexts differ significantly from those seen during training. We propose HyperGOOD, a unified energy-based detection framework that integrates multi-scale spectral decomposition with structure-aware uncertainty propagation. By preserving both low- and high-frequency signals and diffusing uncertainty across the hypergraph, HyperGOOD effectively captures subtle and relationally entangled anomalies. Experimental results on nine hypergraph datasets demonstrate the effectiveness of our approach, establishing a new foundation for robust hypergraph learning under distributional shifts.
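The energy score underlying energy-based OOD detection has a standard closed form, E(x) = −T·logsumexp(f(x)/T) over the classifier logits, with lower energy indicating in-distribution inputs. A minimal numpy version (the generic score, not HyperGOOD's implementation):

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Standard energy-based OOD score: E(x) = -T * logsumexp(logits / T).
    Lower energy = more in-distribution. logits: (N, C) array."""
    scaled = logits / T
    m = scaled.max(axis=1, keepdims=True)          # stabilize logsumexp
    lse = m.squeeze(1) + np.log(np.exp(scaled - m).sum(axis=1))
    return -T * lse

confident = np.array([[10.0, 0.0, 0.0]])   # peaked logits: in-distribution-like
uniform = np.array([[0.0, 0.0, 0.0]])      # flat logits: OOD-like
assert energy_score(confident)[0] < energy_score(uniform)[0]
```

In a detector built on this score, a threshold on the energy separates in-distribution nodes from OOD ones; frameworks like the one above additionally propagate such uncertainty along the structure.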

AAAI Conference 2026 Conference Paper

HyperNoRA: Hyperedge Prediction via Node-Level Relation-Aware Self-Supervised Hypergraph Learning

  • Ming Li
  • Zhanle Zhu
  • Xinyi Li
  • Lu Bai
  • Lixin Cui
  • Feilong Cao
  • Ke Lv

Hyperedge prediction plays a critical role in high-order relational modeling with hypergraphs, yet most existing methods primarily focus on sampling strategies or local aggregation within candidate hyperedges. These approaches often overlook global structural dependencies that are essential for learning expressive node and hyperedge representations. In this paper, we propose HyperNoRA, a novel self-supervised hypergraph learning framework that integrates global node-level relation awareness with contrastive learning. Specifically, we construct a global node relation graph that captures both direct and indirect structural correlations, which guides a structure-aware aggregator to enhance node representations with informative global context. To prevent over-smoothing and maintain discriminability, a contrastive learning module is introduced to align representations across graph augmentations while separating semantically dissimilar nodes. Extensive experiments on several benchmark datasets demonstrate that HyperNoRA consistently outperforms state-of-the-art baselines, and ablation studies verify the effectiveness of its key components.

AAAI Conference 2026 Conference Paper

Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation

  • Dong Zhang
  • Lin Li
  • Ming Li
  • Amran Bhuiyan
  • Meng Sun
  • Xiaohui Tao
  • Jimmy Huang

Existing solutions for bundle recommendation (BR) have achieved remarkable effectiveness in predicting users' preferences for prebuilt bundles. However, bundle-item (B-I) affiliation varies dynamically in real scenarios. For example, a bundle themed as 'casual outfit' may add 'hat' or remove 'watch' due to factors such as seasonal variations, changes in user preferences, or inventory adjustments. Our empirical study demonstrates that the performance of mainstream BR models may fluctuate or decline under item-level variability. This paper makes the first attempt to address this problem and proposes Residual Diffusion for Bundle Recommendation (RDiffBR), a model-agnostic generative framework that can assist a BR model in adapting to this scenario. During the initial training of the BR model, RDiffBR employs a residual diffusion model to process the item-level bundle embeddings, which are generated by the BR model to represent bundle themes, via a forward-reverse process. In the inference stage, RDiffBR reverses the item-level bundle embeddings obtained by the well-trained bundle model under B-I variability scenarios to generate effective item-level bundle embeddings. In particular, the residual connection in our residual approximator significantly enhances BR models' ability to generate high-quality item-level bundle embeddings. Experiments on six BR models and four public datasets from different domains show that RDiffBR improves the Recall and NDCG of backbone BR models by up to 23%, while increasing training time by only about 4%.

AAAI Conference 2026 Conference Paper

Multi-Granular Graph Learning with Fine-Grained Behavioral Pattern Awareness for Session-Based Recommendation

  • Ming Li
  • Zihao Yan
  • Yuting Chen
  • Lixin Cui
  • Lu Bai
  • Feilong Cao
  • Ke Lv
  • Zhao Li

Session-based recommendation aims to predict users’ next actions by modeling their ongoing interaction sequences, particularly in scenarios where long-term user profiles are unavailable. While existing methods have achieved promising results by leveraging sequential and graph-based structures, they often rely on global aggregation strategies that emphasize dominant user interests while overlooking the transient and fine-grained behavior patterns embedded in sessions. In practice, user intent evolves across sessions and is reflected through diverse behavioral patterns, ranging from immediate preferences to segmented co-occurrence interests and long-range goals. To address these limitations, we propose GraphFine, a novel multi-granular graph learning framework that achieves fine-grained behavioral pattern awareness for session-based recommendation. Our approach models user behavior at different temporal and semantic granularities through a combination of graph and hypergraph neural networks. Specifically, we employ a position-aware graph to capture short-term item transitions, and construct segmented co-occurrence hypergraphs to uncover high-order semantic relations among co-occurred items. To preserve diverse user intents, we further introduce a multi-view intent readout mechanism that extracts and adaptively integrates intent signals from short-term actions, segmented co-occurrence patterns, and entire sessions. Extensive experiments on benchmark datasets demonstrate that GraphFine consistently outperforms existing state-of-the-art methods, confirming its effectiveness in capturing fine-grained and dynamic user preferences for more accurate recommendation.

AAAI Conference 2026 Conference Paper

Optimization and Robustness-Informed Membership Inference Attacks for LLMs

  • Zichen Song
  • Qixin Zhang
  • Ming Li
  • Yao Shu

The proliferation of Large Language Models (LLMs) has raised concerns over training data privacy. Membership Inference Attacks (MIA), aiming to identify whether specific data was used for training, pose significant privacy risks. However, existing MIA methods struggle to address the scale and complexity of modern LLMs. This paper introduces OR-MIA, a novel MIA framework inspired by model optimization and input robustness. First, training data points are expected to exhibit smaller gradient norms due to optimization dynamics. Second, member samples show greater stability, with gradient norms being less sensitive to controlled input perturbations. OR-MIA leverages these principles by perturbing inputs, computing gradient norms, and using them as features for a robust classifier to distinguish members from non-members. Evaluations on LLMs (70M to 6B parameters) and various datasets demonstrate that OR-MIA outperforms existing methods, achieving over 90% accuracy. Our findings highlight a critical vulnerability in LLMs and underscore the need for improved privacy-preserving training paradigms.
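The two signals the attack above builds on, a small gradient norm on training data and its stability under input perturbation, can be illustrated on a toy differentiable model. In the sketch below (our own construction: a one-layer logistic model stands in for an LLM, and the "member" is simply a sample the model fits well), the feature is the mean gradient norm over small random input perturbations:

```python
import numpy as np

def grad_norm(w, x, y):
    """Gradient norm of the logistic loss w.r.t. parameters w for one sample."""
    p = 1.0 / (1.0 + np.exp(-x @ w))      # model prediction in (0, 1)
    return np.linalg.norm((p - y) * x)    # d(loss)/dw for logistic loss

def perturbed_feature(w, x, y, eps=0.05, n=8, seed=0):
    """MIA-style feature: mean gradient norm under small input perturbations."""
    rng = np.random.default_rng(seed)
    norms = [grad_norm(w, x + eps * rng.normal(size=x.shape), y)
             for _ in range(n)]
    return float(np.mean(norms))

w = np.random.default_rng(1).normal(size=4)
x = 5 * w / np.linalg.norm(w)             # input the model is confident about
fitted = perturbed_feature(w, x, y=1.0)   # label the model agrees with ("member"-like)
unfit = perturbed_feature(w, x, y=0.0)    # label the model disagrees with
assert fitted < unfit                     # fitted sample: smaller, stabler gradient norm
```

An attacker would feed such features into a classifier that separates members from non-members; the toy shows only the directional effect that well-fitted points sit in flatter regions of the loss.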

AAAI Conference 2026 Conference Paper

Permutation Equivariant Framelet-based Hypergraph Neural Networks

  • Ming Li
  • Yi Wang
  • Chengling Gao
  • Lu Bai
  • Yujie Fang
  • Xiaosheng Zhuang
  • Pietro Lio

Hypergraphs provide a natural and expressive framework for modeling high-order relationships, enabling the representation of group-wise interactions beyond pairwise connections. While hypergraph neural networks (HNNs) have shown promise for learning on such structures, existing models often rely on shallow message passing and lack the ability to extract multiscale patterns. Framelet-based techniques offer a principled solution by decomposing signals into multiple frequency bands. However, most prior framelet systems, particularly Haar-type ones, are sensitive to node ordering and fail to ensure consistent representations under permutation, leading to instability in hypergraph learning. To address this, we propose Permutation Equivariant Framelet-based Hypergraph Neural Networks (PEF-HNN), a novel framework that integrates multiscale framelet analysis with permutation-consistent learning. We construct a new family of permutation equivariant Haar-type framelets specifically designed for hypergraphs, supported by theoretical analysis of their stability and decomposition properties. Built upon these framelets, PEF-HNN incorporates both low-pass and high-pass components across multiple scales into a unified neural architecture. Extensive experiments on nine benchmark datasets, including three homophilic and four heterophilic hypergraphs, as well as two real-world datasets for visual object classification, demonstrate the effectiveness of our approach, consistently outperforming existing HNN baselines and highlighting the advantages of permutation equivariant framelet design in hypergraph representation learning.

EAAI Journal 2026 Journal Article

Restoring neural radiance fields performance under adverse weather conditions

  • Ying He
  • Gan Chen
  • F. Richard Yu
  • Ming Li
  • Fei Ma
  • Guang Zhou

Neural Radiance Fields (NeRFs) have emerged as a powerful paradigm for modeling complex, photorealistic three-dimensional environments, garnering significant attention in the field of scene-based robotic localization. Regrettably, environments encountered in robotic applications are frequently susceptible to adverse weather conditions (e.g., rain, snow, fog). Under such conditions, the inherent quality degradation introduced by existing image restoration algorithms severely disrupts the spatial consistency reconstruction of NeRFs. To address this challenge, this paper proposes a novel methodology for three-dimensional scene reconstruction under adverse weather, termed WeatherNeRF, which seamlessly integrates an image restoration algorithm with the neural radiance field framework. The proposed method effectively leverages the image restoration algorithm to process input image sequences affected by inclement weather. To mitigate the introduction of invalid artifacts by these processed sequences during scene reconstruction, we employ two regularization functions specifically designed to enhance scene compactness. Furthermore, to bolster scene consistency and facilitate effective scene restoration, we incorporate two-dimensional prior knowledge extracted from an image restoration model, WeatherDiffusion, during the reconstruction process, utilizing Score Distillation Sampling (SDS). Comprehensive experimental evaluations demonstrate that the proposed WeatherNeRF framework effectively restores neural radiance fields in everyday scenes degraded by adverse weather conditions and is capable of synthesizing high-fidelity novel view images. The code and data are publicly available at https://github.com/C2022G/WeatherNeRF.

AAAI Conference 2026 Conference Paper

Self-Supervised Hypergraph Learning with Substructure Awareness for Hyperedge Prediction

  • Ming Li
  • Huiting Wang
  • Yuting Chen
  • Lu Bai
  • Lixin Cui
  • Feilong Cao
  • Ke Lv

Hyperedge prediction plays a central role in hypergraph learning, enabling the inference of high-order relations among multiple entities. However, existing methods often rely on a simplistic flat set assumption, treating candidate hyperedges as unstructured collections of nodes and neglecting their potential internal compositionality. Furthermore, the severe scarcity of observed hyperedges poses a challenge for effective supervision. In this work, we propose S3Hyper, a Substructure-contextualized Self-Supervised framework for Hyperedge prediction, which jointly addresses these two challenges. Specifically, we design a substructure-contextualized hyperedge aggregator that models the internal hierarchy of candidate hyperedges by leveraging sub-hyperedge information. In parallel, we introduce an adaptive tri-directional contrastive learning module that incorporates node-level, hyperedge-level, and cross-level alignment objectives, supported by temperature-adaptive mechanisms. Experimental results on four public datasets demonstrate that S3Hyper consistently outperforms strong baselines, with ablation studies verifying the effectiveness of each component.

AAAI Conference 2026 Conference Paper

SSHPool: The Separated Subgraph-based Hierarchical Pooling

  • Zhuo Xu
  • Lu Bai
  • Lixin Cui
  • Ming Li
  • Hangyuan Du
  • Ziyu Lyu
  • Yue Wang
  • Edwin R. Hancock

In this paper, we develop a novel local graph pooling method, namely the Separated Subgraph-based Hierarchical Pooling (SSHPool), for graph classification. We commence by assigning the nodes of a sample graph into different clusters, resulting in a family of separated subgraphs. We individually employ the local graph convolution units as the local structure to further compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. Since these subgraphs are separated by different clusters and the structural information cannot be propagated between them, the local convolution operation can significantly avoid the over-smoothing problem caused by message passing through edges in most existing Graph Neural Networks (GNNs). By hierarchically performing the proposed procedures on the resulting coarsened graph, the proposed SSHPool can effectively extract the hierarchical global features of the original graph structure, encapsulating rich intrinsic structural characteristics. Furthermore, we develop an end-to-end GNN framework associated with the SSHPool module for graph classification. Experimental results demonstrate the superior performance of the proposed model on real-world datasets.

AAAI Conference 2026 Conference Paper

StyleTailor: Towards Personalized Fashion Styling via Hierarchical Negative Feedback

  • Hongbo Ma
  • Fei Shen
  • Hongbin Xu
  • Xiaoce Wang
  • Gang Xu
  • Jinkai Zheng
  • Liangqiong Qu
  • Ming Li

The advancement of intelligent agents has revolutionized problem-solving across diverse domains, yet solutions for personalized fashion styling remain underexplored, which holds immense promise for enhancing shopping experiences. In this work, we present StyleTailor, the first collaborative agent framework that seamlessly unifies personalized apparel design, shopping recommendation, virtual try-on, and systematic evaluation into a cohesive workflow. To this end, StyleTailor pioneers an iterative visual refinement paradigm driven by multi-level negative feedback, enabling adaptive and precise user alignment. Specifically, our framework features two core agents, i.e., Designer for personalized garment selection and Consultant for virtual try-on, whose outputs are progressively refined via hierarchical vision-language model feedback spanning individual items, complete outfits, and try-on efficacy. Counterexamples are aggregated into negative prompts, forming a closed-loop mechanism that enhances recommendation quality. To assess the performance, we introduce a comprehensive evaluation suite encompassing style consistency, visual quality, face similarity, and artistic appraisal. Extensive experiments demonstrate StyleTailor's superior performance in delivering personalized designs and recommendations, outperforming strong baselines without negative feedback and establishing a new benchmark for intelligent fashion systems.

JBHI Journal 2026 Journal Article

Zero-Shot Capillary Segmentation in Dermoscopy Images via SAM2: A Case Study on Oral Mucosa

  • Weilan Su
  • Youcheng Zong
  • Runda Jia
  • Jian Qin
  • Ming Li

Morphological changes in oral mucosal microvasculature serve as early diagnostic markers for various diseases. However, existing dermoscopy image analysis relies heavily on physician expertise, leading to high subjectivity and low efficiency. This paper proposes a zero-shot capillary segmentation method for oral mucosa based on Segment Anything Model 2 (SAM2), which effectively handles reflection artifacts and highlights minute vascular structures through a multi-scale adaptive enhancement algorithm. The method employs a morphology-aware automatic prompt annotation strategy to generate composite guidance containing bounding boxes, foreground points, and background points for SAM2. Without requiring annotated data or model training, this approach achieves precise instance segmentation of capillaries through an “enhancement-annotation-segmentation” collaborative paradigm. On a clinical dataset comprising 212 dermoscopy images from 106 subjects, our method achieved a Dice coefficient of 0.7278 and an IoU of 0.5721, representing improvements of 17.12% and 26.9% respectively compared to the medical-specific baseline MedSAM. This provides an objective auxiliary diagnostic method for oral mucosal diseases that depend on capillary morphology analysis.

NeurIPS Conference 2025 Conference Paper

A learnability analysis on neuro-symbolic learning

  • Hao-Yuan He
  • Ming Li

This paper presents a comprehensive theoretical analysis of the learnability of neuro-symbolic (NeSy) tasks within hybrid systems. We characterize the learnability of NeSy tasks by their derived constraint satisfaction problems (DCSPs), demonstrating that a task is learnable if and only if its corresponding DCSP admits a unique solution. Under mild assumptions, we establish the sample complexity for learnable tasks and show that, for general tasks, the asymptotic expected concept error is controlled by the degree of disagreement among DCSP solutions. Our findings unify the characterization of learnability and the phenomenon of reasoning shortcuts, providing theoretical guarantees and actionable guidance for the principled design of NeSy systems.

JBHI Journal 2025 Journal Article

A Novel Approach to Explore Internal Cardiac Electrophysiological Pattern under Emotional Stress

  • Hanrui Dong
  • Shijie He
  • Wei Wu
  • Xianbin Zhang
  • Ming Li
  • Richard Millham
  • Guibin Bian
  • Wanqing Wu

Numerous psychological and clinical studies have confirmed a correlation between mental and cardiac health. We aim to explore this relationship further by examining how emotions influence cardiac health. By collecting body surface potential and utilizing the electrocardiographic imaging (ECGI) model, we can noninvasively and continuously reconstruct internal cardiac electrical activity. To enhance the existing ECGI model on various datasets, we propose an information fusion strategy called Emotional Potential Conversion CycleGAN. It enables data alignment across diverse datasets while preserving emotional information, allowing us to reconstruct cardiac electrical activity in various emotional states. Our results demonstrate successful data conversion while maintaining emotional integrity, achieving an impressive 91.92% accuracy in emotion recognition. We further validated this approach using publicly available datasets, WESAD and SWELL, which yielded consistent results. Additionally, we conducted preliminary investigations into the correlation and variability of cardiac activity across different sites under stress. The correlation study indicates a generalized association among various regions of the heart, while variability studies reveal that fluctuations in cardiac electrical activity during stress are primarily concentrated around the atrioventricular node and Purkinje fibers. This suggests a potential risk for pre-excitation syndrome, possibly due to the presence of a Kent bundle. Overall, we present a practical approach for studying the interplay between emotional states and cardiac health. Our findings indicate a potential relationship under stress that may provide valuable insights for future research.

IJCAI Conference 2025 Conference Paper

ADFormer: Aggregation Differential Transformer for Passenger Demand Forecasting

  • Haichen Wang
  • Liu Yang
  • Xinyuan Zhang
  • Haomin Yu
  • Ming Li
  • Jilin Hu

Passenger demand forecasting helps optimize vehicle scheduling, thereby improving urban efficiency. Recently, attention-based methods have been used to adequately capture the dynamic nature of spatio-temporal data. However, existing methods that rely on heuristic masking strategies cannot fully adapt to the complex spatio-temporal correlations, hindering the model from focusing on the right context. These works also overlook the high-level correlations that exist in the real world. Effectively integrating these high-level correlations with the original correlations is crucial. To fill this gap, we propose the Aggregation Differential Transformer (ADFormer), which offers new insights into demand forecasting. Specifically, we utilize Differential Attention to capture the original spatial correlations and achieve attention denoising. Meanwhile, we design distinct aggregation strategies based on the nature of space and time. Then, the original correlations are unified with the high-level correlations, enabling the model to capture holistic spatio-temporal relations. Experiments conducted on taxi and bike datasets confirm the effectiveness and efficiency of our model, demonstrating its practical value. The code is available at https://github.com/decisionintelligence/ADFormer.

IJCAI Conference 2025 Conference Paper

AKBR: Learning Adaptive Kernel-based Representations for Graph Classification

  • Lu Bai
  • Feifei Qian
  • Lixin Cui
  • Ming Li
  • Hangyuan Du
  • Yue Wang
  • Edwin Hancock

In this paper, we propose a new model to learn Adaptive Kernel-based Representations (AKBR) for graph classification. Unlike state-of-the-art R-convolution graph kernels that are defined by merely counting any pair of isomorphic substructures between graphs and cannot provide an end-to-end learning mechanism for the classifier, the proposed AKBR approach aims to define an end-to-end representation learning model to construct an adaptive kernel matrix for graphs. To this end, we commence by leveraging a novel feature-channel attention mechanism to capture the interdependencies between different substructure invariants of original graphs. The proposed AKBR model can thus effectively identify the structural importance of different substructures, and compute the R-convolution kernel between pairwise graphs associated with the more significant substructures specified by their structural attentions. Furthermore, the proposed AKBR model employs all sample graphs as the prototype graphs, naturally providing an end-to-end learning architecture between the kernel computation and the classifier. Experimental results show that the proposed AKBR model outperforms existing state-of-the-art graph kernels and deep learning methods on standard graph benchmarks.

IJCAI Conference 2025 Conference Paper

All Roads Lead to Rome: Exploring Edge Distribution Shifts for Heterophilic Graph Learning

  • Yi Wang
  • Changqin Huang
  • Ming Li
  • Tingyi Cai
  • Zhonglong Zheng
  • Xiaodi Huang

Heterophilic graph neural networks (GNNs) have gained prominence for their ability to learn effective representations in graphs with diverse, attribute-aware relationships. While existing methods leverage attribute inference during message passing to improve performance, they often struggle with challenging heterophilic graphs. This is due to edge distribution shifts introduced by diverse connection patterns, which blur attribute distinctions and undermine message-passing stability. This paper introduces H₂OGNN, a novel framework that reframes edge attribute inference as an out-of-distribution (OOD) detection problem. H₂OGNN introduces a simple yet effective symbolic energy regularization approach for OOD learning, ensuring robust classification boundaries between homophilic and heterophilic edge attributes. This design significantly improves the stability and reliability of GNNs across diverse connectivity patterns. Through theoretical analysis, we show that H₂OGNN addresses the graph denoising problem by going beyond feature smoothing, offering deeper insights into how precise edge attribute identification boosts model performance. Extensive experiments on nine benchmark datasets demonstrate that H₂OGNN not only achieves state-of-the-art performance but also consistently outperforms other heterophilic GNN frameworks, particularly on datasets with high heterophily.

IJCAI Conference 2025 Conference Paper

An End-to-End Simple Clustering Hierarchical Pooling Operation for Graph Learning Based on Top-K Node Selection

  • Zhehan Zhao
  • Lu Bai
  • Ming Li
  • Lixin Cui
  • Hangyuan Du
  • Yue Wang
  • Edwin Hancock

Graph Neural Networks (GNNs) are powerful tools for graph learning, but one of the important challenges is how to effectively extract representations for graph-level tasks. In this paper, we propose an end-to-end Simple Clustering Hierarchical Pooling (SCHPool) operation, which is based on Top-K node selection for learning expressive graph representations. Specifically, SCHPool considers each node and its local neighborhood as a cluster, and introduces a novel multi-view scoring function to evaluate node importance. Based on these scores, clusters centered around the Top-K nodes are retained. This design eliminates the need for complex clustering operations, significantly reducing computational overhead. Furthermore, during the coarsening process, SCHPool employs a lightweight yet comprehensive attention mechanism to adaptively aggregate both the node features within clusters and the edge connectivity strengths between clusters. This facilitates the construction of more informative coarsened graphs, enhancing model performance. Experimental results demonstrate the effectiveness of the proposed model.

ICLR Conference 2025 Conference Paper

BenTo: Benchmark Reduction with In-Context Transferability

  • Hongyu Zhao
  • Ming Li
  • Lichao Sun 0001
  • Tianyi Zhou 0001

Evaluating large language models (LLMs) is costly: it requires the generation and examination of LLM outputs on a large-scale benchmark of various tasks. This paper investigates how to efficiently reduce the tasks used to benchmark LLMs without affecting the evaluation quality. Our study reveals that task transferability and relevance provide critical information to identify the most representative subset of tasks via optimizing a facility location function. We propose a practically efficient metric for estimating the transferability between two tasks via in-context learning (ICL). By analyzing the pairwise transferability, we can reduce tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% while inducing only a <4% difference to the evaluation on the original benchmark. Compared to prior works, our method is training-free, gradient-free, and highly efficient, requiring only ICL.

NeurIPS Conference 2025 Conference Paper

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

  • Yijun Liang
  • Ming Li
  • Chenrui Fan
  • Ziyue Li
  • Dang Nguyen
  • Kwesi Cobbina
  • Shweta Bhardwaj
  • Jiuhai Chen

Color plays an important role in human perception and usually provides critical clues in visual reasoning. However, it is unclear whether and how vision-language models (VLMs) can perceive, understand, and leverage color as humans. This paper introduces ColorBench, an innovative benchmark meticulously crafted to assess the capabilities of VLMs in color understanding, including color perception, reasoning, and robustness. By curating a suite of diverse test scenarios, with grounding in real applications, ColorBench evaluates how these models perceive colors, infer meanings from color-based cues, and maintain consistent performance under varying color transformations. Through an extensive evaluation of 32 VLMs with varying language models and vision encoders, our paper reveals some undiscovered findings: (i) The scaling law (larger models are better) still holds on ColorBench, while the language model plays a more important role than the vision encoder. (ii) However, the performance gaps across models are relatively small, indicating that color understanding has been largely neglected by existing VLMs. (iii) CoT reasoning improves color understanding accuracies and robustness, though they are vision-centric tasks. (iv) Color clues are indeed leveraged by VLMs on ColorBench but they can also mislead models in some tasks. These findings highlight the critical limitations of current VLMs and underscore the need to enhance color comprehension. Our ColorBench can serve as a foundational tool for advancing the study of human-level color understanding of multimodal AI.

NeurIPS Conference 2025 Conference Paper

CPO: Condition Preference Optimization for Controllable Image Generation

  • Zonglin Lyu
  • Ming Li
  • Xinxin Liu
  • Chen Chen

To enhance controllability in text-to-image generation, ControlNet introduces image-based control signals, while ControlNet++ improves pixel-level cycle consistency between generated images and the input control signal. To avoid the prohibitive cost of back-propagating through the sampling process, ControlNet++ optimizes only low-noise timesteps (e.g., t < 200) using a single-step approximation, which not only ignores the contribution of high-noise timesteps but also introduces additional approximation errors. A straightforward alternative for optimizing controllability across all timesteps is Direct Preference Optimization (DPO), a fine-tuning method that increases model preference for more controllable images (I^w) over less controllable ones (I^l). However, due to uncertainty in generative models, it is difficult to ensure that win-lose image pairs differ only in controllability while keeping other factors, such as image quality, fixed. To address this, we propose performing preference learning over control conditions rather than generated images. Specifically, we construct winning and losing control signals, c^w and c^l, and train the model to prefer c^w. This method, which we term Condition Preference Optimization (CPO), eliminates confounding factors and yields a low-variance training objective. Our approach theoretically exhibits lower contrastive loss variance than DPO and empirically achieves superior results. Moreover, CPO requires less computation and storage for dataset curation. Extensive experiments show that CPO significantly improves controllability over the state-of-the-art ControlNet++ across multiple control types: over 10% error rate reduction in segmentation, 70-80% in human pose, and consistent 2-5% reductions in edge and depth maps. The error rate is defined as the difference between the evaluated controllability and the oracle results. Our project is available at https://zonglinl.github.io/CPO_page.

NeurIPS Conference 2025 Conference Paper

CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling

  • Beibu Li
  • Qichao Shentu
  • Yang Shu
  • Hui Zhang
  • Ming Li
  • Ning Jin
  • Bin Yang
  • Chenjuan Guo

Time series anomaly detection plays a crucial role in a wide range of real-world applications. Given that time series data can exhibit different patterns at different sampling granularities, multi-scale modeling has proven beneficial for uncovering latent anomaly patterns that may not be apparent at a single scale. However, existing methods often model multi-scale information independently or rely on simple feature fusion strategies, neglecting the dynamic changes in cross-scale associations that occur during anomalies. Moreover, most approaches perform multi-scale modeling based on fixed sliding windows, which limits their ability to capture comprehensive contextual information. In this work, we propose CrossAD, a novel framework for time series Anomaly Detection that takes Cross-scale associations and Cross-window modeling into account. We propose a cross-scale reconstruction that reconstructs fine-grained series from coarser series, explicitly capturing cross-scale associations. Furthermore, we design a query library and incorporate global multi-scale context to overcome the limitations imposed by fixed window sizes. Extensive experiments conducted on seven real-world datasets using nine evaluation metrics validate the effectiveness of CrossAD, demonstrating state-of-the-art performance in anomaly detection.

AAAI Conference 2025 Conference Paper

Deep Hypergraph Neural Networks with Tight Framelets

  • Ming Li
  • Yujie Fang
  • Yi Wang
  • Han Feng
  • Yongchun Gu
  • Lu Bai
  • Pietro Liò

Hypergraphs provide a flexible framework for modeling high-order (complex) interactions among multiple entities, extending beyond traditional pairwise correlations in graph structures. However, deep hypergraph neural networks (HGNNs) often face the challenge of oversmoothing with increasing depth, similar to issues in graph neural networks (GNNs). While oversmoothing in GNNs has been extensively studied, its implications in relation to hypergraphs are less explored. This paper addresses this gap by first theoretically exploring the reasons behind oversmoothing in deep HGNNs. Our novel insights suggest that a spectral-based hypergraph convolution, equipped with both low-pass and high-pass filters, can potentially mitigate these effects. Motivated by these findings, we introduce FrameHGNN, a framework that utilizes framelet-based hypergraph convolutions integrating tight framelet transforms with both low-pass and high-pass components, as well as the commonly used strategies in designing deep GNN architecture: initial residual and identity mappings. The experiment results on diverse benchmark datasets demonstrate that FrameHGNN outperforms several state-of-the-art models, effectively reducing oversmoothing while improving predictive accuracy. Our contributions not only advance the theoretical understanding of deep hypergraph learning but also provide a practical spectral-based approach for HGNNs, emphasizing the design of multifrequency channels.

AAAI Conference 2025 Conference Paper

DHAKR: Learning Deep Hierarchical Attention-Based Kernelized Representations for Graph Classification

  • Feifei Qian
  • Lu Bai
  • Lixin Cui
  • Ming Li
  • Ziyu Lyu
  • Hangyuan Du
  • Edwin Hancock

Graph-based representations are powerful tools for analyzing structured data. In this paper, we propose a novel model to learn Deep Hierarchical Attention-based Kernelized Representations (DHAKR) for graph classification. To this end, we commence by learning an assignment matrix to hierarchically map the substructure invariants into a set of composite invariants, resulting in hierarchical kernelized representations for graphs. Moreover, we introduce the feature-channel attention mechanism to capture the interdependencies between the different substructure invariants that are merged into the composite invariants, addressing the shortcoming of discarding the importance of different substructures arising in most existing R-convolution graph kernels. We show that the proposed DHAKR model can adaptively compute the kernel-based similarity between graphs, identifying the common structural patterns over all graphs. Experiments demonstrate the effectiveness of the proposed DHAKR model.

IJCAI Conference 2025 Conference Paper

DHTAGK: Deep Hierarchical Transitive-Aligned Graph Kernels for Graph Classification

  • Xinya Qin
  • Lu Bai
  • Lixin Cui
  • Ming Li
  • Ziyu Lyu
  • Hangyuan Du
  • Edwin Hancock

In this paper, we propose a family of novel Deep Hierarchical Transitive-Aligned Graph Kernels (DHTAGK) for graph classification. To this end, we commence by developing a new Hierarchical Aligned Graph Auto-Encoder (HA-GAE) to construct transitive-aligned embedding graphs that encapsulate the structural correspondence information between graphs. The DHTAGK kernels then measure either the Jensen-Shannon Divergence between the adjacency matrices or the Gaussian kernel between the node feature matrices of the embedding graphs. Unlike the classical R-convolution kernels and node-based alignment kernels, the DHTAGK kernels can capture the transitive structural correspondence information and thus ensure the positive definiteness. Furthermore, the HA-GAE enables the DHTAGK kernels to simultaneously reflect both local and global graph structures and identify common structural patterns. Experimental results show that the DHTAGK kernels outperform state-of-the-art graph kernels and deep learning methods on benchmark datasets.

TMLR Journal 2025 Journal Article

Evolution of Discriminator and Generator Gradients in GAN Training: From Fitting to Collapse

  • Weiguo Gao
  • Ming Li

Generative Adversarial Networks (GANs) are powerful generative models but often suffer from mode mixture and mode collapse. We propose a perspective that views GAN training as a two-phase progression from fitting to collapse, where mode mixture and mode collapse are treated as interconnected. Inspired by the particle model interpretation of GANs, we leverage the discriminator gradient to analyze particle movement and the generator gradient, specifically "steepness," to quantify the severity of mode mixture by measuring the generator's sensitivity to changes in the latent space. Using these theoretical insights into the evolution of gradients, we design a specialized metric that integrates both gradients to detect the transition from fitting to collapse. This metric forms the basis of an early stopping algorithm, which stops training at a point that retains sample quality and diversity. Experiments on synthetic and real-world datasets, including MNIST, Fashion MNIST, and CIFAR-10, validate our theoretical findings and demonstrate the effectiveness of the proposed algorithm.

IJCAI Conference 2025 Conference Paper

Exploring the Over-smoothing Problem of Graph Neural Networks for Graph Classification: An Entropy-based Viewpoint

  • Feifei Qian
  • Lu Bai
  • Lixin Cui
  • Ming Li
  • Hangyuan Du
  • Yue Wang
  • Edwin Hancock

Over-smoothing has emerged as a major challenge in the development of Graph Neural Networks (GNNs). While existing state-of-the-art methods effectively mitigate the diminishing distance between nodes and improve the performance of node classification, their benefits tend to be elusive for graph-level tasks. This paper introduces a novel entropy-based perspective to explore the over-smoothing problem, simultaneously enhancing the distinguishability of non-isomorphic graphs. We provide a theoretical analysis of the relationship between the smoothness and the entropy for graphs, highlighting how over-smoothing in high-entropic regions negatively impacts the graph classification performance. To tackle this issue, we propose a simple yet effective method to Sample and Discretize node features in high-Entropic regions (SDE), aiming to preserve the critical and complicated structural information. Moreover, we introduce a new evaluation metric to assess the over-smoothing for graph-level tasks, focusing on node distributions. Experimental results demonstrate that the proposed SDE method significantly outperforms existing state-of-the-art methods, establishing a new benchmark in the field of GNNs.

IJCAI Conference 2025 Conference Paper

HA-SCN: Learning Hierarchical Aligned Subtree Convolutional Networks for Graph Classification

  • Xinya Qin
  • Lu Bai
  • Lixin Cui
  • Ming Li
  • Hangyuan Du
  • Yue Wang
  • Edwin Hancock

In this paper, we propose a Hierarchical Aligned Subtree Convolutional Network (HA-SCN) for graph classification. Our idea is to transform graphs of arbitrary sizes into fixed-sized aligned graphs and construct a normalized K-layer m-ary subtree for each node in the aligned graphs. By sliding convolutional filters over the entire subtree at each node, we define a novel subtree convolution and pooling operation that hierarchically abstracts node-level information. We demonstrate that the proposed HA-SCN model not only realizes the convolution mechanism similar to the Convolutional Neural Networks (CNNs), which have the characteristics of weight sharing and fixed-sized receptive fields, but also effectively mitigates the over-squashing problem. Meanwhile, it establishes the correspondence information between nodes, alleviating the information loss issue. Experimental results on various benchmark graph datasets show that our approach achieves state-of-the-art performance in graph classification tasks.

NeurIPS Conference 2025 Conference Paper

HyperMixup: Hypergraph-Augmented with Higher-order Information Mixup

  • Kaixuan Yao
  • Zhuo Li
  • Jianqing Liang
  • Jiye Liang
  • Ming Li
  • Feilong Cao

Hypergraphs offer a natural paradigm for modeling complex systems with multi-way interactions. Hypergraph neural networks (HGNNs) have demonstrated remarkable success in learning from such higher-order relational data. While such higher-order modeling enhances relational reasoning, the effectiveness of hypergraph learning remains bottlenecked by two persistent challenges: the scarcity of labeled data inherent to complex systems, and the vulnerability to structural noise in real-world interaction patterns. Traditional data augmentation methods, though successful in Euclidean and graph-structured domains, struggle to preserve the intricate balance between node features and hyperedge semantics, often disrupting the very group-wise interactions that define hypergraph value. To bridge this gap, we present HyperMixup, a hypergraph-aware augmentation framework that preserves higher-order interaction patterns through structure-guided feature mixing. Specifically, HyperMixup contains three critical components: 1) Structure-aware node pairing guided by joint feature-hyperedge similarity metrics, 2) Context-enhanced hierarchical mixing that preserves hyperedge semantics through dual-level feature fusion, and 3) Adaptive topology reconstruction mechanisms that maintain hypergraph consistency while enabling controlled diversity expansion. Theoretically, we establish that our method induces hypergraph-specific regularization effects through gradient alignment with hyperedge covariance structures, while providing robustness guarantees against combined node-hyperedge perturbations. Comprehensive experiments across diverse hypergraph learning tasks demonstrate consistent performance improvements over state-of-the-art baselines, with particular effectiveness in low-label regimes. The proposed framework advances hypergraph representation learning by unifying data augmentation with higher-order topological constraints, offering both practical utility and theoretical insights for relational machine learning.

EAAI Journal 2025 Journal Article

Improving blind face restoration by utilizing edge semantic enhancement

  • Xiaodong Qian
  • Jianglin Wang
  • Deyi Xiong
  • Ming Li
  • Rong He
  • Xianlun Tang

As the degree of degradation intensifies, prior-based blind face restoration networks encounter the problem of prior inaccuracy originating from low-quality images. At the same time, most current restoration methods directly use low-quality images to participate in the restoration process, which introduces a large amount of degradation information and makes it challenging to ensure the reliability and edge details of the restoration results. We propose an edge semantic enhanced facial restoration framework to alleviate the issues above. Our framework mainly consists of an Edge Semantic Supplement Module (ESSM) that utilizes Edge Semantic Enhancement (ESE) and a Precursor Feature Fusion Module (PFFM). ESSM and PFFM are respectively designed to increase the edge information of decoding features at different depths and balance the introduction of degraded information from the encoding process. In addition, we propose Range-interpolation Padding (RIP) to enhance our ESE and reduce the low-frequency information introduced by padding. The proposed method performs well on both synthetic and real-world datasets, effectively improving the edge semantics of the restoration results in both super-resolution and blind face restoration tasks. Moreover, it enables fast inference while better preserving the identity information of restored face images.

IJCAI Conference 2025 Conference Paper

Inter3D: A Benchmark and Strong Baseline for Human-Interactive 3D Object Reconstruction

  • Gan Chen
  • Ying He
  • Mulin Yu
  • F. Richard Yu
  • Gang Xu
  • Fei Ma
  • Ming Li
  • Guang Zhou

Recent advancements in implicit 3D reconstruction methods, e.g., neural radiance fields and Gaussian splatting, have primarily focused on novel view synthesis of static or dynamic objects with continuous motion states. However, these approaches struggle to efficiently model a human-interactive object with n movable parts, requiring 2^n separate models to represent all discrete states. To overcome this limitation, we propose Inter3D, a new benchmark and approach for novel state synthesis of human-interactive objects. We introduce a self-collected dataset featuring commonly encountered interactive objects and a new evaluation pipeline, where only individual part states are observed during training, while part combination states remain unseen. We also propose a strong baseline approach that leverages Space Discrepancy Tensors to efficiently model all states of an object. To alleviate the impractical constraints on camera trajectories across training states, we propose a Mutual State Regularization mechanism to enhance the spatial density consistency of movable parts. In addition, we explore two occupancy grid sampling strategies to facilitate training efficiency. We conduct extensive experiments on the proposed benchmark, showcasing the challenges of the task and the superiority of our approach. The code and data are publicly available at https://github.com/Inter3D-ui/Inter3D.

EAAI Journal 2025 Journal Article

Mastering autonomous assembly in fusion application with learning-by-doing: A peg-in-hole study

  • Ruochen Yin
  • Huapeng Wu
  • Ming Li
  • Yong Cheng
  • Yuntao Song
  • Hongtao Pan
  • Heikki Handroos

Robotic peg-in-hole assembly represents a critical area of investigation in robotic automation. Traditional approaches primarily rely on optical or force/torque (F/T) sensors, each with inherent limitations: restricted assembly accuracy or inefficient peg-hole alignment. Deep Reinforcement Learning (DRL) based methods have the potential to combine data from both types of sensors to achieve improved results. However, our application scenario is situated inside a fusion reactor, where the radiation environment and the abundance of smooth metal surfaces pose operational challenges for commonly used three-dimensional (3D) sensors. To address this, we propose a novel DRL-based approach that, unlike conventional methods, integrates data from a two-dimensional (2D) camera and an F/T sensor. This approach trains the agent to perform peg-in-hole assembly tasks by mimicking human hand-eye coordination. It utilizes a multi-input branch neural network to fuse multi-sensor data with significant differences and automatically adjusts the weights of multi-source data at different stages of assembly, thereby simultaneously meeting the requirements of fast alignment and high-precision assembly. Real-world experimental results demonstrate the effectiveness of this multi-sensor fusion approach, particularly in rigid peg-in-hole assembly tasks. It achieves rapid assembly with a peg-hole clearance of less than 0.1 mm (smaller than the repeatable accuracy of the robotic arm we used) while maintaining a high success rate and avoiding risky behaviors. This study shows that high-precision peg-in-hole assembly operations can be successfully accomplished using only a 2D camera as the optical sensor. This advancement significantly contributes to the development of high-precision automated operations in challenging environments, such as radiation-exposed settings.

IJCAI Conference 2025 Conference Paper

MATCH: Modality-Calibrated Hypergraph Fusion Network for Conversational Emotion Recognition

  • Jiandong Shi
  • Ming Li
  • Lu Bai
  • Feilong Cao
  • Ke Lu
  • Jiye Liang

Multimodal emotion recognition aims to identify emotions by integrating multimodal features derived from spoken utterances. However, existing work often neglects the calibration of conversational entities, focusing mainly on extracting potential intra- or cross-modal information. This leads to the underutilization of utterance information that is essential for accurately characterizing emotion. Additionally, the lack of effective modeling of conversational patterns limits the ability to capture emotional pathways across contexts, modalities and speakers, impacting the overall emotional understanding. In this study, we propose the modality-calibrated hypergraph fusion network (MATCH), which leverages multimodal fusion and hypergraph learning techniques to address these challenges. In particular, we introduce an entity calibration strategy that refines the representations of conversational entities both at the modality and context levels, allowing for deeper insights into emotion-related cues. Furthermore, we present an emotion-aligned hypergraph fusion method that incorporates a line graph to explore conversational patterns, facilitating flexible knowledge transfer across modalities through hyperedge-level and graph-level alignments. Experiments demonstrate that MATCH outperforms state-of-the-art approaches on two benchmark datasets.

AAAI Conference 2025 Conference Paper

ML-GOOD: Towards Multi-Label Graph Out-Of-Distribution Detection

  • Tingyi Cai
  • Yunliang Jiang
  • Ming Li
  • Changqin Huang
  • Yi Wang
  • Qionghao Huang

Out-of-distribution (OOD) detection on graph-structured data is crucial for deploying graph neural networks securely in open-world scenarios. However, existing methods have overlooked the prevalent scenario of multi-label classification in real-world applications. In this work, we investigate the unexplored issue of OOD detection within multi-label node classification tasks. We propose ML-GOOD, a simple yet sufficient approach that utilizes an energy function to gauge the OOD score for each label. We further develop a strategy for amalgamating multiple label energies, allowing for the comprehensive utilization of label information to tackle the primary challenges encountered in multi-label scenarios. We conduct extensive experiments on seven diverse real-world multi-label graph datasets, encompassing cross-domain scenarios. The results show that the AUROC of ML-GOOD is improved by 5.26% in intra-domain and 6.54% in cross-domain settings compared to previous methods. These empirical validations not only affirm the robustness of our methodology but also illuminate new avenues for further exploration within this burgeoning field of research.

JBHI Journal 2025 Journal Article

MSMTSeg: Multi-Stained Multi-Tissue Segmentation of Kidney Histology Images via Generative Self-Supervised Meta-Learning Framework

  • Xueyu Liu
  • Rui Wang
  • Yexin Lai
  • Yongfei Wu
  • Hangbei Cheng
  • Yuanyue Lu
  • Jianan Zhang
  • Ning Hao

Accurately diagnosing chronic kidney disease requires pathologists to assess the structure of multiple tissues under different stains, a process that is time-consuming and labor-intensive. Current AI-based methods for automatic structure assessment, like segmentation, often demand extensive manual annotation and focus on a single stain domain. To address these challenges, we introduce MSMTSeg, a generative self-supervised meta-learning framework for multi-stained multi-tissue segmentation in renal biopsy whole slide images (WSIs). MSMTSeg incorporates multiple stain transform models for style translation of inter-stain domains, a self-supervision module for obtaining pre-trained models with the domain-specific feature representation, and a meta-learning strategy that leverages generated virtual data and pre-trained models to learn the domain-invariant feature representation across multiple stains, thereby enhancing segmentation performance. Experimental results demonstrate that MSMTSeg achieves superior and robust performance, with mDSC of 0.836 and mIoU of 0.718 for multiple tissues under different stains, using only one annotated training sample for each stain. Our ablation study confirms the effectiveness of each component, positioning MSMTSeg ahead of classic advanced segmentation networks, recent few-shot segmentation methods, and unsupervised domain adaptation methods. In conclusion, our proposed few-shot cross-domain technology offers a feasible and cost-effective solution for multi-stained renal histology segmentation, providing convenient assistance to pathologists in clinical practice.

ICLR Conference 2025 Conference Paper

Multi-Reward as Condition for Instruction-based Image Editing

  • Xin Gu 0003
  • Ming Li
  • Libo Zhang 0001
  • Fan Chen
  • Longyin Wen
  • Tiejian Luo
  • Sijie Zhu

High-quality training triplets (instruction, original image, edited image) are essential for instruction-based image editing. Predominant training datasets (e.g., InsPix2Pix) are created using text-to-image generative models (e.g., Stable Diffusion, DALL-E) which are not trained for image editing. Accordingly, these datasets suffer from inaccurate instruction following, poor detail preserving, and generation artifacts. In this paper, we propose to address the training data quality issue with multi-perspective reward data instead of refining the ground-truth image quality. 1) We first design a quantitative metric system based on a best-in-class LVLM (Large Vision Language Model), i.e., GPT-4o in our case, to evaluate the generation quality from 3 perspectives, namely, instruction following, detail preserving, and generation quality. For each perspective, we collected quantitative scores in $0\sim 5$ and text descriptive feedback on the specific failure points in ground-truth edited images, resulting in a high-quality editing reward dataset, i.e., RewardEdit20K. 2) We further propose a novel training framework to seamlessly integrate the metric output, regarded as multi-reward, into editing models to learn from the imperfect training triplets. During training, the reward scores and text descriptions are encoded as embeddings and fed into both the latent space and the U-Net of the editing models as auxiliary conditions. During inference, we set these additional conditions to the highest score with no text description for failure points, to aim at the best generation outcome. 3) We also build a challenging evaluation benchmark with real-world images/photos and diverse editing instructions, named Real-Edit. Experiments indicate that our multi-reward conditioned model outperforms its no-reward counterpart on two popular editing pipelines, i.e., InsPix2Pix and SmartEdit. Code is released at https://github.com/bytedance/Multi-Reward-Editing.

NeurIPS Conference 2025 Conference Paper

MultiNet: Adaptive Multi-Viewed Subgraph Convolutional Networks for Graph Classification

  • Xinya Qin
  • Lu Bai
  • Lixin Cui
  • Ming Li
  • Hangyuan Du
  • Edwin Hancock

The problem of over-smoothing has emerged as a fundamental issue for Graph Convolutional Networks (GCNs). While existing efforts primarily focus on enhancing the discriminability of node representations for node classification, they tend to overlook the over-smoothing at the graph level, significantly influencing the performance of graph classification. In this paper, we provide an explanation of the graph-level over-smoothing phenomenon and propose a novel Adaptive Multi-Viewed Subgraph Convolutional Network (MultiNet) to address this challenge. Specifically, the MultiNet introduces a local subgraph convolution module that adaptively divides each input graph into multiple subgraph views. Then a number of subgraph-based view-specific convolution operations are applied to constrain the extent of node information propagation over the original global graph structure, not only mitigating the over-smoothing issue but also generating more discriminative local node representations. Moreover, we develop an alignment-based readout that establishes correspondences between nodes over different graphs, thereby effectively preserving the local node-level structure information and improving the discriminative ability of the resulting graph-level representations. Theoretical analysis and empirical studies show that the MultiNet mitigates the graph-level over-smoothing and achieves excellent performance for graph classification.

AAMAS Conference 2025 Conference Paper

Responsible Uplift Modeling

  • Lihi Idan
  • Ming Li

Automated intervention policies have become highly prevalent within firms, with "algorithmic personalization" techniques at their foundation. These methods leverage individual-level data to decide which groups should be targeted by the firm’s policies. While such policies are naturally guided by the multi-dimensional heterogeneity that exists among individuals, relying on some dimensions of such heterogeneity may unintentionally result in biased outcomes for socially-disadvantaged groups. This work focuses on a particular form of personalization: Uplift Modeling. While research on fairness in algorithmic personalization has been growing in recent years, the broader societal impact of Uplift Modeling has largely been overlooked in previous technical work. We introduce the first in-processing, learning-based method for Fair Uplift Modeling, applicable in both static and dynamic environments. Our Uplift Models are evaluated on real-world datasets, demonstrating promising results.

NeurIPS Conference 2025 Conference Paper

Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

  • Zihan Su
  • Xuerui Qiu
  • Hongbin Xu
  • Tangyu Jiang
  • Jun-hao Zhuang
  • Chun Yuan
  • Ming Li
  • Shengfeng He

The explosive growth of generative video models has amplified the demand for reliable copyright preservation of AI-generated content. Despite its popularity in image synthesis, invisible generative watermarking remains largely underexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. Motivated by the observation that watermarking performance is closely tied to the visual similarity between the watermark and cover content, we introduce a hierarchical coarse-to-fine adaptive matching mechanism. Specifically, the watermark image is divided into patches, each assigned to the most visually similar video frame, and further localized to the optimal spatial region for seamless embedding. To enable spatiotemporal fusion of watermark patches across video frames, we develop a 3D wavelet transform-enhanced Mamba architecture with a novel scanning strategy, effectively modeling long-range dependencies during watermark embedding and retrieval. To the best of our knowledge, this is the first attempt to apply state space models to watermarking, opening new avenues for efficient and robust watermark protection. Extensive experiments demonstrate that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness, which is largely attributed to our proposals. Code and additional supporting materials are provided in the supplementary.

NeurIPS Conference 2025 Conference Paper

Sekai: A Video Dataset towards World Exploration

  • Zhen Li
  • Chuanhao Li
  • Xiaofeng Mao
  • Shaoheng Lin
  • Ming Li
  • Shitian Zhao
  • Zhaopan Xu
  • Xinyue Li

Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training as they suffer from some limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning "world" in Japanese), a high-quality first-person view worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking or drone-view (FPV and UAV) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Comprehensive analyses and experiments demonstrate the dataset's scale, diversity, annotation quality, and effectiveness for training video generation models. We believe Sekai will benefit the area of video generation and world exploration, and motivate valuable applications.

ICLR Conference 2025 Conference Paper

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training

  • Nie Lin
  • Takehiko Ohkawa
  • Yifei Huang 0002
  • Mingfang Zhang 0002
  • Minjie Cai
  • Ming Li
  • Ryosuke Furuta
  • Yoichi Sato 0001

We present a framework for pre-training of 3D hand pose estimation from in-the-wild hand images sharing similar hand characteristics, dubbed SiMHand. Pre-training with large-scale images achieves promising results in various tasks, but prior methods for 3D hand pose pre-training have not fully utilized the potential of diverse hand images accessible from in-the-wild videos. To facilitate scalable pre-training, we first prepare an extensive pool of hand images from in-the-wild videos and design our pre-training method with contrastive learning. Specifically, we collect over 2.0M hand images from recent human-centric videos, such as 100DOH and Ego4D. To extract discriminative information from these images, we focus on the similarity of hands: pairs of non-identical samples with similar hand poses. We then propose a novel contrastive learning method that embeds similar hand pairs closer in the feature space. Our method not only learns from similar samples but also adaptively weights the contrastive learning loss based on inter-sample distance, leading to additional performance gains. Our experiments demonstrate that our method outperforms conventional contrastive learning approaches that produce positive pairs solely from a single image with data augmentation. We achieve significant improvements over the state-of-the-art method (PeCLR) in various datasets, with gains of 15% on FreiHand, 10% on DexYCB, and 4% on AssemblyHands. Our code is available at https://github.com/ut-vision/SiMHand.

AAAI Conference 2025 Conference Paper

Slice-and-Pack: Tailoring Deep Models for Customized Requirements

  • Ruice Rao
  • Dingwei Li
  • Ming Li

The learnware paradigm aims to establish a learnware market such that users can build their own models by reusing appropriate existing models in the market without starting from scratch. It is often the case that a single model is insufficient to fully satisfy the user's requirement. Meanwhile, offering multiple models can lead to higher costs for users alongside an increase in hardware resource demands. To address this challenge, this paper proposes the ''Slice-and-Pack'' (S&P) framework to empower the market to provide users with only the required model fragments without having to offer the entire capabilities of all involved models. Our framework first slices a set of models into small fragments and subsequently packs selected fragments according to the user's specific requirement. In the slicing stage, we extract units layer by layer and connect these units to create numerous fragments. In the packing stage, an encoder-decoder mechanism is employed to assemble these fragments. These processes are conducted within data-limited constraints due to privacy concerns. Extensive experiments validate the effectiveness of our framework.

NeurIPS Conference 2025 Conference Paper

To Think or Not To Think: A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning

  • Ming Li
  • Jike Zhong
  • Shitian Zhao
  • Yuxiang Lai
  • Haoquan Zhang
  • Wang Bill Zhu
  • Kaipeng Zhang

This paper investigates the role of the explicit thinking process in rule-based reinforcement fine-tuning (RFT) for multi-modal large language models (MLLMs). We first extend Thinking-RFT to the image classification task, using verifiable rewards for fine-tuning (FT). Experiments show Thinking-RFT significantly outperforms supervised FT and yields a cross-dataset generalization effect. We then rethink and question whether explicit thinking in RFT is always necessary and beneficial. Challenging the convention that explicit thinking is crucial for the success of RFT, we introduce No-Thinking-RFT, exploring RFT without thinking by introducing a simple equality accuracy reward. We evaluate No-Thinking-RFT on six diverse tasks across different model sizes and types. Experiment results reveal four key findings: (1) Visual perception tasks do not require thinking during RFT, as No-Thinking-RFT consistently outperforms or matches Thinking-RFT across model sizes and types. (2) Models with limited capabilities struggle to generate high-quality CoT for RFT, making Thinking-RFT less effective than No-Thinking-RFT. (3) There are inconsistencies between the answers in the thinking tags and answer tags for some responses of Thinking-RFT, which show lower average accuracy than the overall accuracy. (4) The performance gain of No-Thinking-RFT mainly stems from improved learning during no-thinking FT and the avoidance of inference overthinking, as evidenced by the partial gains from appending empty thinking tags at inference time of Thinking-RFT. We hypothesize that explicit thinking before verifiable answers may hinder reward convergence and reduce performance in certain scenarios. To test this, we propose Think-After-Answer, which places thinking after the answer to mitigate this effect for experimental verification.
Lastly, we conduct a pilot study to explore whether MLLMs can learn when to think during RFT, introducing an Adaptive-Thinking method. Experiments show that the model converges to either thinking or not thinking depending on model capability, achieving comparable or better performance than both Thinking-RFT and No-Thinking-RFT. Our findings suggest MLLMs can adaptively decide whether to think based on their capabilities and task complexity, offering insights into the thinking process in RFT.

EAAI Journal 2025 Journal Article

Two-layer knowledge graph transformer network-based question and answer explainable recommendation

  • Ying Li
  • Ming Li
  • Jin Ding
  • Yixue Bai

The question and answer (Q&A) recommendation in community question answering (CQA) helps users quickly and accurately find the desired Q&A. However, existing studies face the problems of sparse interaction data, cold starts, and a lack of explanations. This paper proposes a novel Q&A explainable recommendation approach based on a two-layer knowledge graph transformer network. It alleviates the sparse data and cold start problems through the novel two-layer knowledge graph. First, a two-layer knowledge graph in CQA is constructed. The interaction layer helps to enrich the associations between users and questions and answers (Q&As). The semantic layer provides semantic associations and reflects contextual domain knowledge. Second, a critical meta-path recognition module is constructed to learn the critical meta-paths between users and documents from the interaction layer. Then, a user and Q&A embedding method based on the two-layer knowledge graph is proposed to enhance the user and Q&A representations. Finally, a recommendation and explanation layer is established to obtain personalized Q&A recommendation results and corresponding explanations. Compared with the baselines, the proposed method shows superior performance. It achieves average improvements of 21.28%, 28.41% and 27.18% in precision, recall and F1-measure, respectively, in top-K Q&A recommendation. It improves the area under the curve and F1-measure of the click-through rate prediction recommendation by 11.32% and 23.06%, respectively.

NeurIPS Conference 2025 Conference Paper

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

  • Xiyao Wang
  • Zhengyuan Yang
  • Chao Feng
  • Yuhang Zhou
  • Xiaoyu Liu
  • Yongyuan Liang
  • Ming Li
  • Ziyi Zang

Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision-language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from a 200-word caption, we inject a single, subtle visual description error, altering a few words on objects, attributes, counts, or spatial relations, and task the model to pinpoint the corrupted span given the image and the modified caption. This formulation preserves the full perceptual difficulty while providing a binary, exact-match reward that is easy to compute and unambiguous. Models trained with the ViCrit task exhibit substantial gains across a variety of VL benchmarks. Crucially, the improvements transfer beyond natural-image training data to abstract image reasoning and visual math, showing promise of learning to perceive rather than merely memorizing seen objects. To facilitate evaluation, we further introduce ViCrit-Bench, a category-balanced diagnostic benchmark that systematically probes perception errors across diverse image domains and error types. Together, our results demonstrate that fine-grained hallucination criticism is an effective and generalizable objective for enhancing visual perception in VLMs.

AAAI Conference 2025 Conference Paper

When Hypergraph Meets Heterophily: New Benchmark Datasets and Baseline

  • Ming Li
  • Yongchun Gu
  • Yi Wang
  • Yujie Fang
  • Lu Bai
  • Xiaosheng Zhuang
  • Pietro Liò

Hypergraph neural networks (HNNs) have shown promise in handling tasks characterized by high-order correlations, achieving notable success across various applications. However, there has been limited focus on heterophilic hypergraph learning (HHL), in contrast to the increasing attention given to graph neural networks designed for graphs exhibiting heterophily. This paper aims to pave the way for HHL by addressing key gaps from multiple perspectives: measurement, dataset diversity, and baseline model development. First, we introduce metrics to quantify heterophily in hypergraphs, providing a numerical basis for assessing the homophily/heterophily ratio. Second, we develop diverse benchmark datasets across various real-world scenarios, facilitating comprehensive evaluations of existing HNNs and advancing research in HHL. Additionally, as a novel baseline model, we propose HyperUFG, a framelet-based HNN integrating both low-pass and high-pass filters. Extensive experiments conducted on synthetic and benchmark datasets highlight the challenges current HNNs face with heterophilic hypergraphs, while showcasing that HyperUFG performs competitively and often outperforms many existing models in such scenarios. Overall, our study underscores the urgent need for further exploration and development in this emerging field, with the potential to inspire and guide future research in HHL.

AAAI Conference 2025 Conference Paper

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models

  • Yifang Xu
  • Yunzhuo Sun
  • Benxiang Zhai
  • Ming Li
  • Wenxin Liang
  • Yang Li
  • Sidan Du

The target of video moment retrieval (VMR) is predicting temporal spans within a video that semantically match a given linguistic query. Existing VMR methods based on multimodal large language models (MLLMs) overly rely on expensive high-quality datasets and time-consuming fine-tuning. Although some recent studies introduce a zero-shot setting to avoid fine-tuning, they overlook inherent language bias in the query, leading to erroneous localization. To tackle the aforementioned challenges, this paper proposes Moment-GPT, a tuning-free pipeline for zero-shot VMR utilizing frozen MLLMs. Specifically, we first employ LLaMA-3 to correct and rephrase the query to mitigate language bias. Subsequently, we design a span generator combined with MiniGPT-v2 to produce candidate spans adaptively. Finally, to leverage the video comprehension capabilities of MLLMs, we apply Video-ChatGPT and span scorer to select the most appropriate spans. Our proposed method substantially outperforms the state-of-the-art MLLM-based and zero-shot models on several public datasets, including QVHighlights, ActivityNet-Captions, and Charades-STA.

AAAI Conference 2024 Conference Paper

AUC Optimization from Multiple Unlabeled Datasets

  • Zheng Xie
  • Yu Liu
  • Ming Li

Weakly supervised learning aims to make machine learning more powerful when perfect supervision is unavailable, and has attracted much attention from researchers. Among the various scenarios of weak supervision, one of the most challenging cases is learning from multiple unlabeled (U) datasets with only a little knowledge of the class priors, or U^m learning for short. In this paper, we study the problem of building an AUC (area under ROC curve) optimal model from multiple unlabeled datasets, which maximizes the pairwise ranking ability of the classifier. We propose U^m-AUC, an AUC optimization approach that converts the U^m data into a multi-label AUC optimization problem, and can be trained efficiently. We show that the proposed U^m-AUC is effective theoretically and empirically.

TIST Journal 2024 Journal Article

Boosting Healthiness Exposure in Category-Constrained Meal Recommendation Using Nutritional Standards

  • Ming Li
  • Lin Li
  • Xiaohui Tao
  • Zhongwei Xie
  • Qing Xie
  • Jingling Yuan

Food computing, a newly emerging topic, is closely linked to human life through computational methodologies. Meal recommendation, a food-related study about human health, aims to provide users with a meal of courses drawn from specific categories (e.g., appetizers, main dishes) that can be enjoyed as a service. Historical interaction data, important user information, is often used by existing models to learn user preferences. However, if a user's preferences favor less healthy meals, the model will follow that preference and make similar recommendations, potentially negatively impacting the user's long-term health. This emphasizes the necessity for health-oriented and responsible meal recommendation systems. In this article, we propose a healthiness-aware and category-wise meal recommendation model called CateRec, which boosts healthiness exposure by using nutritional standards as knowledge to guide the model training. Two fundamental questions are raised and answered: (1) How can the healthiness of meals be evaluated? Two well-known nutritional standards from the World Health Organization and the United Kingdom Food Standards Agency are used to calculate the healthiness score of the meal. (2) How can the model training be guided in a health-oriented manner? We construct category-wise personalization partial rankings and category-wise healthiness partial rankings, and theoretically analyze that they meet the necessary properties and assumptions required to be trained by the maximum posterior estimator under Bayesian probability. The data analysis confirms the existence of user preferences leaning towards less healthy meals in two public datasets. A comprehensive experiment demonstrates that our CateRec effectively boosts healthiness exposure in terms of mean healthiness score and ranking exposure while being comparable to the state-of-the-art model in terms of recommendation accuracy.

IJCAI Conference 2024 Conference Paper

Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization

  • Rui Kong
  • Chenyang Wu
  • Chen-Xiao Gao
  • Zongzhang Zhang
  • Ming Li

In offline-to-online Reinforcement Learning (RL), policies pre-trained offline are used for initialization and subsequent online fine-tuning. However, existing methods suffer from instability and low sample efficiency compared to pure online learning. This paper identifies these limitations as stemming from direct policy initialization using offline-trained policy models. We propose Continual Policy Revitalization (CPR) as a novel, efficient, and stable fine-tuning method. CPR incorporates a periodic policy revitalization technique, restoring the overtrained policy network to full learning capacity while ensuring stable initial performance. This approach enables fine-tuning without being adversely affected by low-quality pre-trained policies. In contrast to previous research, CPR initializes the new policy with an adaptive policy constraint in policy optimization. Such optimization keeps the new policy close to a behavior policy constructed from historical policies. This contributes to stable policy improvement and optimal converged performance. Practically, CPR can seamlessly integrate into existing offline RL algorithms with minimal modification. We empirically validate the effectiveness of our method through extensive experiments, demonstrating substantial improvements in learning stability and efficiency compared to previous approaches. Our code is available at https://github.com/LAMDA-RL/CPR.

NeurIPS Conference 2024 Conference Paper

HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning

  • Lu Bai
  • Zhuo Xu
  • Lixin Cui
  • Ming Li
  • Yue Wang
  • Edwin Hancock

Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE) that can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we commence by utilizing hard node assignment to decompose a sample graph into a family of separated subgraphs. We compress each subgraph into a coarsened node, transforming the original graph into a coarsened graph. On the other hand, during the decoding process, we adopt soft node assignment to reconstruct the original graph structure by expanding the coarsened nodes. By hierarchically performing the above compressing procedure during the encoding process as well as the expanding procedure during the decoding process, the proposed HC-GAE can effectively extract bidirectionally hierarchical structural features of the original sample graph. Furthermore, we re-design the loss function so that it can integrate information from either the encoder or the decoder. Since the associated graph convolution operation of the proposed HC-GAE is restricted to each individual separated subgraph and cannot propagate node information between different subgraphs, the proposed HC-GAE can significantly reduce the over-smoothing problem arising in classical convolution-based GAEs. The proposed HC-GAE can generate effective representations for either node classification or graph classification, and experiments demonstrate its effectiveness on real-world datasets.

NeurIPS Conference 2024 Conference Paper

Long-range Brain Graph Transformer

  • Shuo Yu
  • Shan Jin
  • Ming Li
  • Tabinda Sarwar
  • Feng Xia

Understanding communication and information processing among brain regions of interest (ROIs) is highly dependent on long-range connectivity, which plays a crucial role in facilitating diverse functional neural integration across the entire brain. However, previous studies generally focused on the short-range dependencies within brain networks while neglecting the long-range dependencies, limiting an integrated understanding of brain-wide communication. To address this limitation, we propose the Adaptive Long-range aware TransformER (ALTER), a brain graph transformer that captures long-range dependencies between brain ROIs using biased random walks. Specifically, we present a novel long-range aware strategy to explicitly capture long-range dependencies between brain ROIs. By guiding the walker towards the next hop with a higher correlation value, our strategy simulates real-world brain-wide communication. Furthermore, by employing the transformer framework, ALTER adaptively integrates both short- and long-range dependencies between brain ROIs, enabling an integrated understanding of multi-level communication across the entire brain. Extensive experiments on the ABIDE and ADNI datasets demonstrate that ALTER consistently outperforms generalized state-of-the-art graph learning methods (including SAN, Graphormer, GraphTrans, and LRGNN) and other graph-learning-based brain network analysis methods (including FBNETGEN, BrainNetGNN, BrainGNN, and BrainNETTF) in neurological disease diagnosis.
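
The long-range aware strategy, a walker that prefers the next hop with a higher correlation value, can be sketched as below. The transition rule (probability proportional to the non-negative correlation with the current ROI) is an illustrative assumption, not the paper's exact formulation.

```python
import random

def biased_walk(corr, start, length, seed=0):
    """One biased random walk over an ROI correlation matrix `corr`:
    each next hop is drawn with probability proportional to its
    (non-negative) correlation with the current ROI, so the walk tends
    to follow high-correlation paths across the graph."""
    rng = random.Random(seed)
    n = len(corr)
    walk = [start]
    node = start
    for _ in range(length):
        weights = [max(corr[node][j], 0.0) if j != node else 0.0
                   for j in range(n)]
        if sum(weights) == 0:
            break  # isolated ROI: nothing to walk to
        node = rng.choices(range(n), weights=weights)[0]
        walk.append(node)
    return walk
```

On a toy 3-ROI matrix where ROI 0 correlates only with ROI 2, the walk alternates between the two, illustrating how the bias concentrates visits on strongly connected ROIs.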

EAAI Journal 2024 Journal Article

Multi-scale multi-instance contrastive learning for whole slide image classification

  • Jianan Zhang
  • Fang Hao
  • Xueyu Liu
  • Shupei Yao
  • Yongfei Wu
  • Ming Li
  • Wen Zheng

Multi-instance learning (MIL) has become the mainstream solution for processing super-high resolution whole slide images (WSIs) with the pyramidal structure in digital pathology. Current MIL-based methods usually learn features from WSI at a specific magnification, ignoring the multi-scale information contained in the WSI and the comparative learning of global features. In addition, the lack of instance labeling can lead to weak model supervision, which may compromise the model’s ability to discriminate fine-grained features, ultimately affecting bag-level feature learning. Therefore, we propose a novel multi-scale multi-instance contrastive learning framework to learn more discriminative feature representation across scales for pathological WSI classification. The proposed method begins with a two-stream feature aggregator module, which extracts both bag embeddings and selects the representative instances simultaneously. Following the bag embedding branch, a multi-scale contrastive learning module is designed to learn the global feature comparisons of WSIs across multiple scales by leveraging its inherent pyramid structure. Additionally, based on the instances selection branch, a patch-level classifier is combined with the bag-level classifier to jointly optimize the model training process, enhancing the supervision of the model. The proposed framework is evaluated on three publicly available WSI datasets, achieving an area under the curve of 95.8%, 95.5%, and 88.2%, respectively, consistently outperforming all the compared methods including single- and multi-scale ones.

JBHI Journal 2024 Journal Article

Semi-Supervised Disease Classification Based on Limited Medical Image Data

  • Yan Zhang
  • Chun Li
  • Zhaoxia Liu
  • Ming Li

In recent years, significant progress has been made in the field of learning from positive and unlabeled examples (PU learning), particularly in the context of advancing image and text classification tasks. However, applying PU learning to semi-supervised disease classification remains a formidable challenge, primarily due to the limited availability of labeled medical images. In the realm of medical image-aided diagnosis algorithms, numerous theoretical and practical obstacles persist. The research on PU learning for medical image-assisted diagnosis holds substantial importance, as it aims to reduce the time spent by professional experts in classifying images. Unlike natural images, medical images are typically accompanied by a scarcity of annotated data, while an abundance of unlabeled cases exists. Addressing these challenges, this paper introduces a novel generative model inspired by Hölder divergence, specifically designed for semi-supervised disease classification using positive and unlabeled medical image data. We present a comprehensive formulation of the problem and establish its theoretical feasibility through rigorous mathematical analysis. To evaluate the effectiveness of our proposed approach, we conduct extensive experiments on five benchmark datasets commonly used in PU medical learning: BreastMNIST, PneumoniaMNIST, BloodMNIST, OCTMNIST, and AMD. The experimental results clearly demonstrate the superiority of our method over existing approaches based on KL divergence. Notably, our approach achieves state-of-the-art performance on all five disease classification benchmarks. By addressing the limitations imposed by limited labeled data and harnessing the untapped potential of unlabeled medical images, our novel generative model presents a promising direction for enhancing semi-supervised disease classification in the field of medical image analysis.

EAAI Journal 2023 Journal Article

Ada-CCFNet: Classification of multimodal direct immunofluorescence images for membranous nephropathy via adaptive weighted confidence calibration fusion network

  • Ruili Wang
  • Xueyu Liu
  • Fang Hao
  • Xing Chen
  • Xinyu Li
  • Chen Wang
  • Dan Niu
  • Ming Li

In the pathological diagnosis of early, late and non-membranous nephropathy, direct immunofluorescence is highly likely to present potentially specific lesions, yet these are often overlooked due to the difficulty of screening with the naked eye. With the advanced progress of deep learning, such methods have shown powerful abilities in detecting potential lesions. In this paper, we propose an adaptive weighted confidence calibration fusion framework (Ada-CCFNet) consisting of a preprocessing module, an adaptive weighted confidence calibration fusion (Ada-CCF) module and a classification module for diagnosis of membranous nephropathy by classifying multimodal direct immunofluorescence images. In the preprocessing module, we use the well-known U-Net to segment individual glomeruli and standardize their luminance appearance by the average luminance difference method, allowing the subsequent modules to focus more on the diseased glomerular region. Subsequently, in the Ada-CCF module, six confidence calibration methods are utilized for the two main direct immunofluorescence image types, IgG and C3, and comprehensive calibration scores are obtained based on the adaptive weighted fusion of the six confidence calibration methods to obtain more reliable confidence levels, in which the adaptive weights are related to expected calibration error reductions. For the classification module, the weighted probability scores of IgG and C3 are jointly fed into the module to achieve classification by random forest. Experimental results show that Ada-CCFNet achieves a classification accuracy of 73.52%, surpassing the methods using single IgG images, single C3 images, and the positive grade indicator by 8.24%, 8.94% and 22.76%, respectively, and outperforming the compared methods in the classification of membranous nephropathy.

IJCAI Conference 2023 Conference Paper

Capturing the Long-Distance Dependency in the Control Flow Graph via Structural-Guided Attention for Bug Localization

  • Yi-Fan Ma
  • Yali Du
  • Ming Li

To alleviate the burden of software maintenance, bug localization, which aims to automatically locate the buggy source files based on the bug report, has drawn significant attention in the software mining community. Recent studies indicate that the program structure in source code carries more semantics reflecting the program behavior, which is beneficial for bug localization. Benefiting from the rich structural information in the Control Flow Graph (CFG), CFG-based bug localization methods have achieved the state-of-the-art performance. Existing CFG-based methods extract the semantic feature from the CFG via the graph neural network. However, the step-wise feature propagation in the graph neural network suffers from the problem of information loss when the propagation distance is long, while long-distance dependencies are rather common in the CFG. In this paper, we argue that the long-distance dependency is crucial for feature extraction from the CFG, and propose a novel bug localization model named sgAttention. In sgAttention, a particularly designed structural-guided attention is employed to globally capture the information in the CFG, where features of irrelevant nodes are masked for each node to facilitate better feature extraction from the CFG. Experimental results on four widely-used open-source software projects indicate that sgAttention improves the state-of-the-art bug localization methods by an average of 32.9% and 29.2%, and the state-of-the-art pre-trained models by 5.8% and 4.9%, in terms of MAP and MRR, respectively.

AAAI Conference 2023 Conference Paper

Cooperative and Adversarial Learning: Co-enhancing Discriminability and Transferability in Domain Adaptation

  • Hui Sun
  • Zheng Xie
  • Xin-Ye Li
  • Ming Li

Discriminability and transferability are two goals of feature learning for domain adaptation (DA), as we aim to find transferable features from the source domain that are helpful for discriminating the class label in the target domain. Modern DA approaches optimize discriminability and transferability by adopting two separate modules for the two goals upon a feature extractor, but fail to fully exploit their relationship. This paper argues that by letting the discriminative module and the transfer module help each other, better DA can be achieved. We propose Cooperative and Adversarial LEarning (CALE) to combine the optimization of discriminability and transferability into a whole, providing one solution for making the discriminative module and the transfer module guide each other. Specifically, CALE generates cooperative (easy) examples and adversarial (hard) examples with both the discriminative module and the transfer module. While the easy examples that contain the module knowledge can be used to enhance each other, the hard ones are used to enhance the robustness of the corresponding goal. Experimental results show the effectiveness of CALE for unifying the learning of discriminability and transferability, as well as its superior performance.

AAAI Conference 2023 Conference Paper

Semi-supervised Learning with Support Isolation by Small-Paced Self-Training

  • Zheng Xie
  • Hui Sun
  • Ming Li

In this paper, we address a special scenario of semi-supervised learning, where the label missingness is caused by a preceding filtering mechanism, i.e., an instance can enter a subsequent process in which its label is revealed if and only if it passes the filtering mechanism. The rejected instances are prohibited from entering the subsequent labeling process for economic or ethical reasons, making the supports of the labeled and unlabeled distributions isolated from each other. In this case, semi-supervised learning approaches that rely on a certain coherence of the labeled and unlabeled distributions would suffer from the consequent distribution mismatch and hence yield poor prediction performance. We propose a Small-Paced Self-Training framework, which iteratively discovers labeled and unlabeled instance subspaces with bounded Wasserstein distance. We theoretically prove that such a framework can achieve provably low error on the pseudo labels during learning. Experiments on both benchmark and pneumonia diagnosis tasks show that our method is effective.
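
In one dimension the bounded-Wasserstein admission test is easy to sketch. Everything below (scalar features, the nearest-to-center pacing rule, the knobs `tau` and `step`) is an illustrative assumption rather than the paper's algorithm.

```python
def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein distance between equally sized samples:
    sort both and average the coordinate-wise gaps."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def small_paced_selection(labeled, unlabeled, tau, step):
    """One small pace: take the `step` unlabeled points nearest to the
    labeled sample's center, and admit them for pseudo-labeling only if
    their Wasserstein gap to an equal-size labeled subsample stays
    below the bound `tau`."""
    center = sum(labeled) / len(labeled)
    cand = sorted(unlabeled, key=lambda x: abs(x - center))[:step]
    anchor = sorted(labeled, key=lambda x: abs(x - center))[:len(cand)]
    if cand and wasserstein_1d(anchor, cand) <= tau:
        return cand
    return []
```

The point of the gate is that pseudo-labels are only ever assigned inside subspaces where the labeled and unlabeled samples look alike, which is what keeps the pseudo-label error controlled.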

IJCAI Conference 2023 Conference Paper

Stability and Generalization of lp-Regularized Stochastic Learning for GCN

  • Shiyu Liu
  • Linsen Wei
  • Shaogao Lv
  • Ming Li

Graph convolutional networks (GCNs) are viewed as one of the most popular variants of graph neural networks over graph data and have shown powerful performance in empirical experiments. l2-based graph smoothing enforces the global smoothness of GCN, while (soft) l1-based sparse graph learning tends to promote signal sparsity to trade for discontinuity. This paper aims to quantify the trade-off of GCN between smoothness and sparsity, with the help of a general lp-regularized (1 < p <= 2) stochastic learning scheme proposed within. While stability-based generalization analyses have been given in prior work for twice-differentiable objective functions, our lp-regularized learning scheme does not satisfy such a smoothness condition. To tackle this issue, we propose a novel proximal SGD algorithm for GCNs with an inexact operator. For a single-layer GCN, we establish an explicit theoretical understanding of GCN with lp-regularized stochastic learning by analyzing the stability of our proximal SGD algorithm. We conduct multiple empirical experiments to validate our theoretical findings.
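
Since the lp penalty with 1 < p < 2 has no closed-form proximal map in general, an inexact operator can be approximated by a few inner gradient steps. The scalar sketch below illustrates the idea only; the step sizes, iteration counts, and the coordinate-wise wrapper are assumptions, not the paper's algorithm.

```python
def prox_lp(v, lam, p, inner_steps=60, inner_lr=0.2):
    """Inexact proximal map of f(z) = lam * |z|**p (1 < p <= 2):
    minimize 0.5*(z - v)**2 + lam*|z|**p by inner gradient descent.
    For p = 2 this converges to the closed form v / (1 + 2*lam)."""
    z = v
    for _ in range(inner_steps):
        sign = 1.0 if z >= 0 else -1.0
        grad = (z - v) + lam * p * abs(z) ** (p - 1) * sign
        z -= inner_lr * grad
    return z

def proximal_sgd_step(w, grad_w, lr, lam, p):
    """One proximal-SGD update per coordinate: a gradient step on the
    data-fit term followed by the inexact lp prox."""
    return [prox_lp(wi - lr * gi, lr * lam, p) for wi, gi in zip(w, grad_w)]
```

At p = 2 the inner loop recovers ridge-style shrinkage exactly, which gives a quick sanity check on the inexact operator.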

EAAI Journal 2023 Journal Article

Switching synthesizing-incorporated and cluster-based synthetic oversampling for imbalanced binary classification

  • Jun Dou
  • Zihan Gao
  • Guoliang Wei
  • Yan Song
  • Ming Li

Oversampling is a popular and useful method for the binary classification of imbalanced data; however, many existing oversampling methods are very likely to generate redundant/unsafe/noisy samples, due primarily to inadequate consideration of the data distribution. To address this issue, we propose a novel oversampling approach, namely Switching Synthesizing-Incorporated and Cluster-Based Synthetic Oversampling (SSI-CBSO). The core idea of SSI-CBSO is four-fold: (1) noise samples are removed using a K-nearest-neighbor strategy, and Fuzzy C-Means clustering is applied to the filtered data in the minority class; (2) the number of samples to be synthesized is adaptively assigned to each cluster according to the inter-class distance and the intra-cluster similarity; (3) to better reflect the data distribution, a new method based on the concept of the hypersphere is put forward to measure cluster density in a high-dimensional space; and (4) a new principle based on the Mahalanobis distance is provided for a better selection of the target sample. A switching synthesizing strategy is then established to guarantee the safety of the synthesized samples. Finally, experiments on 13 binary imbalanced datasets, using five evaluation metrics with four classifiers, verify that the proposed SSI-CBSO approach obtains desirable results.
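
Two of the four steps lend themselves to a quick one-dimensional sketch: the k-NN noise filter and within-cluster interpolation. Both are simplified stand-ins (scalar features, SMOTE-style synthesis, no Fuzzy C-Means, hypersphere density, or Mahalanobis targeting), not the SSI-CBSO algorithm itself.

```python
import random

def knn_filter(minority, majority, k=3):
    """Drop minority points whose k nearest neighbors are all majority
    samples -- a scalar version of the noise-removal step."""
    kept = []
    for x in minority:
        pool = ([(abs(x - m), 1) for m in minority if m is not x]
                + [(abs(x - m), 0) for m in majority])
        pool.sort()
        if any(is_minority for _, is_minority in pool[:k]):
            kept.append(x)
    return kept

def synthesize(cluster, n_new, seed=0):
    """Interpolate between random same-cluster pairs to create n_new
    synthetic minority samples, so every new point stays inside the
    cluster's span."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        a, b = rng.sample(cluster, 2)
        out.append(a + rng.random() * (b - a))
    return out
```

Filtering before synthesizing is what keeps an isolated minority point deep in majority territory from spawning further unsafe samples.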

EAAI Journal 2022 Journal Article

A dynamical artificial bee colony for vehicle routing problem with drones

  • Deming Lei
  • Zhengzhi Cui
  • Ming Li

Truck-drone hybrid delivery combines the advantages of the large capacity of trucks and the high travel speed of drones. The vehicle routing problem with drones (VRP-D) is a common problem in such delivery systems. In this paper, VRP-D is addressed and a new dynamical artificial bee colony (DABC) algorithm is employed to minimize the overall operational cost. Two bee swarms are produced, and an effective evaluation process is used to determine the employed bee swarm and onlooker bee swarm dynamically. Variable neighborhood descent is constructed using 15 neighborhood structures and adopted in the employed bee phase and onlooker bee phase in different ways. A number of experiments are conducted on 112 instances, and the computational results reveal that DABC provides new best solutions for 37 instances and has promising advantages on VRP-D.

NeurIPS Conference 2022 Conference Paper

Few-Shot Non-Parametric Learning with Deep Latent Variable Model

  • Zhiying Jiang
  • Yiqin Dai
  • Ji Xin
  • Ming Li
  • Jimmy Lin

Most real-world problems that machine learning algorithms are expected to solve face the situation with (1) unknown data distribution; (2) little domain-specific knowledge; and (3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled examples. By training only a generative model in an unsupervised way, the framework utilizes the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with few labeled data, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime and even outperforms semi-supervised learning methods on CIFAR-10. We demonstrate how and when the negative evidence lower bound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we illustrate how, under NPC-LV, improvements in generative models can enhance downstream classification accuracy.
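
The compressor-based distance metric here belongs to the Normalized Compression Distance family. A sketch with zlib standing in for the trained generative-model compressor (NPC-LV itself would use the nELBO as the compressed length) looks like this:

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length -- the practical proxy for Kolmogorov
    complexity. zlib here is a simple stand-in for the learned
    compressor."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(query: bytes, labeled, k: int = 1) -> str:
    """k-nearest-neighbor classification under NCD over (bytes, label)
    pairs -- no parametric classifier is trained, matching the
    'classifies without further training' claim."""
    neighbors = sorted(labeled, key=lambda item: ncd(query, item[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)
```

Because concatenating two similar inputs compresses much better than concatenating dissimilar ones, NCD is small for same-class pairs, which is all the nearest-neighbor rule needs.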

AIJ Journal 2022 Journal Article

Multi-view graph convolutional networks with attention mechanism

  • Kaixuan Yao
  • Jiye Liang
  • Jianqing Liang
  • Ming Li
  • Feilong Cao

Recent advances in graph convolutional networks (GCNs), which mainly focus on how to exploit information from different hops of neighbors in an efficient way, have brought substantial improvement to many graph data modeling tasks. Most of the existing GCN-based models, however, are built on the basis of a fixed adjacency matrix, i.e., a single-view topology of the underlying graph. That inherently limits the expressive power of the developed models, especially when the raw graphs are often noisy or even incomplete due to the inevitably error-prone data measurement or collection. In this paper, we propose a novel framework, termed Multi-View Graph Convolutional Networks with Attention Mechanism (MAGCN), by incorporating multiple views of topology and an attention-based feature aggregation strategy into the computation of graph convolution. As an advanced variant of GCNs, MAGCN is fed with multiple “trustable” topologies, which already exist for a given task or are empirically generated by classical graph construction methods, giving it good potential to produce a better learned representation for downstream tasks. Furthermore, we present some theoretical analysis of the expressive power and flexibility of MAGCN, which provides a general explanation as to why multi-view based methods can potentially outperform those relying on a single view. Our experimental study demonstrates the state-of-the-art accuracies of MAGCN on the Cora, Citeseer, and Pubmed datasets. Robustness analysis is also undertaken to show the advantage of MAGCN in handling uncertainty issues in node classification tasks.

NeurIPS Conference 2022 Conference Paper

Pyramid Attention For Source Code Summarization

  • Lei Chai
  • Ming Li

This paper presents a multi-granularity method for source code summarization, which generates a concise functional description for a given code snippet. We notice that skilled programmers write and read source code hierarchically and pay close attention to conceptual entities like statements, tokens, sub-tokens, and the mapping relations between them. The entities have specific emphases according to their granularities, e.g., statements in coarse granularity reveal the global logical semantics of code, while sub-tokens in fine granularity are more related to the textual semantics. Driven by this observation, we demonstrate that a multi-granularity formulation incorporating these conceptual entities benefits the code summarization task. Concretely, the source code is transformed into a pyramidal representation, and then a pyramid attention mechanism is applied for efficient feature aggregation among different hierarchies in it. We instantiate our multi-granularity method using the proposed pyramid attention and name it PA-former (Pyramid Attention transformer). We evaluate it on two source code summarization benchmarks, where it surpasses prior works and achieves new state-of-the-art results. Our code and data are available at https://github.com/leichainju/pa-former.

EAAI Journal 2022 Journal Article

The interval grey QFD method for new product development: Integrate with LDA topic model to analyze online reviews

  • Shengqing Huang
  • Jie Zhang
  • Chaoxiang Yang
  • Quan Gu
  • Ming Li
  • Wenqiang Wang

As the consumption environment develops, the way consumers give feedback on product experience has shifted from passive feedback to active reviews, and advances in artificial intelligence bring new possibilities for companies to obtain the Customer Requirements (CRs) and Market Competition Information (CIs) needed for new product development. In previous New Product Development (NPD) processes, CRs and CIs often had to be corrected through market surveys and questionnaires; acquiring corrected data from experienced experts free of user and designer cognitive bias is laborious and rather subjective. When key information is scarce for business developers and NPD efficiency and accuracy cannot be guaranteed, online reviews provide a good data source for enhancing market competitiveness. This study integrates Latent Dirichlet Allocation (LDA), the Apriori algorithm, interval grey numbers, and Quality Function Deployment (QFD) to propose the Latent Dirichlet Allocation-Interval Grey Number Quality Function Deployment (LDA-IGQFD) method, which uses text mining and analysis to obtain objective information about the CRs and CIs contained in users’ online reviews, transforms it into data, and feeds it into Interval Grey Number Quality Function Deployment (IGQFD) to drive product development. LDA-IGQFD can help product developers identify important information about CRs and Engineering Characteristics (ECs) and provide suggestions for product development. The research process under the LDA-IGQFD method is illustrated with the design of a dishwasher as an example, and the effectiveness and practicality of the proposed method are verified.

NeurIPS Conference 2022 Conference Paper

When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

  • Haoyi Niu
  • Shubham Sharma
  • Yiwen Qiu
  • Ming Li
  • Guyue Zhou
  • Jianming Hu
  • Xianyuan Zhan

Learning effective reinforcement learning (RL) policies to solve real-world complex tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offline RL provides another possibility to learn policies directly from pre-collected historical data. However, to achieve reasonable performance, existing offline RL algorithms need impractically large offline data with sufficient state-action space coverage for training. This brings up a new question: is it possible to combine learning from limited real data in offline RL and unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches? In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question. H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes the Q function learning on simulated state-action pairs with large dynamics gaps, while also simultaneously allowing learning from a fixed real-world dataset. Through extensive simulation and real-world tasks, as well as theoretical analysis, we demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms. H2O provides a brand new hybrid offline-and-online RL paradigm, which can potentially shed light on future RL algorithm design for solving practical real-world tasks.

JBHI Journal 2021 Journal Article

3D Context-Aware Convolutional Neural Network for False Positive Reduction in Clustered Microcalcifications Detection

  • Jian Zheng
  • Haotian Sun
  • Shandong Wu
  • Ke Jiang
  • Yunsong Peng
  • Xiaodong Yang
  • Fan Zhang
  • Ming Li

False positives (FPs) reduction is indispensable for clustered microcalcifications (MCs) detection in digital breast tomosynthesis (DBT), since there might be excessive false candidates in the detection stage. Considering that DBT volume has an anisotropic resolution, we proposed a novel 3D context-aware convolutional neural network (CNN) to reduce FPs, which consists of a 2D intra-slice feature extraction branch and a 3D inter-slice feature fusion branch. In particular, 3D anisotropic convolutions were designed to learn representations from DBT volumes and inter-slice information fusion is only performed on the feature map level, which could avoid the influence of anisotropic resolution of DBT volume. The proposed method was evaluated on a large-scale Chinese women population of 877 cases with 1754 DBT volumes and compared with 8 related methods. Experimental results show that the proposed network achieved the best performance with an accuracy of 92.68% for FPs reduction with an AUC of 97.65%, and the FPs are 0.0512 per DBT volume at a sensitivity of 90%. This also proved that making full use of 3D contextual information of DBT volume can improve the performance of the classification algorithm.

EAAI Journal 2021 Journal Article

An imperialist competitive algorithm with feedback for energy-efficient flexible job shop scheduling with transportation and sequence-dependent setup times

  • Ming Li
  • Deming Lei

Flexible job shop scheduling problems have been extensively investigated in the past decade; however, transportation, sequence-dependent setup times (SDST) and energy efficiency are seldom incorporated together in flexible job shops. In this paper, the energy-efficient flexible job shop scheduling problem (EFJSP) with transportation and SDST is considered, and an imperialist competitive algorithm with feedback (FICA) is developed to minimize makespan, total tardiness and total energy consumption simultaneously. Assimilation and adaptive revolution are newly implemented via feedback, and a new imperialist competition is presented through solution transferring among empires and a reinforced search. Extensive experiments are conducted, and the computational results demonstrate that FICA provides promising results for EFJSP with transportation and SDST.

AAAI Conference 2021 Short Paper

Enhancing Context-Based Meta-Reinforcement Learning Algorithms via An Efficient Task Encoder (Student Abstract)

  • Feng Xu
  • Shengyi Jiang
  • Hao Yin
  • Zongzhang Zhang
  • Yang Yu
  • Ming Li
  • Dong Li
  • Wulong Liu

Meta-Reinforcement Learning (meta-RL) algorithms enable agents to adapt to new tasks from small amounts of exploration, based on the experience of similar tasks. Recent studies have pointed out that a good representation of a task is key to the success of off-policy context-based meta-RL. Inspired by contrastive methods in unsupervised representation learning, we propose a new method to learn the task representation based on the mutual information between transition tuples in a trajectory and the task embedding. We also propose a new estimate of task similarity based on the Q-function, which can be used to form a constraint on the distribution of the encoded task variables, making the task encoder encode task variables more effectively for new tasks. Experiments on meta-RL tasks show that the newly proposed method outperforms existing meta-RL algorithms.

AAAI Conference 2021 Conference Paper

Segatron: Segment-Aware Transformer for Language Modeling and Understanding

  • He Bai
  • Peng Shi
  • Jimmy Lin
  • Yuqing Xie
  • Luchen Tan
  • Kun Xiong
  • Wen Gao
  • Ming Li

Transformers are powerful for sequence modeling. Nearly all state-of-the-art language models and pre-trained language models are based on the Transformer architecture. However, it distinguishes sequential tokens only with the token position index. We hypothesize that better contextual representations can be generated from the Transformer with richer positional information. To verify this, we propose a segment-aware Transformer (Segatron), replacing the original token position encoding with a combined position encoding of paragraph, sentence, and token. We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model with memory extension and relative position encoding. We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset. We further investigate the pre-training masked language modeling task with Segatron. Experimental results show that BERT pre-trained with Segatron (SegaBERT) can outperform BERT with vanilla Transformer on various NLP tasks, and outperforms RoBERTa on zero-shot sentence representation learning. Our code is available on GitHub.
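
The combined position encoding replaces a flat token index with a (paragraph, sentence, token) triple per token, each of which would index its own embedding table before the vectors are summed. A minimal sketch of the index computation (the embedding lookup and summation are omitted, and the reset-per-segment rule is one plausible reading of the abstract):

```python
def segment_positions(paragraphs):
    """paragraphs: list of paragraphs, each a list of sentences, each a
    list of tokens. Returns one (paragraph_idx, sentence_idx, token_idx)
    triple per token, in reading order -- the indices a segment-aware
    encoding would use instead of a single flat position."""
    triples = []
    for p, paragraph in enumerate(paragraphs):
        for s, sentence in enumerate(paragraph):
            for t, _token in enumerate(sentence):
                triples.append((p, s, t))
    return triples
```

Two tokens at the same offset of different sentences now share a token index but differ in sentence or paragraph index, which is the extra signal the flat scheme throws away.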

IJCAI Conference 2021 Conference Paper

Towards Generating Summaries for Lexically Confusing Code through Code Erosion

  • Fan Yan
  • Ming Li

Code summarization aims to summarize code functionality as high-level natural language descriptions to assist in code comprehension. Recent approaches in this field mainly focus on generating summaries for code with precise identifier names, in which meaningful words indicating code functionality can be found. When faced with lexically confusing code, current approaches are likely to fail since the correlation between code lexical tokens and summaries is scarce. To tackle this problem, we propose a novel summarization framework named VECOS. VECOS introduces an erosion mechanism to overcome the model's reliance on precisely defined lexical information. To facilitate learning the eroded code's functionality, we force the representation of the eroded code to align with the representation of its original counterpart via variational inference. Experimental results show that our approach outperforms the state-of-the-art approaches in generating coherent and reliable summaries for various lexically confusing code.

YNICL Journal 2020 Journal Article

Altered gray matter volumes in post-stroke depressive patients after subcortical stroke

  • Wenjun Hong
  • Zhiyong Zhao
  • Dongmei Wang
  • Ming Li
  • Chaozheng Tang
  • Zheng Li
  • Rong Xu
  • Chetwyn C.H. Chan

Stroke survivors are known to suffer from post-stroke depression (PSD). However, the likelihood of structural changes in the brains of PSD patients has not been explored. This study aims to extract changes in the gray matter of these patients and test how these changes account for the PSD symptoms. High-resolution T1-weighted images were collected from 23 PSD patients diagnosed with subcortical stroke. Voxel-based morphometry and support vector machine analyses were used to analyze the data. The results were compared with those collected from 33 non-PSD patients. The PSD group showed decreased gray matter volume (GMV) in the left middle frontal gyrus (MFG) when compared to the non-PSD patients. Together with the clinical and demographic variables, the MFG's GMV predictive model was able to distinguish PSD from the non-PSD patients (0.70 sensitivity and 0.88 specificity). The changes in the left inferior frontal gyrus (61%) and dorsolateral prefrontal cortex (39%) suggest that the somatic/affective symptoms in PSD are likely to be due to patients' problems with understanding and appraising negative emotional stimuli. The impact brought by the reduced prefrontal-to-limbic connectivity needs further exploration. These findings indicate possible systemic involvement of the frontolimbic network resulting in PSD after brain lesions, which is likely to be independent of the location of the lesion. The results inform specific clinical interventions to be provided for treating depressive symptoms in post-stroke patients.

IROS Conference 2020 Conference Paper

An Obstacle-crossing Strategy Based on the Fast Self-reconfiguration for Modular Sphere Robots

  • Haobo Luo
  • Ming Li
  • Guangqi Liang
  • Huihuan Qian
  • Tin Lun Lam

This paper introduces an obstacle-crossing strategy and a self-reconfiguration algorithm for a new class of modular robots called rolling spheres, which can fit obstacles represented by cubes of different sizes thanks to the chain connection of multiple spheres. For the self-reconfiguration of the rolling spheres, a large gradient is obtained by classifying its action types and hierarchically minimizing the distance between the initial configuration and the final configuration. The most direct use of this large gradient is the fast crossing of various obstacles, by joining multiple self-reconfigurations according to the OctoMap of the obstacles. It is verified in simulation that the self-reconfiguration takes full advantage of the parallel movement of multiple modules to reduce the total time steps, and that the obstacle-crossing strategy can adapt to a variety of obstacles.

AAAI Conference 2020 Conference Paper

Control Flow Graph Embedding Based on Multi-Instance Decomposition for Bug Localization

  • Xuan Huo
  • Ming Li
  • Zhi-Hua Zhou

During software maintenance, bug reports are an effective way to identify potential bugs hidden in a software system. It is a great challenge to automatically locate the potential buggy source code according to a bug report. Traditional approaches usually represent bug reports and source code from a lexical perspective to measure their similarities. Recently, some deep learning models have been proposed to learn unified features by exploiting the local and sequential nature of code, which overcomes the difficulty in modeling the difference between natural and programming languages. However, considering only local and sequential information from one dimension is not enough to represent the semantics; multi-dimensional information such as the structural and functional nature that carries additional semantics has not been well captured. Such information beyond the lexical and structural terms is extremely vital in modeling program functionalities and behaviors, leading to a better representation for identifying buggy source code. In this paper, we propose a novel model named CG-CNN, a multi-instance learning framework that enhances the unified features for bug localization by exploiting the structural and sequential nature of the control flow graph. Experimental results on widely-used software projects demonstrate the effectiveness of our proposed CG-CNN model.

AAAI Conference 2020 Conference Paper

Deep Time-Stream Framework for Click-through Rate Prediction by Tracking Interest Evolution

  • Shu-Ting Shi
  • Wenhao Zheng
  • Jun Tang
  • Qing-Guo Chen
  • Yao Hu
  • Jianke Zhu
  • Ming Li

Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation. Recently, deep learning models have been proposed to learn the representation of users' overall interests, while ignoring the fact that interests may dynamically change over time. We argue that it is necessary to consider continuous-time information in CTR models to track user interest trends from rich historical behaviors. In this paper, we propose a novel Deep Time-Stream framework (DTS) which introduces time information through an ordinary differential equation (ODE). DTS continuously models the evolution of interests using a neural network, and thus is able to tackle the challenge of dynamically representing users' interests based on their historical behaviors. In addition, our framework can be seamlessly applied to any existing deep CTR model by leveraging the additional Time-Stream Module, while no changes are made to the original CTR model. Experiments on a public dataset as well as a real industry dataset with billions of samples demonstrate the effectiveness of the proposed approach, which achieves superior performance compared with existing methods.

IROS Conference 2020 Conference Paper

FreeBOT: A Freeform Modular Self-reconfigurable Robot with Arbitrary Connection Point - Design and Implementation

  • Guanqi Liang
  • Haobo Luo
  • Ming Li
  • Huihuan Qian
  • Tin Lun Lam

This paper proposes a novel modular self-reconfigurable robot (MSRR), "FreeBOT", which can be connected freely at any point on other robots. FreeBOT is mainly composed of two parts: a spherical ferromagnetic shell and an internal magnet. The connection between the modules is genderless and instant, since the internal magnet can freely attract other FreeBOT spherical ferromagnetic shells and does not need to be precisely aligned with a specified connector. This connection method has fewer physical constraints, so the FreeBOT system can be extended to more configurations to meet more functional requirements. Although it only has two motors, FreeBOT can accomplish multiple tasks: independent module movement, connector management and system reconfiguration. FreeBOT can move independently on the plane, and even climb on ferromagnetic walls; a group of FreeBOTs can traverse complex terrain. Numerous experiments have been conducted to test its function, which shows that the FreeBOT system has great potential to realize a freeform robotic system.

NeurIPS Conference 2020 Conference Paper

Path Integral Based Convolution and Pooling for Graph Neural Networks

  • Zheng Ma
  • Junyu Xuan
  • Yu Guang Wang
  • Ming Li
  • Pietro Liò

Graph neural networks (GNNs) extend the functionality of traditional neural networks to graph-structured data. Similar to CNNs, an optimized design of graph convolution and pooling is key to success. Borrowing ideas from physics, we propose a path integral based graph neural network (PAN) for classification and regression tasks on graphs. Specifically, we consider a convolution operation that involves every path linking the message sender and receiver, with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It generalizes the graph Laplacian to a new transition matrix we call the maximal entropy transition (MET) matrix, derived from a path integral formalism. Importantly, the diagonal entries of the MET matrix are directly related to the subgraph centrality, thus leading to a natural and adaptive pooling mechanism. PAN provides a versatile framework that can be tailored for different graph data with varying sizes and structures. We can view most existing GNN architectures as special cases of PAN. Experimental results show that PAN achieves state-of-the-art performance on various graph classification/regression tasks, including a new benchmark dataset from statistical mechanics that we propose to boost applications of GNNs in the physical sciences.
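The path-weighted transition matrix can be caricatured as a weighted sum of adjacency powers, row-normalised into a stochastic matrix. This is a sketch only: the uniform weights and the name `met_like_transition` are placeholders, whereas in the paper the weights are learnable and the MET matrix follows from the path integral formalism:

```python
import numpy as np

def met_like_transition(A, max_len=3, weights=None):
    """Sum adjacency powers A^0 ... A^max_len with per-length weights,
    then row-normalise into a transition matrix. Paths of length n
    contribute with weight weights[n]."""
    if weights is None:
        weights = [1.0] * (max_len + 1)   # uniform weights for the sketch
    n = A.shape[0]
    M = np.zeros((n, n))
    P = np.eye(n)                         # P iterates over A^0, A^1, ...
    for w in weights:
        M += w * P
        P = P @ A
    return M / M.sum(axis=1, keepdims=True)

# 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
T = met_like_transition(A)
print(np.allclose(T.sum(axis=1), 1.0))  # True
```

The diagonal of the unnormalised sum counts weighted closed walks through each node, which is the connection to subgraph centrality that the abstract exploits for pooling.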

IROS Conference 2020 Conference Paper

Robot-to-Robot Relative Pose Estimation based on Semidefinite Relaxation Optimization

  • Ming Li
  • Guanqi Liang
  • Haobo Luo
  • Huihuan Qian
  • Tin Lun Lam

In this paper, the 2D robot-to-robot relative pose (position and orientation) estimation problem based on ego-motion and noisy distance measurements is considered. We address this problem using an optimization-based method, which does not require complicated numerical analysis while yielding relative localization (RL) results no worse than existing approaches. In particular, we start from a state-of-the-art method named square distances weighted least squares (SD-WLS), and reformulate it as a non-convex quadratically constrained quadratic programming (QCQP) problem. To handle its non-convex nature, a semidefinite programming (SDP) relaxation optimization-based method is proposed, and we prove that the relaxation is tight when measurements are free from noise or corrupted by only small noise. Further, to obtain the optimal solution of the relative pose estimation problem in the sense of maximum likelihood estimation (MLE), a theoretically optimal WLS method is developed to refine the estimate from the SDP optimization. Comprehensive simulations and well-designed experiments are presented to validate the tightness of the SDP relaxation, and the effectiveness of the proposed algorithm is highlighted by comparing it to the existing approaches.

AAAI Conference 2019 Conference Paper

Automatic Code Review by Learning the Revision of Source Code

  • Shu-Ting Shi
  • Ming Li
  • David Lo
  • Ferdian Thung
  • Xuan Huo

Code review is the process of manual inspection of a revision of the source code in order to find out whether the revised source code eventually meets the revision requirements. However, manual code review is time-consuming, and automating the code review process would alleviate the burden on code reviewers and speed up the software maintenance process. To construct a model for automatic code review, the characteristics of the revisions of source code (i.e., the difference between two pieces of source code) should be properly captured and modeled. Unfortunately, most of the existing techniques can easily model the overall correlation between two pieces of source code, but not the “difference” between them. In this paper, we propose a novel deep model named DACE for automatic code review. The model is able to learn revision features by contrasting the revised hunks from the original and revised source code with respect to the code context containing the hunks. Experimental results on six open source software projects indicate that, by learning the revision features, DACE can outperform the competing approaches in automatic code review.

TIST Journal 2019 Journal Article

Distributed Deep Forest and its Application to Automatic Detection of Cash-Out Fraud

  • Ya-Lin Zhang
  • Jun Zhou
  • Wenhao Zheng
  • Ji Feng
  • Longfei Li
  • Ziqi Liu
  • Ming Li
  • Zhiqiang Zhang

Internet companies face the need to handle large-scale machine learning applications on a daily basis, and distributed implementations of machine learning algorithms that can handle extra-large-scale tasks with great performance are widely needed. Deep forest is a recently proposed deep learning framework which uses tree ensembles as its building blocks, and it has achieved highly competitive results on various domains of tasks. However, it has not been tested on extremely large-scale tasks. In this work, based on our parameter server system, we developed a distributed version of deep forest. To meet the needs of real-world tasks, many improvements are introduced to the original deep forest model, including MART (Multiple Additive Regression Tree) as base learners for efficiency and effectiveness, a cost-based method for handling prevalent class-imbalanced data, MART-based feature selection for high-dimensional data, and different evaluation metrics for automatically determining the cascade level. We tested the deep forest model on an extra-large-scale task, i.e., automatic detection of cash-out fraud, with more than 100 million training samples. Experimental results showed that the deep forest model has the best performance according to the evaluation metrics from different perspectives, even with very little effort for parameter tuning. This model can block fraudulent transactions worth a large amount of money each day. Even compared with the best deployed model, the deep forest model can additionally bring a significant decrease in economic loss each day.

AAAI Conference 2019 Conference Paper

Find Me if You Can: Deep Software Clone Detection by Exploiting the Contest between the Plagiarist and the Detector

  • Yan-Ya Zhang
  • Ming Li

Code clones are common in software development, and they often lead to software defects or copyright infringement. Researchers have paid significant attention to code clone detection, and many methods have been proposed. However, the patterns for generating code clones do not always remain the same. In order to fool clone detection systems, plagiarists, known as clone creators, usually conduct a series of tricky modifications on the code fragments to make the clones difficult to detect. The existing clone detection approaches, which neglect the dynamics of the “contest” between the plagiarist and the detector, are doomed to be non-robust to adversarial revision of the code. In this paper, we propose a novel clone detection approach, namely ACD, to mimic the adversarial process between the plagiarist and the detector, which enables us not only to build a strong clone detector but also to model the behavior of plagiarists. Such a plagiarist model may in turn help to understand the vulnerability of current software clone detection tools. Experiments show that the learned policy of the plagiarist can help us build a stronger clone detector, which outperforms the existing clone detection methods.

AAAI Conference 2019 Conference Paper

Learning Uniform Semantic Features for Natural Language and Programming Language Globally, Locally and Sequentially

  • Yudong Zhang
  • Wenhao Zheng
  • Ming Li

Semantic feature learning for natural language and programming language is a preliminary step in addressing many software mining tasks. Many existing methods leverage information in lexicon and syntax to learn features for textual data. However, such information is inadequate to represent the entire semantics of either a text sentence or a code snippet. This motivates us to propose a new approach to learn semantic features for both languages by extracting three levels of information, namely global, local and sequential information, from textual data. For tasks involving both modalities, we project the data of both types into a uniform feature space so that the complementary knowledge between them can be utilized in their representation. In this paper, we build a novel and general-purpose feature learning framework called UniEmbed to uniformly learn comprehensive semantic representations for both natural language and programming language. Experimental results on three real-world software mining tasks show that UniEmbed outperforms state-of-the-art models in feature learning and demonstrate the capacity and effectiveness of our model.

IJCAI Conference 2018 Conference Paper

Cutting the Software Building Efforts in Continuous Integration by Semi-Supervised Online AUC Optimization

  • Zheng Xie
  • Ming Li

Continuous Integration (CI) systems aim to provide quick feedback on the success of code changes by building the entire system whenever code changes are committed. However, building the entire software system is usually resource- and time-consuming. Thus, build outcome prediction is usually employed to distinguish the successful builds from the failed ones, cutting the building effort on those successful builds that do not result in any immediate action by the developer. Nevertheless, build outcome prediction in CI is challenging, since the learner should be able to learn from a stream of build events with and without build outcome labels and provide an immediate prediction on the next build event. Also, the distribution of successful and failed builds is often highly imbalanced. Unfortunately, the existing methods fail to address these challenges well. In this paper, we address these challenges by proposing a semi-supervised online AUC optimization method for CI build outcome prediction. Experiments indicate that our method is able to cut software building effort by effectively identifying the successful builds, and it outperforms the existing methods that address only part of these challenges.

YNIMG Journal 2018 Journal Article

Detectability and reproducibility of the olfactory fMRI signal under the influence of magnetic susceptibility artifacts in the primary olfactory cortex

  • Jiaming Lu
  • Xin Wang
  • Zhao Qing
  • Zhu Li
  • Wen Zhang
  • Ying Liu
  • Lihua Yuan
  • Le Cheng

For human olfactory functional MRI studies, the primary olfactory cortex (POC) suffers severe magnetic susceptibility artifacts, which adversely influence the detectability and reproducibility of olfactory fMRI data and its clinical applications. The goal of this work is to assess the impact of the image artifacts on the detectability and reproducibility of olfactory activation in the POC. The severity of artifacts in the POC was classified into three levels using a Subjective Artifact score (SA_score). The mean temporal signal-to-noise ratio (tSNR) of the fMRI data acquired by a given MRI sequence and the olfactory activation (β value) in the POC were evaluated and compared to the concurrent activations in the primary visual cortex (Brodmann area 17, BA17) by an odor-visual association paradigm using ninety-nine normal human subjects. Our study revealed that the mean tSNR in the POC was above the threshold for reliable detection of the functional activation signal, and, consequently, the mean olfactory activations in the POC were not significantly different from those in BA17. The reproducibility of the activation in the POC was assessed by a random half-split stimulation of a test-retest experiment. The overlap of the activation maps for all the trials (n = 1000) in the POC was not statistically different from that observed in BA17. These results show that the detectability and reproducibility of olfactory activation in the presence of susceptibility artifacts in the POC were at a similar level to those in the visual cortex.

IJCAI Conference 2018 Conference Paper

Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders

  • Xiaopeng Yang
  • Xiaowen Lin
  • Shunda Suo
  • Ming Li

Computer poetry generation is our first step towards computer writing. Writing must have a theme. The current approaches of using sequence-to-sequence models with attention often produce non-thematic poems. We present a novel conditional variational autoencoder with a hybrid decoder, adding deconvolutional neural networks to the general recurrent neural networks to fully learn topic information via latent variables. This approach significantly improves the relevance of the generated poems by representing each line of the poem not only in a context-sensitive manner but also in a holistic way that is highly related to the given keyword and the learned topic. A proposed augmented word2vec model further improves the rhythm and symmetry. Tests show that the poems generated by our approach are mostly satisfactory, with regulated rules and consistent themes, and 73.42% of them receive an Overall score of no less than 3 (the highest score is 5).

YNIMG Journal 2018 Journal Article

Impact of global signal regression on characterizing dynamic functional connectivity and brain states

  • Huaze Xu
  • Jianpo Su
  • Jian Qin
  • Ming Li
  • Ling-Li Zeng
  • Dewen Hu
  • Hui Shen

Recently, resting-state functional magnetic resonance imaging (fMRI) studies have been extended to explore fluctuations in correlations over shorter timescales, referred to as dynamic functional connectivity (dFC). However, the impact of global signal regression (GSR) on dFC is not well established, despite intensive investigation of the influence of GSR on static functional connectivity (sFC). This study aimed to examine the effect of GSR on the performance of the sliding-window correlation, a commonly used method for capturing functional connectivity (FC) dynamics, based on resting-state fMRI and simultaneous electroencephalography (EEG)-fMRI data. The results revealed that the impact of GSR on dFC was spatially heterogeneous, with some susceptible regions including the occipital cortex, sensorimotor area, precuneus, posterior insula and superior temporal gyrus, and that the impact was temporally modulated by the mean global signal (GS) magnitude across windows. Furthermore, GSR substantially changed the connectivity structures of the FC states responding to a high GS magnitude, as well as their temporal features, and even led to the emergence of new FC states. Conversely, those FC states marked by obvious anti-correlation structures associated with the default mode network (DMN) were largely unaffected by GSR. Finally, we reported an association between the fluctuations in the windowed magnitude of GS and the time-varying EEG power within subjects, which implied changes in mental states underlying GS dynamics. Overall, this study suggested a potential neuropsychological basis, in addition to nuisance sources, for GS dynamics and highlighted the need for caution in applying GSR to sliding-window correlation analyses. At a minimum, the mental fluctuations of an individual subject, possibly related to ongoing vigilance, should be evaluated during the entire scan when the dynamics of FC are estimated.
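The sliding-window correlation pipeline the study examines, with GSR as a toggle, can be sketched in NumPy. The window length, step, and the simple no-intercept regression below are assumptions for illustration, not the study's exact preprocessing:

```python
import numpy as np

def sliding_window_fc(ts, win=30, step=5, gsr=False):
    """ts: (T, R) region time series. Returns one (R, R) correlation
    matrix per window. With gsr=True the mean signal across regions is
    regressed out of every region first (no-intercept least squares)."""
    ts = ts.astype(float)
    if gsr:
        g = ts.mean(axis=1, keepdims=True)       # global signal, shape (T, 1)
        beta = (g * ts).sum(0) / (g * g).sum()   # per-region regression slope
        ts = ts - g * beta                       # remove the GS component
    mats = []
    for start in range(0, ts.shape[0] - win + 1, step):
        mats.append(np.corrcoef(ts[start:start + win].T))
    return mats

rng = np.random.default_rng(1)
fc = sliding_window_fc(rng.normal(size=(120, 6)))
print(len(fc), fc[0].shape)  # 19 (6, 6)
```

Comparing `sliding_window_fc(ts, gsr=False)` against `gsr=True` window by window is the kind of contrast the study draws at scale across regions and GS magnitudes.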

IJCAI Conference 2018 Conference Paper

Positive and Unlabeled Learning for Detecting Software Functional Clones with Adversarial Training

  • Hui-Hui Wei
  • Ming Li

Software clone detection is an important problem for software maintenance and evolution, and it has attracted much attention. However, existing approaches ignore the fact that people would label a pair of code fragments as a clone only if they happen to discover the clone, while a huge number of undiscovered clone pairs and non-clone pairs are left unlabeled. In this paper, we argue that the clone detection task in the real world should be formalized as a Positive-Unlabeled (PU) learning problem, and address this problem by proposing a novel positive and unlabeled learning approach, namely CDPU, to effectively detect software functional clones, i.e., pieces of code with similar functionality but differing at both the syntactic and lexical levels, where adversarial training is employed to improve the robustness of the learned model to those non-clone pairs that look extremely similar but behave differently. Experiments on software clone detection benchmarks indicate that the proposed approach together with adversarial training outperforms the state-of-the-art approaches for software functional clone detection.

AAAI Conference 2018 Conference Paper

Semi-Supervised AUC Optimization Without Guessing Labels of Unlabeled Data

  • Zheng Xie
  • Ming Li

Semi-supervised learning, which aims to construct learners that automatically exploit the large amount of unlabeled data in addition to the limited labeled data, has been widely applied in many real-world applications. AUC is a well-known performance measure for a learner, and directly optimizing AUC may result in a better prediction performance. Thus, semi-supervised AUC optimization has drawn much attention. Existing semi-supervised AUC optimization methods exploit unlabeled data by explicitly or implicitly estimating the possible labels of the unlabeled data based on various distributional assumptions. However, these assumptions may be violated in many real-world applications, and estimating labels based on the violated assumption may lead to poor performance. In this paper, we argue that, in semi-supervised AUC optimization, it is unnecessary to guess the possible labels of the unlabeled data or prior probability based on any distributional assumptions. We analytically show that the AUC risk can be estimated unbiasedly by simply treating the unlabeled data as both positive and negative. Based on this finding, two semi-supervised AUC optimization methods named SAMULT and SAMPURA are proposed. Experimental results indicate that the proposed methods outperform the existing methods.
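The key observation, that the AUC risk can be estimated by treating each unlabeled point as both positive and negative, can be illustrated on toy scores. `semi_sup_auc_risk` below is an illustrative simplification; the paper's estimators (SAMULT and SAMPURA) involve additional correction terms beyond simply pooling the data:

```python
import numpy as np

def pairwise_auc_risk(pos, neg):
    """Empirical AUC risk: fraction of (positive, negative) score pairs
    that are mis-ranked, with ties counting half."""
    diff = pos[:, None] - neg[None, :]
    return ((diff < 0) + 0.5 * (diff == 0)).mean()

def semi_sup_auc_risk(pos, neg, unl):
    """Toy version of the idea in the abstract: the unlabeled scores
    enter the ranking risk on BOTH sides, as positives and as negatives,
    instead of being assigned guessed labels."""
    p = np.concatenate([pos, unl])
    n = np.concatenate([neg, unl])
    return pairwise_auc_risk(p, n)

pos = np.array([0.9, 0.8, 0.7])
neg = np.array([0.1, 0.2])
unl = np.array([0.5, 0.6])
print(pairwise_auc_risk(pos, neg))        # 0.0, labeled pairs perfectly ranked
print(semi_sup_auc_risk(pos, neg, unl))   # 0.1
```

Note that the unlabeled-vs-unlabeled pairs contribute a constant offset (each such pair is a tie or appears once in each order), which is why the pooled risk can still rank models consistently with the true AUC risk.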

YNICL Journal 2018 Journal Article

Short- and long-range synergism disorders in lifelong premature ejaculation evaluated using the functional connectivity density and network property

  • Jiaming Lu
  • Xin Zhang
  • Huiting Wang
  • Zhao Qing
  • Peng Han
  • Ming Li
  • Jiadong Xia
  • Fei Chen

This study aimed to investigate brain functional connectivity in premature ejaculation (PE) patients using the functional connectivity density (FCD) and network property of resting-state functional magnetic resonance imaging. Twenty PE patients (mean age: 27.95 ± 4.52 years) and 15 normal controls (mean age: 27.87 ± 3.78 years) with no self-reported history of neurologic or psychiatric disease were enrolled in this study. International Index of Erectile Function-5 and Chinese Index of Sexual Function for Premature Ejaculation-5 questionnaires and self-reported intravaginal ejaculatory latency time (IELT) were obtained from each participant for symptom assessment. Two-sample t-tests (intergroup comparison) were applied in the short-range FCD (SFCD) analysis, long-range FCD (LFCD) analysis, region of interest-based analysis, and network topological organization analysis. Pearson correlation analysis was performed to correlate IELT with FCD or the network property. The patients with PE showed significantly decreased SFCD in the bilateral middle temporal gyrus, left orbitofrontal cortex, nucleus accumbens, fusiform, caudate, and thalamus (p < 0.05, AlphaSim-corrected). Notably, all these aforementioned brain areas are located in the dopamine pathway. In contrast, increased LFCD was observed in the left insula, Heschl's gyrus, putamen, bilateral precuneus, supplementary motor area, middle cingulate cortex, and anterior cingulate cortex in PE patients (p < 0.05, AlphaSim-corrected). In addition, the network topological analysis found reinforced network connectivity between several nodes. The degree of hub nodes increased in the patients with PE. IELT was positively correlated with SFCD and negatively correlated with LFCD or the degree of hub nodes (p < 0.05, Pearson correlation). In summary, our results are important for understanding the brain network in PE patients. The present findings indicate that PE patients have a significant synergism disorder across the region of the dopamine pathway, implying that neuronal pathological changes might be related to changes in dopamine. The FCD and network property can serve as new disease severity biomarkers and therapeutic targets in PE.

IJCAI Conference 2017 Conference Paper

Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code

  • Xuan Huo
  • Ming Li

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source files according to a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code using lexical and structural information and correlate their relevance by measuring their similarity; recently, a CNN-based model was proposed to learn unified features for bug localization, which overcomes the difficulty in modeling natural and programming languages with different structural semantics. However, previous studies fail to capture the sequential nature of source code, which carries additional semantics beyond the lexical and structural terms, and such information is vital in modeling program functionalities and behaviors. In this paper, we propose a novel model, LS-CNN, which enhances the unified features by exploiting the sequential nature of source code. LS-CNN combines CNN and LSTM to extract semantic features for automatically identifying potential buggy source code according to a bug report. Experimental results on widely-used software projects indicate that LS-CNN significantly outperforms the state-of-the-art methods in locating buggy files.

IJCAI Conference 2017 Conference Paper

Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code

  • Huihui Wei
  • Ming Li

Software clone detection, aiming at identifying code fragments with similar functionalities, has played an important role in software maintenance and evolution. Many clone detection approaches have been proposed. However, most of them represent source code with hand-crafted features using lexical or syntactical information, or with unsupervised deep features, which makes it difficult to detect functional clone pairs, i.e., pieces of code with similar functionality but differing at both the syntactic and lexical levels. In this paper, we address the software functional clone detection problem by learning supervised deep features. We formulate clone detection as a supervised learning-to-hash problem and propose an end-to-end deep feature learning framework called CDLH for functional clone detection. The framework learns hash codes by exploiting lexical and syntactical information for fast computation of functional similarity between code fragments. Experiments on software clone detection benchmarks indicate that the CDLH approach is effective and outperforms the state-of-the-art approaches in software functional clone detection.

IJCAI Conference 2016 Conference Paper

Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code

  • Xuan Huo
  • Ming Li
  • Zhi-Hua Zhou

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source code according to a bug report remains a great challenge in software maintenance. Many previous studies treated the source code as natural language, representing both the bug report and the source code with bag-of-words features and correlating them by measuring similarity in the same lexical feature space. However, these approaches fail to consider the structural information of source code, which carries additional semantics beyond the lexical terms. Such information is important in modeling program functionality. In this paper, we propose a novel convolutional neural network, NP-CNN, which leverages both lexical and program-structure information to learn unified features from natural language and source code in programming language for automatically locating the potential buggy source code according to a bug report. Experimental results on widely-used software projects indicate that NP-CNN significantly outperforms the state-of-the-art methods in locating the buggy source files.

AAAI Conference 2016 Conference Paper

On Order-Constrained Transitive Distance Clustering

  • Zhiding Yu
  • Weiyang Liu
  • Wenbo Liu
  • Yingzhen Yang
  • Ming Li
  • B. V. K. Vijaya Kumar

We consider the problem of approximating order-constrained transitive distance (OCTD) and its clustering applications. Given any pairwise data, the transitive distance (TD) is defined as the smallest possible “gap” over the set of paths connecting them. While this metric definition is well suited to elongated clusters, it is sometimes an over-simplified representation that loses necessary regularization on cluster structure and easily overfits to short links. As a result, conventional TD often suffers degraded performance on clusters with “thick” structures. Our key intuition is that the maximum (path) order, i.e., the maximum number of nodes on a path, controls the level of flexibility; reducing this order benefits clustering performance by trading off flexibility against regularization on cluster structure. Unlike TD, finding OCTD is an intractable problem even though the number of connecting paths is reduced. We therefore propose a fast approximation framework that uses random samplings to generate multiple diversified TD matrices and a pooling step to output the final approximated OCTD matrix. Comprehensive experiments on toy, image and speech datasets show the excellent performance of OCTD, surpassing TD with significant gains and giving state-of-the-art performance on several datasets.
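The underlying transitive distance is concrete enough to sketch: for each pair of nodes it is the minimax path cost, i.e. the smallest possible maximum edge weight ("gap") over all connecting paths, computable with a Floyd-Warshall-style recurrence. This is a plain TD sketch only, not the paper's order-constrained approximation.

```python
def transitive_distance(w):
    """Minimax path distance: the smallest possible maximum edge weight
    ('gap') over all paths connecting each pair, computed with a
    Floyd-Warshall variant where max() replaces the usual sum.
    `w` is a square matrix of pairwise distances."""
    n = len(w)
    td = [row[:] for row in w]  # start from the direct distances
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # path through k is only as bad as its worst hop
                td[i][j] = min(td[i][j], max(td[i][k], td[k][j]))
    return td


# Nodes 0 and 2 are far apart directly (10), but a chain of short
# links through node 1 gives them a transitive distance of only 2.
w = [[0, 1, 10],
     [1, 0, 2],
     [10, 2, 0]]
print(transitive_distance(w)[0][2])  # 2
```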

YNIMG Journal 2014 Journal Article

Hemodynamic and electrophysiological spontaneous low-frequency oscillations in the cortex: Directional influences revealed by Granger causality

  • Liangming Huang
  • Yadong Liu
  • Ming Li
  • Dewen Hu

We used a combined electrophysiological/hemodynamic system to examine low-frequency oscillations (LFOs) in spontaneous neuronal activities (spike trains and local field potentials) and hemodynamic signals (cerebral blood flow) recorded from the anesthetized rat somatosensory and visual cortices. The laser Doppler flowmetry (LDF) probe was tilted slightly to approach the area in which a microelectrode array (MEA) was implanted for simultaneous recordings. Spike trains (STs) were converted into continuous-time rate functions (CRFs) using the ST instantaneous firing rates. LFOs were detected for all three of the components using the multi-taper method (MTM). The frequencies of these LFOs ranged from 0.052 to 0.167Hz (mean±SD, 0.10±0.026Hz) for cerebral blood flow (CBF), from 0.027 to 0.26Hz (mean±SD, 0.12±0.041Hz) for the CRFs of the STs and from 0.04 to 0.19Hz (mean±SD, 0.11±0.035Hz) for local field potentials (LFPs). We evaluated the Granger causal relationships of spontaneous LFOs among CBF, LFPs and CRFs using Granger causality (GC) analysis. Significant Granger causal relationships were observed from LFPs to CBF, from STs to CBF and from LFPs to STs at approximately 0.1Hz. The present results indicate that spontaneous LFOs exist not only in hemodynamic components but also in neuronal activities of the rat cortex. To the best of our knowledge, the present study is the first to identify Granger causal influences among CBF, LFPs and STs and show that spontaneous LFOs carry important Granger causal influences from neural activities to hemodynamic signals.
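The Granger-causality logic can be illustrated with a minimal lag-1 sketch (the study itself uses multi-taper spectral estimates and richer models): a driver signal Granger-causes a target to the extent that adding the driver's past to the target's own past shrinks the prediction residual. The signals generated below are synthetic, purely to exercise the statistic.

```python
import math
import random


def _resid_var_1(u, z):
    # residual variance of z regressed on u alone (no intercept; zero-mean data)
    b = sum(a * c for a, c in zip(u, z)) / sum(a * a for a in u)
    return sum((c - b * a) ** 2 for a, c in zip(u, z)) / len(z)


def _resid_var_2(u, v, z):
    # residual variance of z regressed on u and v (closed-form 2x2 normal equations)
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    suz = sum(a * c for a, c in zip(u, z))
    svz = sum(b * c for b, c in zip(v, z))
    det = suu * svv - suv * suv
    b1 = (svv * suz - suv * svz) / det
    b2 = (suu * svz - suv * suz) / det
    return sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(u, v, z)) / len(z)


def granger(target, driver):
    """Lag-1 Granger causality driver -> target: log ratio of the residual
    variance of target predicted from its own past alone to that of the
    model that also uses the driver's past. Larger = stronger influence."""
    tp, dp, tt = target[:-1], driver[:-1], target[1:]
    return math.log(_resid_var_1(tp, tt) / _resid_var_2(tp, dp, tt))


# Synthetic example: y is white noise and drives x, so granger(x, y)
# (y -> x) should come out much larger than granger(y, x).
random.seed(1)
x, y = [0.0], [random.gauss(0, 1)]
for _ in range(499):
    y.append(random.gauss(0, 1))
    x.append(0.5 * x[-1] + 0.8 * y[-2] + 0.1 * random.gauss(0, 1))
print(granger(x, y), granger(y, x))
```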

IROS Conference 2010 Conference Paper

Incremental local online Gaussian Mixture Regression for imitation learning of multiple tasks

  • Thomas Cederborg
  • Ming Li
  • Adrien Baranes
  • Pierre-Yves Oudeyer

Gaussian Mixture Regression has been shown to be a powerful and easy-to-tune regression technique for imitation learning of constrained motor tasks in robots. Yet, current formulations are not suited to the case where a robot must learn, incrementally and online, a variety of new context-dependent tasks whose number and complexity are not known at programming time, and where the demonstrator is not allowed to tell the system when a new task is introduced (rather, the system should infer this from the continuous sensorimotor context). In this paper, we show that this limitation can be addressed by introducing an Incremental, Local and Online variation of Gaussian Mixture Regression (ILO-GMR), which allows a simulated robot to learn new motor tasks incrementally and online by modelling them locally as dynamical systems, and to use the sensorimotor context to cope with the absence of categorical information both during demonstrations and when the system is asked for a reproduction. Moreover, we integrate a complementary statistical technique that allows the system to incrementally learn various tasks that can be intrinsically defined in different frames of reference, which we call framings, without the need to tell the system which particular framing should be used for each task: this is inferred automatically by the system.
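For reference, plain Gaussian Mixture Regression (before the paper's incremental, local and online modifications) has a closed form: condition each joint Gaussian over (x, y) on the observed x and blend the conditional means by the components' responsibilities. A minimal 1-D sketch with hand-picked components rather than fitted ones:

```python
import math


def gmr(x, components):
    """1-D Gaussian Mixture Regression: E[y | x] under a mixture of joint
    Gaussians over (x, y). Each component is a tuple
    (weight, mean_x, mean_y, var_x, cov_xy)."""
    num = den = 0.0
    for w, mx, my, vx, cxy in components:
        # responsibility: prior weight times the x-marginal density
        resp = w * math.exp(-0.5 * (x - mx) ** 2 / vx) / math.sqrt(2 * math.pi * vx)
        # conditional mean of y given x for this component
        num += resp * (my + (cxy / vx) * (x - mx))
        den += resp
    return num / den


# One component with cov_xy = 0.5, var_x = 1: the regression line has
# slope 0.5 through the origin, so E[y | x=2] = 1.0.
comps = [(1.0, 0.0, 0.0, 1.0, 0.5)]
print(gmr(2.0, comps))  # 1.0
```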

IJCAI Conference 2009 Conference Paper

  • Ming Li
  • Xiao-Bing Xue
  • Zhi-Hua Zhou

Given an imagebase with tagged images, four types of tasks can be executed, i.e., content-based image retrieval, image annotation, text-based image retrieval, and query expansion. For any of these tasks, the similarity between objects of the concerned type is essential. In this paper, we propose a framework that tackles these four tasks from a unified view. The essence of the framework is to estimate similarities by exploiting the interactions between objects of different modalities. Experiments show that the proposed method can improve similarity estimation, and that, based on the improved similarity estimation, some simple methods can achieve better performance than some state-of-the-art techniques.
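The core idea, estimating similarity for objects of one modality through their links to the other, can be sketched with simple set overlap: two tags look similar when they appear on the same images, and two images look similar when they share tags. This is a toy stand-in for the paper's interaction-based estimation, and the imagebase below is invented.

```python
def jaccard(a, b):
    """Set-overlap similarity between two collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def tag_similarity(tag1, tag2, imagebase):
    """Similarity between two tags via the images they co-occur on:
    the cross-modal link (tags -> images) supplies the evidence."""
    imgs1 = {img for img, tags in imagebase.items() if tag1 in tags}
    imgs2 = {img for img, tags in imagebase.items() if tag2 in tags}
    return jaccard(imgs1, imgs2)


# Hypothetical imagebase: "cat" and "pet" always co-occur, so they come
# out maximally similar even though the strings share nothing lexically.
imagebase = {"i1": {"cat", "pet"}, "i2": {"cat", "pet"}, "i3": {"dog"}}
print(tag_similarity("cat", "pet", imagebase))  # 1.0
```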

TCS Journal 2009 Journal Article

Finding compact structural motifs

  • Dongbo Bu
  • Ming Li
  • Shuai Cheng Li
  • Jianbo Qian
  • Jinbo Xu

Protein structural motif detection has important applications in structural genomics. Compared with sequence motifs, structural motifs are more sensitive in revealing the evolutionary relationships among proteins. A variety of algorithms have been proposed to attack this problem. However, they are either heuristic without theoretical performance guarantee, or inefficient due to employing exhaustive search strategies. This paper studies a reasonably restricted version of this problem: the compact structural motif problem. We prove that this restricted version is still NP-hard, and we present a polynomial-time approximation scheme to solve it. This is the first approximation algorithm with a guaranteed ratio for the protein structural motif problem. A preliminary version of this paper appeared in CPM 2007.

TCS Journal 2009 Journal Article

On two open problems of 2-interval patterns

  • Shuai Cheng Li
  • Ming Li

The 2-interval pattern problem, introduced in [Stéphane Vialette, On the computational complexity of 2-interval pattern matching problems, Theoret. Comput. Sci. 312 (2–3) (2004) 223–249], models general problems on biological structures such as protein contact maps and macroscopic descriptors of secondary structures of ribonucleic acids. Given a set of 2-intervals D and a model R, the problem is to find a maximum-cardinality subset D′ of D such that any two 2-intervals in D′ satisfy R, where R is a subset of the relations on disjoint 2-intervals: precedence ( < ), nest ( ⊏ ), and cross ( ≬ ). One problem left open is whether there is a polynomial-time solution to the 2-interval pattern problem when R = { <, ≬ } and all the support intervals of D are disjoint. In this paper, we present a reduction from the clique problem to show that, in this case, the problem is NP-hard. The disjoint 2-interval pattern matching problem is to decide whether a disjoint 2-interval pattern (called the pattern) is a substructure of another disjoint 2-interval pattern (called the target). In general, the problem is NP-hard, but when the form of the pattern is restricted, the problem can in some cases be solved in polynomial time. In particular, a polynomial-time algorithm has been proposed (Gramm, WABI 2004 and IEEE/ACM TCBB 2004) for the case where the patterns are so-called crossing contact maps. In this paper we show that the problem is actually NP-hard and point out an error in the analysis of the above algorithm. The second part of this paper appeared in WABI 2006.

AIJ Journal 2000 Journal Article

Applying MDL to learn best model granularity

  • Qiong Gao
  • Ming Li
  • Paul Vitányi

The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: learning the best model granularity. The performance of a model depends critically on its granularity, for example the choice of precision of the parameters. Too high a precision generally involves modeling accidental noise, while too low a precision may conflate models that should be distinguished. This precision is often determined ad hoc. In MDL, the best model is the one that most compresses a two-part code of the data set: this embodies “Occam's Razor”. In two quite different experimental settings, the theoretical value determined using MDL coincides with the best value found experimentally. In the first experiment the task is to recognize isolated handwritten characters in one subject's handwriting, irrespective of size and orientation. Based on a new modification of elastic matching, using multiple prototypes per character, the optimal prediction rate is predicted for the learned parameter (length of sampling interval) considered most likely by MDL, which is shown to coincide with the best value found experimentally. In the second experiment the task is to model a robot arm with two degrees of freedom using a three-layer feed-forward neural network, where we need to determine the number of nodes in the hidden layer giving the best modeling performance. The optimal model (the one that extrapolates best on unseen examples) is predicted for the number of hidden-layer nodes considered most likely by MDL, which again is found to coincide with the best value found experimentally.
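A minimal sketch of the two-part-code idea applied to granularity selection, here choosing a histogram bin count rather than the paper's sampling interval or hidden-layer size. The cost accounting (roughly half of log2(n) bits per bin frequency) is one common convention, not the paper's exact code.

```python
import math
import random


def two_part_codelength(data, k):
    """Two-part MDL code length (in bits) for modelling `data` with a
    k-bin histogram: model cost (encoding k bin frequencies) plus data
    cost under the resulting density estimate. Too few bins waste data
    bits; too many bins waste model bits."""
    n = len(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0
    counts = [0] * k
    for x in data:
        counts[min(int((x - lo) / width), k - 1)] += 1
    model_bits = 0.5 * k * math.log2(n)  # ~log2(n)/2 bits per frequency
    # density estimate inside bin b is (count_b / n) / width
    data_bits = sum(-c * math.log2((c / n) / width) for c in counts if c)
    return model_bits + data_bits


# Bimodal data: MDL should pick an intermediate granularity, rejecting
# both the 1-bin model (underfits) and very fine binnings (overfit).
random.seed(0)
data = [random.gauss(-3, 1) for _ in range(500)] + \
       [random.gauss(3, 1) for _ in range(500)]
best = min(range(1, 201), key=lambda k: two_part_codelength(data, k))
print(best)
```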

AIJ Journal 1992 Journal Article

Theory and algorithms for plan merging

  • David E. Foulser
  • Ming Li
  • Qiang Yang

Merging operators in a plan can yield significant savings in the cost of executing the plan. This paper provides a formal theory of plan merging and presents both optimal and efficient heuristic algorithms for finding minimum-cost merged plans. The optimal plan-merging algorithm applies a dynamic programming method to handle multiple linear plans and is extended to partially ordered plans in a novel way. Furthermore, with worst-case and average-case complexity analysis and empirical tests, we demonstrate that efficient and well-behaved approximation algorithms are applicable for optimizing large plans.