Arrow Research search

Author name cluster

Jia Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

73 papers
2 author rows

Possible papers

73

AAAI Conference 2026 Conference Paper

Agent Journey Beyond RGB: Hierarchical Semantic-Spatial Representation Enrichment for Vision-and-Language Navigation

  • Xuesong Zhang
  • Yunbo Xu
  • Jia Li
  • Ruonan Liu
  • Zhenzhen Hu

Navigating unseen environments based on natural language instructions remains difficult for egocentric agents in Vision-and-Language Navigation (VLN). Intuitively, humans inherently ground concrete semantic knowledge within spatial layouts during indoor navigation. Although previous studies have introduced diverse environmental representations to enhance reasoning, other co-occurrence modalities are often naively concatenated with RGB features, resulting in suboptimal utilization of each modality's distinct contribution. Inspired by this, we propose a hierarchical Semantic Understanding and Spatial Awareness (SUSA) architecture to enable agents to perceive and ground environments at diverse scales. Specifically, the Textual Semantic Understanding (TSU) module supports local action prediction by generating view-level descriptions, thereby capturing fine-grained environmental semantics and narrowing the modality gap between instructions and environments. Complementarily, the Depth-enhanced Spatial Perception (DSP) module incrementally constructs a trajectory-level depth exploration map, providing the agent with a coarse-grained comprehension of the global spatial layout. Extensive experiments demonstrate that SUSA's hierarchical representation enrichment not only boosts the navigation performance of the baseline on discrete VLN benchmarks (REVERIE, R2R, and SOON), but also exhibits superior generalization to the continuous R2R-CE.

AAAI Conference 2026 Conference Paper

BAG: Benchmarking Anomaly Detection on Dynamic Graphs

  • Fengrui Hua
  • Yiyan Qi
  • Zikai Wei
  • Yuxing Tian
  • Chengjin Xu
  • Xiaojun Wu
  • Jia Li
  • Jian Guo

Anomaly detection in dynamic graphs is a critical area of research that focuses on identifying abnormal components within evolving graph structures that deviate significantly from typical patterns. Despite advancements in traditional temporal pattern mining and deep learning techniques, a comprehensive benchmarking framework for Dynamic Graph Anomaly Detection (DyGAD) has been lacking. To address this gap, we introduce BAG, the first comprehensive benchmark specifically designed for anomaly detection on dynamic graphs. BAG enables extensive evaluation of 25 leading DyGAD models, covering both classical approaches and advanced Dynamic Graph Neural Networks (DGNNs), across 10 diverse real-world datasets that include both synthetic and naturally occurring anomalies. The framework supports evaluations at both the edge and node levels, offering a robust tool to advance DyGAD research. Our main finding is that Continuous-time Dynamic Graph (CTDG) models demonstrate superior performance and potential in detecting anomalies in dynamic graph edges, compared to Discrete-time Dynamic Graph (DTDG) models. Furthermore, the results reveal that existing methods are less effective at detecting organic anomalies, primarily due to the presence of temporal anomalies and highly imbalanced samples. The proposed BAG benchmark significantly enhances the evaluation of DyGAD methods by improving dataset selection, metric application, and model training. Moreover, BAG supports reproducibility and further exploration in this field by integrating all models, datasets, and evaluation protocols into an open-source repository.

JBHI Journal 2026 Journal Article

Beyond NLL: Pathwise Cross-Entropy Loss for Discriminative and Calibrated Event-Time Survival Prediction

  • Jingmin Long
  • Jia Li
  • Jesper Kers
  • Fons J. Verbeek

Deep survival models are increasingly used for time-to-event prediction under censoring, yet training objectives remain a bottleneck. The widely used discrete-time negative log-likelihood (NLL) supervises hazards and can suffer from temporal information imbalance and gradient attenuation, yielding early-dominated probability mass and degraded late-horizon calibration, especially under heavy censoring and competing risks. We introduce Pathwise Cross-Entropy (PCE), which utilizes a symmetric, full-path objective that directly learns the occurred-by-t trajectory as a Cumulative Incidence Function (CIF). This direct approach seamlessly yields a normalized Probability Mass Function (PMF) for predicting event times, unlike NLL, where the derived PMF is structurally biased toward monotonic decrease, hindering its predictive utility. In a counting-process view, PCE supplies bidirectional gradients and constitutes a strictly proper scoring rule on counting paths. We extend PCE to competing risks with cause-specific supervision that avoids the multinomial coupling in NLL under competing risks. Empirically, across the tabular SEER and a WSI-derived kidney dataset and multiple backbones, PCE consistently improves discrimination (C-index, AUC) and calibration (IBS), produces calibration plots (ECE and PP plots) that are closer to observation, and enables ordinal first-hit time prediction directly with minimal practical monotonicity violations. These results indicate that PCE is a reliable and interpretable objective for single and competing-risk survival.
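
The abstract's core idea, supervising the whole occurred-by-t path rather than only the event bin, can be put in a few lines. This is an illustrative sketch, not the paper's implementation; the function name, discrete time grid, and censoring mask are assumptions.

```python
import numpy as np

def pathwise_ce(cif, event_time, event_observed, eps=1e-7):
    """Sketch of a pathwise cross-entropy objective on a counting path.

    cif            : (T,) predicted cumulative incidence F(t) at time bins 1..T
    event_time     : integer time bin of the event (or of censoring)
    event_observed : True if the event occurred, False if censored
    Supervises the whole occurred-by-t trajectory N(t) = 1{event_time <= t},
    dropping times after censoring where N(t) is unknown.
    """
    T = len(cif)
    t = np.arange(1, T + 1)
    target = (t >= event_time).astype(float)          # counting path N(t)
    # For censored subjects N(t) is only known before the censoring time.
    mask = np.ones(T) if event_observed else (t < event_time).astype(float)
    cif = np.clip(cif, eps, 1 - eps)
    bce = -(target * np.log(cif) + (1 - target) * np.log(1 - cif))
    return float((bce * mask).sum() / max(mask.sum(), 1.0))
```

Because every time bin receives a gradient in both directions, a well-fit CIF here also yields a normalized PMF by differencing, which is the contrast with hazard-based NLL that the abstract draws.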

EAAI Journal 2026 Journal Article

Children’s psychological recognition with a multimodal language model incorporating visual language features

  • Yao-Dong Chen
  • Jia Li
  • Jing Xu

Although large language models have shown notable capability in open-domain tasks such as translation and text classification, affective computing still requires grounding in psychological theory, which keeps its entry threshold high. This research addresses the difficult problem of emotion perception in children by developing a child-specific model to improve performance on visual question answering and facial expression categorization. We present a new visual-language model trained with text describing children’s traits and visual aspects, based on the bootstrapping language-image pretraining architecture. Written instructions from a child psychologist serve as a guide for the training process. To increase accuracy while lowering memory and computation costs, the model is further fine-tuned and refined using the low-rank adaptation method. Three public children’s emotion classification datasets and seven child psychology survey datasets are used to create the instructions and test model performance. The outcomes demonstrate our models’ clear superiority over traditional deep learning models across all datasets. After training, our multimodal models extract meaningful visual information and exhibit picture understanding that goes beyond categorization, indicating a grasp of emotional perception.
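
The low-rank adaptation step mentioned above has a standard form; a minimal sketch (names and shapes illustrative) shows why the memory and compute overhead is small:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Sketch of low-rank adaptation (LoRA).

    W : frozen (d_out, d_in) pretrained weight
    A : (r, d_in) and B : (d_out, r) are the only trainable parameters,
    so the extra cost scales with r rather than d_out * d_in.
    """
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T
```

With B initialized to zeros, the adapted model starts out exactly equal to the frozen base model, which is the usual LoRA initialization.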

AAAI Conference 2026 Conference Paper

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

  • Yitong Zhang
  • Jia Li
  • Liyi Cai
  • Ge Li

Large Vision-Language Models (LVLMs) have achieved impressive progress across various applications but remain vulnerable to malicious queries. Existing safety alignment approaches typically fail to effectively resist malicious queries while preserving utility on benign ones. To address these challenges, we propose DAVSP, which is built upon two key innovations. First, we introduce the Visual Safety Prompt, which appends a trainable padding region around the input image. It preserves visual features and expands the optimization space. Second, we propose Deep Alignment, a novel approach to train the visual safety prompt through supervision in the model's activation space. It enhances the inherent ability of LVLMs to perceive malicious queries, achieving deeper alignment than prior works. Extensive experiments demonstrate that DAVSP effectively resists malicious queries while preserving benign input utility. Furthermore, DAVSP exhibits strong cross-model generalization ability. Ablation studies further reveal that both the Visual Safety Prompt and Deep Alignment are essential to the overall effectiveness.

AAAI Conference 2026 Conference Paper

Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?

  • Jia Li
  • Wenjie Zhao
  • Ziru Huang
  • Yunhui Guo
  • Yapeng Tian

Unlike traditional visual segmentation, audio-visual segmentation (AVS) requires the model not only to identify and segment objects but also to determine whether they are sound sources. Recent AVS approaches have achieved impressive performance on standard benchmarks. Yet, an important question remains: Do these models genuinely integrate audio-visual cues to segment sounding objects? Our study reveals a fundamental bias in current methods: they tend to generate segmentation masks based predominantly on visual salience, irrespective of the audio context, resulting in unreliable predictions when sounds are absent or irrelevant. To address this challenge, we introduce AVSBench-Robust, a comprehensive benchmark incorporating diverse negative audio scenarios, including silence, noise, and off-screen sounds. We also propose a simple yet effective approach combining balanced training with negative samples and classifier-guided similarity learning. Our extensive experiments show that while state-of-the-art AVS methods consistently fail under negative audio conditions, our approach achieves remarkable improvements in both standard metrics and robustness measures, maintaining near-perfect false positive rates while preserving high-quality segmentation performance.

AAAI Conference 2026 Conference Paper

MedSpaformer: A Transferable Transformer with Multi-Granularity Token Sparsification for Medical Time Series Classification

  • Jiexia Ye
  • Weiqi Zhang
  • Ziyue Li
  • Jia Li
  • Fugee Tsung

Accurate medical time series (MedTS) classification is essential for effective clinical diagnosis, yet remains challenging due to complex multi-channel temporal dependencies, information redundancy, and label scarcity. While transformer-based models have shown promise in time series analysis, most are designed for forecasting tasks and fail to fully exploit the unique characteristics of MedTS. In this paper, we introduce MedSpaformer, a transformer-based framework tailored for MedTS classification. It incorporates a sparse token-based dual-attention mechanism that enables global context modeling and token sparsification, allowing dynamic feature refinement by focusing on informative tokens while reducing redundancy. This mechanism is integrated into a multi-granularity cross-channel encoding scheme to capture intra- and inter-granularity temporal dependencies and inter-channel correlations, enabling progressive refinement of task-relevant patterns in medical signals. The sparsification design allows our model to flexibly accommodate inputs with variable lengths and channel dimensions. We also introduce an adaptive label encoder to extract label semantics and address cross-dataset label space misalignment. Together, these components enhance the model’s transferability across heterogeneous medical datasets, which helps alleviate the challenge of label scarcity. Our model outperforms 13 baselines across 7 medical datasets under supervised learning. It also excels in few-shot learning and demonstrates zero-shot capability in both in-domain and cross-domain diagnostics. These results highlight MedSpaformer's robustness and its potential as a unified solution for MedTS classification across diverse settings.
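
Token sparsification of the kind described, keeping only the most informative tokens by score, can be sketched generically (a plain top-k illustration, not MedSpaformer's actual mechanism; the scoring function is assumed to come from attention):

```python
import numpy as np

def sparsify_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the top ceil(N * keep_ratio) tokens by informativeness score.

    tokens : (N, d) token embeddings
    scores : (N,) per-token scores, e.g. attention mass received by each token
    Tokens are returned in their original temporal order, which also lets the
    model accept inputs of variable length: only the kept count changes.
    """
    n_keep = max(1, int(np.ceil(len(tokens) * keep_ratio)))
    idx = np.sort(np.argsort(scores)[-n_keep:])   # top-k, original order kept
    return tokens[idx], idx
```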

YNIMG Journal 2026 Journal Article

Neural representations of emotional response inhibition reveal trait and state biomarkers in pediatric bipolar disorder

  • Jia Li
  • Rong Wang
  • Jianze Wu
  • Qian Xiao
  • Yuan Zhong

Pediatric bipolar disorder (PBD) is characterized by disrupted cognitive control, particularly in response inhibition under emotional interference. However, the neural underpinnings of these deficits, particularly how these impairments vary across emotional valence and whether they reflect trait markers or state alterations, remain unclear. While traditional univariate fMRI analyses reveal broad activation differences, they lack sensitivity to fine-grained neural patterns. This study aims to examine the neural representations of emotional response inhibition in PBD under valence-dependent interference using representational similarity analysis (RSA). We included manic (n = 15) and euthymic (n = 18) PBD patients, along with matched healthy controls (n = 17). Participants completed an emotional Go/NoGo task with happy, sad, and neutral faces during fMRI. Six contrast conditions were modeled to assess trait- and state-related effects. Whole-brain searchlight RSA (8 mm radius) was used to identify regions showing group differences in neural representational patterns. Results showed that emotional response inhibition engaged distributed neural systems, with distinct patterns across valence conditions. Compared to controls, PBD patients exhibited trait-related representational differences during happy inhibition, sad inhibition, and sad-specific inhibition, involving regions such as the precentral gyrus, middle frontal gyrus, and inferior parietal lobule. Manic patients showed state-related reductions in neural representations during sad-specific inhibition within frontal areas compared to euthymic patients. These findings indicate that emotional response inhibition deficits in PBD arise from both trait- and state-dependent abnormalities in neural representations. The study highlights the value of multivariate fMRI in uncovering clinically relevant biomarkers and provides a novel framework for developing phase-specific interventions.
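
The core RSA computation that a searchlight repeats in every sphere, building representational dissimilarity matrices (RDMs) and correlating them, can be sketched generically (this is textbook RSA, not the study's exact pipeline):

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation between
    the activity patterns of every pair of conditions.
    patterns : (n_conditions, n_voxels)."""
    return 1.0 - np.corrcoef(patterns)

def rsa_similarity(patterns_a, patterns_b):
    """Compare two RDMs by correlating their upper triangles, e.g. one RDM per
    group or per model, within a single searchlight sphere."""
    iu = np.triu_indices(len(patterns_a), k=1)
    return np.corrcoef(rdm(patterns_a)[iu], rdm(patterns_b)[iu])[0, 1]
```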

AAAI Conference 2026 Conference Paper

Toward Gaze Target Detection of Young Autistic Children

  • Shijian Deng
  • Erin E. Kosloski
  • Siva Sai Nagender Vasireddy
  • Jia Li
  • Randi Sierra Sherwood
  • Feroz Mohamed Hatha
  • Siddhi Patel
  • Pamela R. Rollins

The automatic detection of gaze targets in autistic children through artificial intelligence can be impactful, especially for those who lack access to a sufficient number of professionals to improve their quality of life. This paper introduces a new, real-world AI application for gaze target detection in autistic children, which predicts a child's point of gaze from an activity image. This task is foundational for building automated systems that can measure joint attention—a core challenge in Autism Spectrum Disorder (ASD). To facilitate the study of this challenging application, we collected the first-ever Autism Gaze Target (AGT) Dataset. We further propose a novel social-aware coarse-to-fine (SACF) gaze detection framework that explicitly leverages the social context of a scene to overcome the class imbalance common in autism datasets—a consequence of autistic children's tendency to show reduced gaze to faces. It utilizes a two-pathway architecture with expert models specialized in social and non-social gaze, guided by a context-awareness gate module. The results of our comprehensive experiments demonstrate that our framework achieves new state-of-the-art performance for gaze target detection in this population, significantly outperforming existing methods, especially on the critical minority class of face-directed gaze.

AAAI Conference 2025 Conference Paper

A Comprehensive Evaluation on Event Reasoning of Large Language Models

  • Zhengwei Tao
  • Zhi Jin
  • Yifan Zhang
  • Xiancai Chen
  • Haiyan Zhao
  • Jia Li
  • Bin Liang
  • Chongyang Tao

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of inter-event relations and reasoning paradigms. The extent to which LLMs excel in event reasoning across various relations and reasoning paradigms has not been thoroughly investigated. Additionally, it is still unclear whether LLMs utilize event knowledge in the same way humans do. To address these gaps, we comprehensively evaluate the event reasoning abilities of LLMs across different relations, paradigms, and levels of abstraction. We introduce a novel benchmark, EV2, for EValuation of EVent reasoning. EV2 consists of two levels of evaluation, schema and instance, and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV2. We find that 1) LLMs are able to accomplish event reasoning, but their performance is far from satisfactory; 2) their event reasoning abilities are imbalanced across relations and paradigms; 3) LLMs have event schema knowledge, but they are not aligned with humans on how to utilize it. Based on these findings, we guide the LLMs in utilizing event schema knowledge as memory, leading to improvements in event reasoning.

NeurIPS Conference 2025 Conference Paper

Diffusion-Classifier Synergy: Reward-Aligned Learning via Mutual Boosting Loop for FSCIL

  • Ruitao Wu
  • Yifan Zhao
  • Guangyao Chen
  • Jia Li

Few-Shot Class-Incremental Learning (FSCIL) challenges models to sequentially learn new classes from minimal examples without forgetting prior knowledge, a task complicated by the stability-plasticity dilemma and data scarcity. Current FSCIL methods often struggle with generalization due to their reliance on limited datasets. While diffusion models offer a path for data augmentation, their direct application can lead to semantic misalignment or ineffective guidance. This paper introduces Diffusion-Classifier Synergy (DCS), a novel framework that establishes a mutual boosting loop between diffusion model and FSCIL classifier. DCS utilizes a reward-aligned learning strategy, where a dynamic, multi-faceted reward function derived from the classifier's state directs the diffusion model. This reward system operates at two levels: the feature level ensures semantic coherence and diversity using prototype-anchored maximum mean discrepancy and dimension-wise variance matching, while the logits level promotes exploratory image generation and enhances inter-class discriminability through confidence recalibration and cross-session confusion-aware mechanisms. This co-evolutionary process, where generated images refine the classifier and an improved classifier state yields better reward signals, demonstrably achieves state-of-the-art performance on FSCIL benchmarks, significantly enhancing both knowledge retention and new class learning.
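
The prototype-anchored maximum mean discrepancy named in the feature-level reward is a standard two-sample statistic; a minimal kernel-MMD sketch follows (the RBF kernel, fixed bandwidth, and biased estimator are simplifying assumptions, not DCS's exact reward):

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between sample sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between features x and y, e.g. generated
    features vs. prototype-anchored real features; zero iff the kernel mean
    embeddings coincide."""
    return rbf(x, x, sigma).mean() + rbf(y, y, sigma).mean() \
        - 2 * rbf(x, y, sigma).mean()
```

A reward that decreases with this quantity pushes the diffusion model's generated features toward the classifier's class prototypes, which is the semantic-coherence role the abstract describes.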

AAAI Conference 2025 Conference Paper

Fair Text-to-Image Diffusion via Fair Mapping

  • Jia Li
  • Lijie Hu
  • Jingfeng Zhang
  • Tianhang Zheng
  • Hua Zhang
  • Di Wang

In this paper, we address the limitations of existing text-to-image diffusion models in generating demographically fair results when given human-related descriptions. These models often struggle to disentangle the target language context from sociocultural biases, resulting in biased image generation. To overcome this challenge, we propose Fair Mapping, a flexible, model-agnostic, and lightweight approach that modifies a pre-trained text-to-image diffusion model by controlling the prompt to achieve fair image generation. One key advantage of our approach is its high efficiency. It only requires updating an additional linear network with few parameters at a low computational cost. By developing a linear network that maps conditioning embeddings into a debiased space, we enable the generation of relatively balanced demographic results based on the specified text condition. With comprehensive experiments on face image generation, we show that our method significantly improves image generation fairness with almost the same image quality compared to conventional diffusion models when prompted with descriptions related to humans. By effectively addressing the issue of implicit language bias, our method produces more fair and diverse image outputs.
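
The "additional linear network with few parameters" amounts to one learned affine map applied to the conditioning embeddings before they reach the frozen diffusion model; a sketch under that reading (parameter names illustrative):

```python
import numpy as np

def fair_map(cond_emb, M, b):
    """Map text-conditioning embeddings into a debiased space.

    cond_emb : (seq_len, d) prompt embeddings from the frozen text encoder
    M : (d, d), b : (d,) -- the only trained parameters, so the overhead is
    d*d + d weights, tiny next to the diffusion backbone.
    """
    return cond_emb @ M.T + b
```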

NeurIPS Conference 2025 Conference Paper

FAN: Fourier Analysis Networks

  • Yihong Dong
  • Ge Li
  • Yongding Tao
  • Xue Jiang
  • Kechi Zhang
  • Jia Li
  • Jinliang Deng
  • Jing Su

Despite the remarkable successes of general-purpose neural networks, such as MLPs and Transformers, we find that they exhibit notable shortcomings in modeling and reasoning about periodic phenomena, achieving only marginal performance within the training domain and failing to generalize effectively to out-of-domain (OOD) scenarios. Periodicity is ubiquitous throughout nature and science. Therefore, neural networks should be equipped with the essential ability to model and handle periodicity. In this work, we propose FAN, a novel neural network that effectively addresses periodicity modeling challenges while offering broad applicability similar to MLP with fewer parameters and FLOPs. Periodicity is naturally integrated into FAN's structure and computational processes by introducing the Fourier Principle. Unlike existing Fourier-based networks, which possess particular periodicity modeling abilities but face challenges in scaling to deeper networks and are typically designed for specific tasks, our approach overcomes this challenge to enable scaling to large-scale models and maintains the capability to be applied to more types of tasks. Through extensive experiments, we demonstrate the superiority of FAN in periodicity modeling tasks and the effectiveness and generalizability of FAN across a range of real-world tasks. Moreover, we reveal that compared to existing Fourier-based networks, FAN accommodates both periodicity modeling and general-purpose modeling well.
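
A FAN-style layer can be sketched as an explicit cos/sin branch concatenated with an ordinary nonlinear branch, so part of every layer's output is a Fourier basis of a learned projection; the sizes and the tanh activation here are illustrative choices, not the paper's exact design:

```python
import numpy as np

def fan_layer(x, Wp, Wb, bb):
    """Sketch of a Fourier Analysis Network layer.

    x  : (n, d_in) inputs
    Wp : (d_in, d_p) projection for the periodic branch
    Wb : (d_in, d_b), bb : (d_b,) for the ordinary nonlinear branch
    Output width is 2*d_p + d_b: cos and sin make periodicity a structural
    property of the layer rather than something the activation must learn.
    """
    p = x @ Wp
    return np.concatenate([np.cos(p), np.sin(p), np.tanh(x @ Wb + bb)], axis=-1)
```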

AAAI Conference 2025 Conference Paper

FreeGen: Bridging Visual-Linguistic Discrepancies Towards Diffusion-based Pixel-level Data Synthesis

  • Wenzhuang Wang
  • Mingcan Ma
  • Yong Chen
  • Changqun Xia
  • Zhenbao Liang
  • Jia Li

Text-to-image diffusion models have inspired research into text-to-data synthesis without human intervention, where spatial attentions correlated with semantic entities in text prompts are primarily interpreted as pseudo-masks. However, these vanilla attentions often exhibit visual-linguistic discrepancies, in which the associations between image features and entity-level tokens are unstable and divergent, yielding inferior masks for realistic applications, especially in more practical open-vocabulary settings. To tackle this issue, we propose a novel text-guided self-driven generative paradigm, termed FreeGen, which addresses the discrepancies by recalibrating intrinsic visual-linguistic correlations and serves as a real-data-free method to automatically synthesize open-vocabulary pixel-level data for arbitrary entities. Specifically, we first learn an Attention Self-Rectification mechanism to reproject the inherent attention matrices to achieve robust semantic alignment, thereby obtaining class-discriminative masks. A Temporal Fluctuation Factor is presented to assess mask quality based on its variation over uniform sampling timesteps, enabling the selection of reliable masks. These masks are then employed as self-supervised signals to support the learning of an Entity-level Grounding Decoder in a self-training manner, thus producing open-vocabulary segmentation results. Extensive experiments show that existing segmenters trained on FreeGen narrow the performance gap with real-data counterparts and remarkably outperform state-of-the-art methods.

ICLR Conference 2025 Conference Paper

GraphArena: Evaluating and Exploring Large Language Models on Graph Computation

  • Jianheng Tang
  • Qifan Zhang
  • Yuhan Li
  • Nuo Chen
  • Jia Li

The "arms race" of Large Language Models (LLMs) demands new benchmarks to track their progress. In this paper, we introduce GraphArena, a benchmarking tool designed to evaluate LLMs on real-world graph computational problems. It offers a suite of four polynomial-time tasks (e.g., Shortest Distance) and six NP-complete challenges (e.g., Traveling Salesman Problem). GraphArena features a rigorous evaluation framework that classifies LLM outputs as correct, suboptimal (feasible but not optimal), hallucinatory (properly formatted but infeasible), or missing. Evaluation of over 10 LLMs reveals that even top-performing LLMs struggle with larger, more complex graph problems and exhibit hallucination issues. We further explore four potential solutions to address this issue and improve LLMs on graph computation, including chain-of-thought prompting, instruction tuning, code writing, and scaling test-time compute, each demonstrating unique strengths and limitations. GraphArena complements the existing LLM benchmarks and is open-sourced at https://github.com/squareRoot3/GraphArena.
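
The four-way verdict (correct / suboptimal / hallucinatory / missing) is easy to make concrete for the shortest-path task; this checker illustrates the classification logic only and is not GraphArena's code:

```python
def classify_output(graph, src, dst, path, optimal_len):
    """Classify an LLM-proposed shortest path.

    graph : dict mapping node -> set of neighbour nodes (undirected assumed)
    path  : proposed node list, or None if no answer could be parsed
    Verdicts: 'missing' (no parseable answer), 'hallucinatory' (well-formed
    but infeasible in the graph), 'suboptimal' (feasible but longer than
    optimal), 'correct' (feasible and optimal).
    """
    if not path:
        return "missing"
    feasible = path[0] == src and path[-1] == dst and all(
        b in graph.get(a, set()) for a, b in zip(path, path[1:]))
    if not feasible:
        return "hallucinatory"
    return "correct" if len(path) - 1 == optimal_len else "suboptimal"
```

Separating "hallucinatory" from "suboptimal" is the key design choice: it distinguishes answers that are merely weak from answers that are not grounded in the given graph at all.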

AAAI Conference 2025 Conference Paper

Holistic Correction with Object Prototype for Video Object Segmentation

  • Shengye Qiao
  • Changqun Xia
  • Yanjie Liang
  • Gongjin Lan
  • Jia Li

Recently, memory-based methods have achieved progress in semi-supervised video object segmentation. However, these methods still suffer from unstructured challenges, such as object transformations, occlusions and disappearance-reappearance. To this end, we propose a Holistic Correction Network (HCNet) to adaptively acquire concise object prototypes for holistic correction at semantic, spatial and temporal aspects. Specifically, an Adaptive Prototype Update module is firstly designed to construct multi-level core object representations by associating object variations in consecutive frames with segmentation quality assessment. Based on the updated object prototypes, Semantic, Spatial and Temporal Correction modules are respectively designed to enhance the object semantics in the entire frame, eliminate the incorrect semantic enhancement outside the object regions and calibrate the estimated object regions with temporal changes of objects. Through the holistic correction mechanism with effective object prototypes, our proposed HCNet can robustly and efficiently deal with diverse complex scenarios. Extensive and comprehensive experiments conducted on seven datasets demonstrate that our proposed HCNet can significantly improve the segmentation performance.

IJCAI Conference 2025 Conference Paper

MedualTime: A Dual-Adapter Language Model for Medical Time Series-Text Multimodal Learning

  • Jiexia Ye
  • Weiqi Zhang
  • Ziyue Li
  • Jia Li
  • Meng Zhao
  • Fugee Tsung

The recent rapid advancements in language models (LMs) have garnered attention in medical time series-text multimodal learning. However, existing contrastive learning-based and prompt-based LM approaches tend to be biased, often assigning a primary role to the time series modality while treating the text modality as secondary. We classify these approaches under a temporal-primary paradigm, which may overlook the unique and critical task-relevant information embedded in the text modality, such as clinical reports, thus failing to fully leverage the mutual benefits and complementarity of different modalities. To fill this gap, we propose a novel textual-temporal multimodal learning paradigm that enables either modality to serve as the primary one while being enhanced by the other, thereby effectively capturing modality-specific information and fostering cross-modal interaction. Specifically, we design MedualTime, a language model composed of dual adapters that implement temporal-primary and textual-primary modeling simultaneously. Within each adapter, lightweight adaptation tokens are injected into the top layers of the LM to encourage high-level modality fusion. The shared LM pipeline across the dual adapters not only achieves adapter alignment but also enables efficient fine-tuning, reducing computational cost. Empirically, MedualTime demonstrates superior performance on medical data, achieving notable improvements of 8% in accuracy and 12% in F1 in supervised settings. Furthermore, MedualTime's transferability is validated by few-shot transfer experiments from coarse-grained to fine-grained medical data.

NeurIPS Conference 2025 Conference Paper

Mesh Interpolation Graph Network for Dynamic and Spatially Irregular Global Weather Forecasting

  • Zinan Zheng
  • Yang Liu
  • Jia Li

Graph neural networks have shown promising results in weather forecasting, which is critical for human activities such as agricultural planning and extreme weather preparation. However, most studies focus on finite, local areas for training, overlooking the influence of broader areas and limiting their ability to generalize effectively. Thus, in this work, we study global weather forecasting over station networks that are irregularly distributed and dynamically varying in practice, requiring the model to generalize to unobserved locations. To address these challenges, we propose a general Mesh Interpolation Graph Network (MIGN) that models forecasting over irregular weather stations, consisting of two key designs: (1) learning spatially irregular data with a regular mesh interpolation network to align the data; (2) leveraging a parametric spherical harmonics location embedding to further enhance spatial generalization ability. Extensive experiments on an up-to-date observation dataset show that MIGN significantly outperforms existing data-driven models. Besides, we show that MIGN has spatial generalization ability and is capable of generalizing to previously unseen stations.
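
A spherical-harmonics-style location embedding can be sketched with low-degree real harmonics written via the unit vector on the sphere; in MIGN the combination is parametric (learned), which this fixed-basis sketch omits:

```python
import numpy as np

def sph_harm_embed(lat_deg, lon_deg):
    """Embed a station location with real spherical harmonics up to degree 2.

    Converts (lat, lon) to a unit vector (x, y, z) and evaluates the
    (unnormalized) degree-0, 1, 2 real harmonics, which are polynomials in
    x, y, z. The embedding is smooth and periodic over the sphere, so nearby
    stations (including unseen ones) get nearby embeddings.
    """
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.array([1.0, x, y, z,
                     x * y, y * z, z * x, x * x - y * y, 3 * z * z - 1])
```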

NeurIPS Conference 2025 Conference Paper

Non-stationary Equivariant Graph Neural Networks for Physical Dynamics Simulation

  • Chaohao Yuan
  • Maoji Wen
  • Ercan KURUOGLU
  • Yang Liu
  • Jia Li
  • Tingyang Xu
  • Deli Zhao
  • Hong Cheng

To enhance the generalization ability of graph neural networks (GNNs) in learning and simulating physical dynamics, a series of equivariant GNNs have been developed to incorporate a symmetric inductive bias. However, existing methods do not take into account the non-stationary nature of physical dynamics, where the joint distribution changes over time. Moreover, previous approaches for modeling non-stationary time series typically involve normalizing the data, which disrupts the symmetry assumption inherent in physical dynamics. To model non-stationary physical dynamics while preserving the symmetric inductive bias, we introduce a Non-Stationary Equivariant Graph Neural Network (NS-EGNN) that captures the non-stationarity in physical dynamics while preserving the symmetric property of the model. Specifically, NS-EGNN employs the Fourier transform on segments of physical dynamics to extract time-varying frequency information from the trajectories. It then uses first- and second-order differences to mitigate non-stationarity, followed by pooling for future predictions. By capturing varying frequency characteristics and alleviating linear and quadratic trends in the raw dynamics, NS-EGNN better models the temporal dependencies in physical dynamics. NS-EGNN has been applied to various types of physical dynamics, including molecular, motion, and protein dynamics. In all these scenarios, NS-EGNN consistently surpasses existing state-of-the-art algorithms, underscoring its effectiveness. The implementation of NS-EGNN is available at https://github.com/MaojiWEN/NS-EGNN.
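
The two preprocessing ingredients the abstract names, segment-wise FFT for time-varying frequency content and first/second-order differencing to damp trends, can be sketched directly on one trajectory coordinate (a simplification of the described pipeline, not the released code):

```python
import numpy as np

def ns_features(traj, n_seg=4):
    """Extract non-stationarity-aware features from one trajectory coordinate.

    traj : (T,) samples of a physical quantity over time
    Returns per-segment magnitude spectra (time-varying frequency content)
    plus first and second differences: differencing once turns a linear trend
    into a constant, twice turns a quadratic trend into a constant, so both
    damp the low-order non-stationarity without renormalizing the data.
    """
    segs = np.array_split(traj, n_seg)
    spectra = [np.abs(np.fft.rfft(s)) for s in segs]
    d1 = np.diff(traj, n=1)
    d2 = np.diff(traj, n=2)
    return spectra, d1, d2
```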

YNIMG Journal 2025 Journal Article

Pain in focus: How persistent pain disrupts the attentional bias towards pain-related information

  • Jia Li
  • Xiaohan Lyu
  • Xiaoyun Li
  • Xilin Yang
  • Lingling Weng
  • Yi Wang
  • Weiwei Peng

Pain modulates attentional biases, contributing to chronic pain development and maintenance through enhanced focus on pain-related stimuli. This study employed drift-diffusion modeling (DDM) and multivariate EEG to investigate how sustained pain affects attention allocation. Using a crossover design, 58 healthy volunteers underwent two sessions (capsaicin-induced pain vs. control cream) while performing word- and picture-based dot-probe tasks. Probes appeared in locations either congruent or incongruent with pain-related stimuli, or after neutral stimulus pairs. Behavioral and neural responses to congruency and incongruency effects were compared between pain states. DDM revealed increased incongruency effects during pain, characterized by slower drift rates and narrower decision thresholds, suggesting impaired evidence accumulation. EEG analyses revealed two distinct pain-state modulations: (1) amplified P3 amplitudes (300-600 ms) during incongruent trials, and (2) multivariate decoding of δ/θ oscillations (1-7 Hz, 116-364 ms post-stimulus) that uniquely differentiated incongruent from neutral conditions specifically under pain. These behavioral and neural signatures of attentional disruption manifested selectively during verbal tasks, with no parallel effects observed in pictorial processing. Our findings demonstrate how pain disrupts cognitive control: impaired expectancy processing (early δ/θ oscillations), compromised decision formation (altered DDM parameters), and deficient response inhibition (P3 modulation). These results highlight verbal information processing as a key vulnerability in pain-related attentional bias, suggesting targeted interventions for cognitive control components could mitigate chronic pain consequences.

NeurIPS Conference 2025 Conference Paper

Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation

  • Nan Bao
  • Yifan Zhao
  • Lin Zhu
  • Jia Li

Semantic segmentation has achieved great success in ideal conditions. However, when facing extreme conditions (e.g., insufficient light, fierce camera motion), most existing methods suffer from significant information loss in RGB, severely damaging segmentation results. Several studies exploit the high-speed, high-dynamic event modality as a complement, but event and RGB are naturally heterogeneous, which leads to feature-level mismatch and inferior optimization in existing multi-modality methods. Different from these studies, we delve into the edge cues shared by both modalities for resilient fusion and propose a novel Edge-awareness Semantic Concordance framework to unify the heterogeneous multi-modality features with latent edge cues. In this framework, we first propose Edge-awareness Latent Re-coding, which obtains uncertainty indicators while realigning event-RGB features into a unified semantic space guided by the re-coded distribution, and transfers event-RGB distributions into re-coded features by utilizing a pre-established edge dictionary as clues. We then propose Re-coded Consolidation and Uncertainty Optimization, which utilize the re-coded edge features and uncertainty indicators to resolve heterogeneous event-RGB fusion issues under extreme conditions. We establish two synthetic and one real-world event-RGB semantic segmentation datasets for extreme-scenario comparisons. Experimental results show that our method outperforms the state-of-the-art by 2.55% mIoU on our proposed DERS-XS and possesses superior resilience under spatial occlusion. Our code and datasets are publicly available at https://github.com/iCVTEAM/ESC.

NeurIPS Conference 2025 Conference Paper

Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling

  • Yihong Dong
  • Ge Li
  • Xue Jiang
  • Yongding Tao
  • Kechi Zhang
  • Lecheng Wang
  • Hao Zhu
  • Huanyu Liu

Periodicity, one of the most fundamental characteristics of data, lays the foundation for structured knowledge acquisition and systematic cognitive processes within human learning paradigms. However, potential flaws in the periodicity modeling of the Transformer affect the learning efficiency of large language models (LLMs) built upon it and their ability to establish underlying principles from data. In this paper, we demonstrate that integrating effective periodicity modeling can improve the learning efficiency and performance of LLMs. We introduce FANformer, which adapts the Fourier Analysis Network (FAN) to the attention mechanism, achieving efficient periodicity modeling by modifying the attention mechanism's feature projection process. Extensive experimental results on language modeling show that FANformer consistently outperforms the Transformer when scaling up model size and training tokens, underscoring its superior learning efficiency. Our pretrained FANformer-1B exhibits marked improvements on downstream tasks compared to open-source LLMs with similar model parameters or training tokens. Moreover, we reveal that FANformer exhibits a superior ability to learn and apply rules for reasoning compared to the Transformer. The results position FANformer as an effective and promising architecture for advancing LLMs.

NeurIPS Conference 2025 Conference Paper

Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

  • Jian Xiao
  • Zijie Song
  • Jialong Hu
  • Hao Cheng
  • Zhenzhen Hu
  • Jia Li
  • Richang Hong

Recent progress in text–video retrieval has been largely driven by contrastive learning. However, existing methods often overlook the effect of the modality gap, which causes anchor representations to undergo in-place optimization (i.e., optimization tension) that limits their alignment capacity. Moreover, noisy hard negatives further distort the semantics of anchors. To address these issues, we propose GARE, a Gap-Aware Retrieval framework that introduces a learnable, pair-specific increment $\Delta_{ij}$ between text $t_i$ and video $v_j$, redistributing gradients to relieve optimization tension and absorb noise. We derive $\Delta_{ij}$ via a multivariate first-order Taylor expansion of the InfoNCE loss under a trust-region constraint, showing that it guides updates along locally consistent descent directions. A lightweight neural module conditioned on the semantic gap couples increments across batches for structure-aware correction. Furthermore, we regularize $\Delta$ through a variational information bottleneck with relaxed compression, enhancing stability and semantic consistency. Experiments on four benchmarks demonstrate that GARE consistently improves alignment accuracy and robustness, validating the effectiveness of gap-aware tension mitigation.

NeurIPS Conference 2025 Conference Paper

Recursive Transformer: Boosting Reasoning Ability with State Stack

  • Kechi Zhang
  • Ge Li
  • Huangzhao Zhang
  • Yihong Dong
  • Jia Li
  • Jingjing Xu
  • Zhi Jin

The Transformer architecture has emerged as a landmark advancement within the broad field of artificial intelligence, effectively catalyzing the advent of large language models (LLMs). However, despite its remarkable capabilities and the substantial progress it has facilitated, the Transformer architecture still has some limitations. One such intrinsic limitation is its inability to effectively recognize regular expressions or deterministic context-free grammars. Standard Transformers lack an explicit mechanism for recursion and structured state transitions, which can hinder systematic generalization on nested and hierarchical patterns. Drawing inspiration from pushdown automata, which efficiently resolve deterministic context-free grammars using stacks, we equip layers with a differentiable stack and propose StackTrans with recursion to address the aforementioned issue within LLMs. Unlike previous approaches that modify the attention computation, StackTrans explicitly incorporates hidden state stacks between Transformer layers. This design maintains compatibility with existing frameworks like flash-attention. Specifically, our design features stack operations -- such as pushing and popping hidden states -- that are differentiable and can be learned in an end-to-end manner. Our comprehensive evaluation spans benchmarks for both Chomsky hierarchy and large-scale natural languages. Across these diverse tasks, StackTrans consistently outperforms standard Transformer models and other baselines. We have successfully scaled StackTrans up from 360M to 7B parameters. In particular, our from-scratch pretrained model StackTrans-360M outperforms several larger open-source LLMs with 2–3x more parameters, showcasing its superior efficiency and reasoning capability.

NeurIPS Conference 2025 Conference Paper

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

  • Huanyu Liu
  • Ge Li
  • Jia Li
  • Hao Zhu
  • Kechi Zhang
  • Yihong Dong

How to design reinforcement learning (RL) tasks that effectively unleash the reasoning capability of large language models (LLMs) remains an open question. Existing RL tasks (e.g., math, programming, and constructed reasoning tasks) suffer from three key limitations: (1) Scalability. They rely heavily on human annotation or expensive LLM synthesis to generate sufficient training data. (2) Verifiability. LLMs' outputs are hard to verify automatically and reliably. (3) Controllable Difficulty. Most tasks lack fine-grained difficulty control, making it hard to train LLMs to develop reasoning ability from easy to hard. To address these limitations, we propose Saturn, a SAT-based RL framework that uses Boolean Satisfiability (SAT) problems to train and evaluate LLM reasoning. Saturn enables scalable task construction, rule-based verification, and precise difficulty control. Saturn designs a curriculum learning pipeline that continuously improves LLMs' reasoning capability by constructing SAT tasks of increasing difficulty and training LLMs from easy to hard. To ensure stable training, we design a principled mechanism to control difficulty transitions. We introduce Saturn-2.6k, a dataset of 2,660 SAT problems with varying difficulty. It supports the evaluation of how LLM reasoning changes with problem difficulty. We apply Saturn to DeepSeek-R1-Distill-Qwen and obtain Saturn-1.5B and Saturn-7B. We achieve several notable results: (1) On SAT problems, Saturn-1.5B and Saturn-7B achieve average pass@3 improvements of +14.0 and +28.1, respectively. (2) On math and programming tasks, Saturn-1.5B and Saturn-7B improve average scores by +4.9 and +1.8 on benchmarks (e.g., AIME, LiveCodeBench). (3) Compared to the state-of-the-art (SOTA) approach to constructing RL tasks, Saturn achieves a further improvement of +8.8%. We release the source code, data, and models to support future research.

AAAI Conference 2025 Conference Paper

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval

  • Jian Xiao
  • Zhenzhen Hu
  • Jia Li
  • Richang Hong

Text-video retrieval (TVR) has seen substantial advancements in recent years, fueled by the utilization of pre-trained models and large language models (LLMs). Despite these advancements, achieving accurate matching in TVR remains challenging due to inherent disparities between video and textual modalities and irregularities in data representation. In this paper, we propose Text-Video-ProxyNet (TV-ProxyNet), a novel framework designed to decompose the conventional 1-to-N relationship of TVR into N distinct 1-to-1 relationships. By replacing a single text query with a series of text proxies, TV-ProxyNet not only broadens the query scope but also achieves a more precise expansion. Each text proxy is crafted through a refined iterative process, controlled by mechanisms we term the director and the dash, which regulate the proxy's direction and distance relative to the original text query. This setup not only facilitates more precise semantic alignment but also effectively manages the disparities and noise inherent in multimodal data. Our experiments on three representative video-text retrieval benchmarks, MSRVTT, DiDeMo, and ActivityNet Captions, demonstrate the effectiveness of TV-ProxyNet. The results show an improvement of 2.0% to 3.3% in R@1 over the baseline. TV-ProxyNet achieved state-of-the-art performance on MSRVTT and ActivityNet Captions, and a 2.0% improvement on DiDeMo compared to existing methods, validating our approach's ability to enhance semantic mapping and reduce error propensity.

NeurIPS Conference 2024 Conference Paper

4-bit Shampoo for Memory-Efficient Network Training

  • Sike Wang
  • Pan Zhou
  • Jia Li
  • Hua Huang

Second-order optimizers, maintaining a matrix termed a preconditioner, are superior to first-order optimizers in both theory and practice. The states forming the preconditioner and its inverse root restrict the maximum size of models trained by second-order optimizers. To address this, compressing 32-bit optimizer states to lower bitwidths has shown promise in reducing memory usage. However, current approaches only pertain to first-order optimizers. In this paper, we propose the first 4-bit second-order optimizers, exemplified by 4-bit Shampoo, maintaining performance similar to that of 32-bit ones. We show that quantizing the eigenvector matrix of the preconditioner in 4-bit Shampoo is remarkably better than quantizing the preconditioner itself, both theoretically and experimentally. By rectifying the orthogonality of the quantized eigenvector matrix, we enhance the approximation of the preconditioner's eigenvector matrix, which also benefits the computation of its inverse 4th root. Besides, we find that linear square quantization slightly outperforms dynamic tree quantization when quantizing second-order optimizer states. Evaluation on various networks for image classification and natural language modeling demonstrates that our 4-bit Shampoo achieves comparable performance to its 32-bit counterpart while being more memory-efficient.

JBHI Journal 2024 Journal Article

A Multi-Task Transformer With Local-Global Feature Interaction and Multiple Tumoral Region Guidance for Breast Cancer Diagnosis

  • Yi Zhang
  • Bolun Zeng
  • Jia Li
  • Yuanyi Zheng
  • Xiaojun Chen

Breast cancer, as a malignant tumor disease, has maintained high incidence and mortality rates over the years. Ultrasonography is one of the primary methods for diagnosing early-stage breast cancer. However, correctly interpreting breast ultrasound images requires massive time from physicians with specialized knowledge and extensive experience. Recently, deep learning-based methods have made significant advancements in breast tumor segmentation and classification due to their powerful fitting capabilities. However, most existing methods focus on performing one of these tasks separately, often failing to effectively leverage information from specific tumor-related areas that hold considerable diagnostic value. In this study, we propose a multi-task network with local-global feature interaction and multiple tumoral region guidance for breast ultrasound-based tumor segmentation and classification. Specifically, we construct a dual-stream encoder, paralleling CNN and Transformer, to facilitate hierarchical interaction and fusion of local and global features. This architecture enables each stream to capitalize on the strengths of the other while preserving its unique characteristics. Moreover, we design a multi-tumoral region guidance module to explicitly learn long-range non-local dependencies within intra-tumoral and peri-tumoral regions from the spatial domain, thus providing interpretable cues beneficial for classification. Experimental results on two breast ultrasound datasets show that our network outperforms state-of-the-art methods in tumor segmentation and classification tasks. Compared with the second-best competitive method, our network improves the diagnosis accuracy from 73.64% to 80.21% on a large external validation dataset, which demonstrates its superior generalization capability.

IJCAI Conference 2024 Conference Paper

A Survey of Graph Meets Large Language Model: Progress and Future Directions

  • Yuhan Li
  • Zhixun Li
  • Peisong Wang
  • Jia Li
  • Xiangguo Sun
  • Hong Cheng
  • Jeffrey Xu Yu

Graph plays a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Network (GNN)-based methods and yield state-of-the-art performance. In this survey, we present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.

IJCAI Conference 2024 Conference Paper

All in One: Multi-task Prompting for Graph Neural Networks (Extended Abstract)

  • Xiangguo Sun
  • Hong Cheng
  • Jia Li
  • Bo Liu
  • Jihong Guan

This paper is an extended abstract of our original work published in KDD23, where we won the best research paper award. The paper introduces a novel approach to bridging the gap between pre-trained graph models and the diverse tasks they’re applied to, inspired by the success of prompt learning in NLP. Recognizing the challenge of aligning pre-trained models with varied graph tasks (node level, edge level, and graph level), which can lead to negative transfer and poor performance, we propose a multi-task prompting method for graphs. This method involves unifying graph and language prompt formats, enabling NLP’s prompting strategies to be adapted for graph tasks. By analyzing the task space of graph applications, we reformulate problems to fit graph-level tasks and apply meta-learning to improve prompt initialization for multiple tasks. Experiments show our method’s effectiveness in enhancing model performance across different graph tasks. Beyond the original work, in this extended abstract, we further discuss the graph prompt from a bigger picture and provide some of the latest work toward this area.

NeurIPS Conference 2024 Conference Paper

ElasTST: Towards Robust Varied-Horizon Forecasting with Elastic Time-Series Transformer

  • Jiawen Zhang
  • Shun Zheng
  • Xumeng Wen
  • Xiaofang Zhou
  • Jiang Bian
  • Jia Li

Numerous industrial sectors necessitate models capable of providing robust forecasts across various horizons. Despite the recent strides in crafting specific architectures for time-series forecasting and developing pre-trained universal models, a comprehensive examination of their capability in accommodating varied-horizon forecasting during inference is still lacking. This paper bridges this gap through the design and evaluation of the Elastic Time-Series Transformer (ElasTST). The ElasTST model incorporates a non-autoregressive design with placeholders and structured self-attention masks, warranting future outputs that are invariant to adjustments in inference horizons. A tunable version of rotary position embedding is also integrated into ElasTST to capture time-series-specific periods and enhance adaptability to different horizons. Additionally, ElasTST employs a multi-scale patch design, effectively integrating both fine-grained and coarse-grained information. During the training phase, ElasTST uses a horizon reweighting strategy that approximates the effect of random sampling across multiple horizons with a single fixed horizon setting. Through comprehensive experiments and comparisons with state-of-the-art time-series architectures and contemporary foundation models, we demonstrate the efficacy of ElasTST's unique design elements. Our findings position ElasTST as a robust solution for the practical necessity of varied-horizon forecasting.

TMLR Journal 2024 Journal Article

Equivariant Graph Learning for High-density Crowd Trajectories Modeling

  • Yang Liu
  • Zinan Zheng
  • Yu Rong
  • Jia Li

Understanding the high-density crowd dynamics of urbanization plays an important role in architectural design and urban planning, preventing the occurrence of crowd crush. Most traditional methods rely on formulas designed from expert knowledge, which are too inflexible and incomplete to model complex real-world crowd trajectories. To address the issue, recent studies propose to simulate crowds via data-driven models. However, these models fail to learn the inherent symmetry of high-density crowd trajectories, leading to insufficient generalization ability. For example, existing models cannot predict left-to-right trajectories by learning right-to-left trajectories, even though they share similar patterns. In this work, we propose a novel Equivariant Graph Learning framework for high-density crowd dynamics modeling, called CrowdEGL. It utilizes an additional objective to encourage models to predict the transformed output given the input under the same transformation. We summarize three types of transformation groups, which are determined by the symmetry of environments. To explicitly incorporate these augmented data, a multi-channel GNN is employed to learn the latent graph embedding of pedestrian patterns. Finally, to model dense crowd interactions, future positions of original and transformed inputs are obtained by multiple independent graph decoders. Extensive experiments on 8 datasets from 5 different environments show that CrowdEGL outperforms existing models by a large margin.

ICLR Conference 2024 Conference Paper

EventRPG: Event Data Augmentation with Relevance Propagation Guidance

  • Mingyuan Sun
  • Donghao Zhang 0004
  • Zongyuan Ge
  • Jiaxu Wang
  • Jia Li
  • Zheng Fang 0001
  • Renjing Xu

The event camera, a novel bio-inspired vision sensor, has drawn a lot of attention for its low latency, low power consumption, and high dynamic range. Currently, overfitting remains a critical problem in event-based classification tasks for Spiking Neural Networks (SNNs) due to their relatively weak spatial representation capability. Data augmentation is a simple but efficient method to alleviate overfitting and improve the generalization ability of neural networks, and saliency-based augmentation methods are proven to be effective in the image processing field. However, no approach is available for extracting saliency maps from SNNs. Therefore, for the first time, we present the Spiking Layer-Time-wise Relevance Propagation rule (SLTRP) and the Spiking Layer-wise Relevance Propagation rule (SLRP), enabling SNNs to generate stable and accurate CAMs and saliency maps. Based on this, we propose EventRPG, which leverages relevance propagation on the spiking neural network for more efficient augmentation. Our proposed method has been evaluated on several SNN structures, achieving state-of-the-art performance on object recognition tasks, including N-Caltech101 and CIFAR10-DVS with accuracies of 85.62% and 85.55% respectively, as well as the action recognition task SL-Animals with an accuracy of 91.59%. Our code is available at https://github.com/myuansun/EventRPG.

NeurIPS Conference 2024 Conference Paper

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations

  • Jia Li
  • Ge Li
  • Xuanming Zhang
  • YunFei Zhao
  • Yihong Dong
  • Zhi Jin
  • Binhua Li
  • Fei Huang

How to evaluate Large Language Models (LLMs) in code generation remains an open question. Many benchmarks have been proposed, but they have two limitations, i.e., data leakage and lack of domain-specific evaluation. The former hurts the fairness of benchmarks, and the latter hinders practitioners from selecting superior LLMs for specific programming domains. To address these two limitations, we propose a new benchmark - EvoCodeBench, which has the following advances: (1) Evolving data. EvoCodeBench will be dynamically updated every period (e.g., 6 months) to avoid data leakage. This paper releases the first version - EvoCodeBench-2403, containing 275 samples from 25 repositories. (2) A domain taxonomy and domain labels. Based on the statistics of open-source communities, we design a programming domain taxonomy consisting of 10 popular domains. Based on the taxonomy, we annotate each sample in EvoCodeBench with a domain label. EvoCodeBench provides a broad platform for domain-specific evaluations. (3) Domain-specific evaluations. Besides the Pass@k, we compute the Domain-Specific Improvement (DSI) and define LLMs' comfort and strange domains. These evaluations help practitioners select superior LLMs in specific domains and discover the shortcomings of existing LLMs. Besides, EvoCodeBench is collected by a rigorous pipeline and aligns with real-world repositories in multiple aspects (e.g., code distributions). We evaluate 8 popular LLMs (e.g., gpt-4, DeepSeek Coder, StarCoder 2) on EvoCodeBench and summarize some insights. EvoCodeBench reveals the actual abilities of these LLMs in real-world repositories. For example, the highest Pass@1 of gpt-4 on EvoCodeBench-2403 is only 20.74%. Besides, we evaluate LLMs in different domains and discover their comfort and strange domains. For example, gpt-4 performs best in most domains but falls behind others in the Internet domain. StarCoder 2-15B unexpectedly performs well in the Database domain and even outperforms 33B LLMs. We release EvoCodeBench, all prompts, and LLMs' completions for further community analysis.

NeurIPS Conference 2024 Conference Paper

GLBench: A Comprehensive Benchmark for Graph with Large Language Models

  • Yuhan Li
  • Peisong Wang
  • Xiao Zhu
  • Aochuan Chen
  • Haiyun Jiang
  • Deng Cai
  • Victor W. Chan
  • Jia Li

The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks. Through extensive experiments on a collection of real-world datasets with consistent data processing and splitting strategies, we have uncovered several key findings. Firstly, GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance. However, using LLMs as predictors is less effective and often leads to uncontrollable output issues. We also notice that no clear scaling laws exist for current GraphLLM methods. In addition, both structures and semantics are crucial for effective zero-shot transfer, and our proposed simple baseline can even outperform several models tailored for zero-shot scenarios. The data and code of the benchmark can be found at https://github.com/NineAbyss/GLBench.

AAAI Conference 2024 Conference Paper

Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models

  • Yuqi Zhu
  • Ge Li
  • YunFei Zhao
  • Jia Li
  • Zhi Jin
  • Hong Mei

Recently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming languages (PL). Due to this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized for code generation. Through an analysis of the loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Among them, the challenging tokens mainly appear at the beginning of a code block. Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling challenging tokens, allowing LLMs to explore diverse choices, and a smaller temperature for confident tokens, avoiding the influence of tail randomness noise. We apply AdapT sampling to LLMs of different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms state-of-the-art decoding strategies.

NeurIPS Conference 2024 Conference Paper

How to Use Diffusion Priors under Sparse Views?

  • Qisen Wang
  • Yifan Zhao
  • Jiawei Ma
  • Jia Li

Novel view synthesis under sparse views has been a long-standing and important challenge in 3D reconstruction. Existing works mainly rely on introducing external semantic or depth priors to supervise the optimization of 3D representations. However, the diffusion model, as an external prior that can directly provide visual supervision, has always underperformed in sparse-view 3D reconstruction using Score Distillation Sampling (SDS) due to the low information entropy of sparse views compared to text, leading to optimization challenges caused by mode deviation. To this end, we present a thorough analysis of SDS from the mode-seeking perspective and propose Inline Prior Guided Score Matching (IPSM), which leverages visual inline priors provided by pose relationships between viewpoints to rectify the rendered image distribution and decomposes the original optimization objective of SDS, thereby offering effective diffusion visual guidance without any fine-tuning or pre-training. Furthermore, we propose the IPSM-Gaussian pipeline, which adopts 3D Gaussian Splatting as the backbone and supplements depth and geometry consistency regularization based on IPSM to further improve inline priors and the rectified distribution. Experimental results on different public datasets show that our method achieves state-of-the-art reconstruction quality. The code is released at https://github.com/iCVTEAM/IPSM.

ICRA Conference 2024 Conference Paper

Implicit Coarse-to-Fine 3D Perception for Category-level Object Pose Estimation from Monocular RGB Image

  • Jia Li
  • Li Jin
  • Xibin Song
  • Yeheng Chen
  • Nan Li
  • Xueying Qin

Category-level object pose estimation demonstrates robust generalization capabilities that benefit robotics applications. However, exclusive reliance on RGB images without leveraging any 3D information introduces ambiguity in the translation and size of objects, leading to suboptimal performance. In this paper, we propose a framework for category-level pose estimation from a single RGB image in an end-to-end manner, i.e., the Feature Auxiliary Perception Network (FAP-Net). To address inaccurate pose estimation caused by the inherent ambiguity of RGB images, we design a coarse-to-fine approach that first harnesses geometry supervision to facilitate coarse 3D feature perception and subsequently refines the features based on pose and size constraints. Experimental results on REAL275 and CAMERA25 demonstrate that FAP-Net achieves significant improvements over the state-of-the-art (14.7% on 10°10cm and 11.4% on IoU50 on the real-scene REAL275 dataset) with real-time inference (42 FPS).

AAAI Conference 2024 Conference Paper

Learning Performance Maximizing Ensembles with Explainability Guarantees

  • Vincent Pisztora
  • Jia Li

In this paper we propose a method for the optimal allocation of observations between an intrinsically explainable glass box model and a black box model. An optimal allocation is defined as one which, for any given explainability level (i.e. the proportion of observations for which the explainable model is the prediction function), maximizes the performance of the ensemble on the underlying task, and maximizes the performance of the explainable model on the observations allocated to it, subject to the maximal ensemble performance condition. The proposed method is shown to produce such explainability-optimal allocations on a benchmark suite of tabular datasets across a variety of explainable and black box model types. These learned allocations are found to consistently maintain ensemble performance at very high explainability levels (explaining 74% of observations on average), and in some cases even outperform both the component explainable and black box models while improving explainability.

NeurIPS Conference 2024 Conference Paper

ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons

  • Jiawen Zhang
  • Xumeng Wen
  • Zhenwei Zhang
  • Shun Zheng
  • Jia Li
  • Jiang Bian

Delivering precise point and distributional forecasts across a spectrum of prediction horizons represents a significant and enduring challenge in the application of time-series forecasting within various industries. Prior research on developing deep learning models for time-series forecasting has often concentrated on isolated aspects, such as long-term point forecasting or short-term probabilistic estimations. This narrow focus may result in skewed methodological choices and hinder the adaptability of these models to uncharted scenarios. While there is a rising trend in developing universal forecasting models, a thorough understanding of their advantages and drawbacks, especially regarding essential forecasting needs like point and distributional forecasts across short and long horizons, is still lacking. In this paper, we present ProbTS, a benchmark tool designed as a unified platform to evaluate these fundamental forecasting needs and to conduct a rigorous comparative analysis of numerous cutting-edge studies from recent years. We dissect the distinctive data characteristics arising from disparate forecasting requirements and elucidate how these characteristics can skew methodological preferences in typical research trajectories, which often fail to fully accommodate essential forecasting needs. Building on this, we examine the latest models for universal time-series forecasting and discover that our analyses of methodological strengths and weaknesses are also applicable to these universal models. Finally, we outline the limitations inherent in current research and underscore several avenues for future exploration.

NeurIPS Conference 2024 Conference Paper

ProG: A Graph Prompt Learning Benchmark

  • Chenyi Zi
  • Haihong Zhao
  • Xiangguo Sun
  • Yiqing Lin
  • Hong Cheng
  • Jia Li

Artificial general intelligence on graphs has shown significant advancements across various applications, yet the traditional 'Pre-train & Fine-tune' paradigm faces inefficiencies and negative transfer issues, particularly in complex and few-shot settings. Graph prompt learning emerges as a promising alternative, leveraging lightweight prompts to manipulate data and fill the task gap by reformulating downstream tasks to the pretext. However, several critical challenges remain: how to unify diverse graph prompt models, how to evaluate the quality of graph prompts, and how to improve their usability for practical comparisons and selection. In response to these challenges, we introduce the first comprehensive benchmark for graph prompt learning. Our benchmark integrates SIX pre-training methods and FIVE state-of-the-art graph prompt techniques, evaluated across FIFTEEN diverse datasets to assess performance, flexibility, and efficiency. We also present 'ProG', an easy-to-use open-source library that streamlines the execution of various graph prompt models, facilitating objective evaluations. Additionally, we propose a unified framework that categorizes existing graph prompt methods into two main approaches: prompts as graphs and prompts as tokens. This framework enhances the applicability and comparison of graph prompt techniques. The code is available at: https://github.com/sheldonresearch/ProG.

NeurIPS Conference 2024 Conference Paper

Towards Stable Representations for Protein Interface Prediction

  • Ziqi Gao
  • Zijing Liu
  • Yu Li
  • Jia Li

The knowledge of protein interactions is crucial but challenging for drug discovery applications. This work focuses on protein interface prediction, which aims to determine whether a pair of residues from different proteins interact. Existing data-driven methods have made significant progress in effectively learning protein structures. Nevertheless, they overlook the conformational changes (i.e., flexibility) within proteins upon binding, leading to poor generalization ability. In this paper, we regard the protein flexibility as an attack on the trained model and aim to defend against it for improved generalization. To fulfill this purpose, we propose ATProt, an adversarial training framework for protein representations to robustly defend against the attack of protein flexibility. ATProt can theoretically guarantee protein representation stability under complicated protein flexibility. Experiments on various benchmarks demonstrate that ATProt consistently improves the performance for protein interface prediction. Moreover, our method demonstrates broad applicability, performing the best even when provided with testing structures from structure prediction models like ESMFold and AlphaFold2.

NeurIPS Conference 2024 Conference Paper

UniGAD: Unifying Multi-level Graph Anomaly Detection

  • Yiqing Lin
  • Jianheng Tang
  • Chenyi Zi
  • H. Vicky Zhao
  • Yuan Yao
  • Jia Li

Graph Anomaly Detection (GAD) aims to identify uncommon, deviated, or suspicious objects within graph-structured data. Existing methods generally focus on a single graph object type (node, edge, graph, etc.) and often overlook the inherent connections among different object types of graph anomalies. For instance, a money laundering transaction might involve an abnormal account and the broader community it interacts with. To address this, we present UniGAD, the first unified framework for detecting anomalies at node, edge, and graph levels jointly. Specifically, we develop the Maximum Rayleigh Quotient Subgraph Sampler (MRQSampler) that unifies multi-level formats by transferring objects at each level into graph-level tasks on subgraphs. We theoretically prove that MRQSampler maximizes the accumulated spectral energy of subgraphs (i.e., the Rayleigh quotient) to preserve the most significant anomaly information. To further unify multi-level training, we introduce a novel GraphStitch Network to integrate information across different levels, adjust the amount of sharing required at each level, and harmonize conflicting training goals. Comprehensive experiments show that UniGAD outperforms both existing GAD methods specialized for a single task and graph prompt-based approaches for multiple tasks, while also providing robust zero-shot task transferability.

TAAS Journal 2024 Journal Article

Using Genetic Programming to Build Self-Adaptivity into Software-Defined Networks

  • Jia Li
  • Shiva Nejati
  • Mehrdad Sabetzadeh

Self-adaptation solutions need to periodically monitor, reason about, and adapt a running system. The adaptation step involves generating an adaptation strategy and applying it to the running system whenever an anomaly arises. In this article, we argue that rather than generating individual adaptation strategies, the goal should be to adapt the control logic of the running system in such a way that the system itself would learn how to steer clear of future anomalies, without triggering self-adaptation too frequently. Although the need for adaptation is never eliminated, especially noting the uncertain and evolving environment of complex systems, reducing the frequency of adaptation interventions is advantageous for various reasons, such as to increase performance and to make a running system more robust. We instantiate and empirically examine the preceding idea for software-defined networking—a key enabling technology for modern data centers and Internet of Things applications. Using genetic programming (GP), we propose a self-adaptation solution that continuously learns and updates the control constructs in the data-forwarding logic of a software-defined network. Our evaluation, performed using open source synthetic and industrial data, indicates that compared to a baseline adaptation technique that attempts to generate individual adaptations, our GP-based approach is more effective in resolving network congestion, and further, it reduces the frequency of adaptation interventions over time. In addition, we show that for networks with the same topology, reusing over larger networks the knowledge that is learned on smaller networks leads to significant improvements in the performance of our GP-based adaptation approach. Finally, we compare our approach against a standard data-forwarding algorithm from the network literature, demonstrating that our approach significantly reduces packet loss.

EAAI Journal 2024 Journal Article

WGformer: A Weibull-Gaussian Informer based model for wind speed prediction

  • Ziyi Shi
  • Jia Li
  • Zheyuan Jiang
  • Huang Li
  • Chengqing Yu
  • Xiwei Mi

Accurate wind speed forecasting can improve energy management efficiency and promote the use of renewable energy. However, the inherent nonlinearity and fluctuation of wind speed make prediction challenging. To address these issues, we design an efficient Informer-based model, with improved calculation speed, forecasting accuracy and generalization ability. The proposed model in this paper reasonably integrates the Weibull-Gaussian transform, Informer and kernel mean square error loss and addresses the combination of various components. The Weibull-Gaussian transform is used as the data preprocessing module, which can remove non-Gaussian characteristics from the original data, and thus achieve noise reduction. The Informer is used as the main predictor, which can efficiently output accurate forecasting results based on an encoder-decoder architecture and self-attention mechanism. The kernel mean square error loss function, which shows strong robustness to outliers, is used to evaluate the nonlinearity of errors in reproducing kernel Hilbert space. To evaluate the performance of the proposed model, it is compared with several widely used models and state-of-the-art models. The experimental results indicate that the proposed model weakens the effect of outliers, yields high forecasting accuracy with mean square error = 0.35, and outperforms the baselines by up to 8.5% on three datasets.

TIST Journal 2023 Journal Article

A Semantically Driven Hybrid Network for Unsupervised Entity Alignment

  • Jia Li
  • Dandan Song
  • Zhijing Wu

The major challenge in the task of entity alignment (EA) lies in the heterogeneity of the knowledge graph. The traditional solution to EA is to first map entities to the same space via knowledge embedding and then calculate the similarity between entities from different knowledge graphs. However, these methods mainly rely on manually labeled seeds of EA, which limits their applicability. Some researchers have begun using pseudo-labels rather than seeds for unsupervised EA. However, directly using pseudo-labels causes new problems, such as noise in the pseudo-labels. In this article, we propose a model called the Semantically Driven Hybrid Network (SDHN) to reduce the impact of noise in the pseudo-labels on the performance of EA models. The SDHN consists of two modules: a Teacher–Student Network (TSN) and a Rotation and Penalty (RAP) module. The TSN module reduces the impact of noise in two ways: (1) The TSN’s teacher network guides its student network to construct pseudo-labels based on semantic information instead of directly creating pseudo-labels. (2) It adaptively fuses semantic information into student networks to improve the final representation of entity embedding. Finally, the TSN enhances the performance of models of entity alignment via the RAP module. The results of experiments on multiple benchmark datasets showed that the SDHN outperforms state-of-the-art models.

NeurIPS Conference 2023 Conference Paper

Deep Insights into Noisy Pseudo Labeling on Graph Data

  • Botao Wang
  • Jia Li
  • Yang Liu
  • Jiashun Cheng
  • Yu Rong
  • Wenjia Wang
  • Fugee Tsung

Pseudo labeling (PL) is a widely applied strategy to enlarge the labeled dataset by self-annotating potential samples during the training process. Several works have shown that it can improve graph learning model performance in general. However, we notice that incorrect labels can be fatal to the graph training process: inappropriate PL may degrade performance, especially on graph data where the noise can propagate. Surprisingly, the corresponding error is seldom theoretically analyzed in the literature. In this paper, we aim to give deep insights into PL on graph learning models. We first present an error analysis of the PL strategy by showing that the error is bounded by the confidence of the PL threshold and the consistency of multi-view predictions. Then, we theoretically illustrate the effect of PL on convergence properties. Based on this analysis, we propose a cautious pseudo labeling methodology in which we pseudo label only the samples with the highest confidence and multi-view consistency. Finally, extensive experiments demonstrate that the proposed strategy improves the graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
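
The cautious selection rule described in the abstract (self-annotate only samples that are both high-confidence and consistent across views) can be sketched in a few lines; the function name, the threshold value, and the two-view setup below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cautious_pseudo_label(probs_a, probs_b, threshold=0.9):
    """Pick unlabeled samples to pseudo label using the two criteria
    from the abstract: high confidence and multi-view consistency.

    probs_a, probs_b: (n_samples, n_classes) softmax outputs from two
    views (e.g. two augmentations) of the same unlabeled nodes.
    Returns (indices, labels) of the samples deemed safe to self-annotate.
    """
    labels_a = probs_a.argmax(axis=1)
    labels_b = probs_b.argmax(axis=1)
    conf = probs_a.max(axis=1)            # confidence of the primary view
    consistent = labels_a == labels_b     # multi-view agreement
    keep = np.flatnonzero(consistent & (conf >= threshold))
    return keep, labels_a[keep]
```

A sample is dropped if either check fails, which is what bounds the pseudo-label error in the paper's analysis.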

NeurIPS Conference 2023 Conference Paper

GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection

  • Jianheng Tang
  • Fengrui Hua
  • Ziqi Gao
  • Peilin Zhao
  • Jia Li

With a long history of traditional Graph Anomaly Detection (GAD) algorithms and recently popular Graph Neural Networks (GNNs), it is still not clear (1) how they perform under a standard comprehensive setting, (2) whether GNNs can outperform traditional algorithms such as tree ensembles, and (3) how efficient they are on large-scale graphs. In response, we introduce GADBench---a benchmark tool dedicated to supervised anomalous node detection in static graphs. GADBench facilitates a detailed comparison across 29 distinct models on ten real-world GAD datasets, encompassing thousands to millions (~6M) of nodes. Our main finding is that tree ensembles with simple neighborhood aggregation can outperform the latest GNNs tailored for the GAD task. We shed light on the current progress of GAD, setting a robust groundwork for subsequent investigations in this domain. GADBench is open-sourced at https://github.com/squareRoot3/GADBench.
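
The "tree ensembles with simple neighborhood aggregation" finding implies a preprocessing step like the sketch below, whose augmented features would then feed an off-the-shelf tree ensemble; the function name, mean aggregation, and hop count are assumptions for illustration, not GADBench's exact code:

```python
import numpy as np

def neighborhood_aggregate(adj, feats, hops=2):
    """At each hop, concatenate the mean of the neighbors' features
    onto each node's own features, so a plain tree ensemble can see
    local graph context without any message-passing at training time.

    adj:   (n, n) adjacency matrix
    feats: (n, d) node feature matrix
    Returns an (n, d * (hops + 1)) augmented feature matrix.
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)  # avoid div by zero
    out, cur = [feats], feats
    for _ in range(hops):
        cur = adj @ cur / deg        # mean over 1-hop neighbors
        out.append(cur)
    return np.concatenate(out, axis=1)
```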

AAAI Conference 2023 Conference Paper

Handling Missing Data via Max-Entropy Regularized Graph Autoencoder

  • Ziqi Gao
  • Yifan Niu
  • Jiashun Cheng
  • Jianheng Tang
  • Lanqing Li
  • Tingyang Xu
  • Peilin Zhao
  • Fugee Tsung

Graph neural networks (GNNs) are popular weapons for modeling relational data. Existing GNNs are not specified for attribute-incomplete graphs, making missing attribute imputation a burning issue. Until recently, many works notice that GNNs are coupled with spectral concentration, which means the spectrum obtained by GNNs concentrates on a local part in spectral domain, e.g., low-frequency due to oversmoothing issue. As a consequence, GNNs may be seriously flawed for reconstructing graph attributes as graph spectral concentration tends to cause a low imputation precision. In this work, we present a regularized graph autoencoder for graph attribute imputation, named MEGAE, which aims at mitigating spectral concentration problem by maximizing the graph spectral entropy. Notably, we first present the method for estimating graph spectral entropy without the eigen-decomposition of Laplacian matrix and provide the theoretical upper error bound. A maximum entropy regularization then acts in the latent space, which directly increases the graph spectral entropy. Extensive experiments show that MEGAE outperforms all the other state-of-the-art imputation methods on a variety of benchmark datasets.
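
For intuition, the graph spectral entropy that MEGAE maximizes can be computed naively via eigendecomposition; note that the paper's contribution is estimating it *without* the eigendecomposition, which this hedged sketch (illustrative names, normalized-Laplacian convention assumed) deliberately does not reproduce:

```python
import numpy as np

def graph_spectral_entropy(adj):
    """Shannon entropy of the normalized-Laplacian spectrum, treating
    the (non-negative) eigenvalues as an unnormalized distribution.
    A concentrated spectrum gives low entropy; a spread-out spectrum,
    as the max-entropy regularizer encourages, gives high entropy.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    vals = np.clip(np.linalg.eigvalsh(lap), 0.0, None)  # clip tiny negatives
    p = vals / vals.sum()
    p = p[p > 0]                      # 0 * log 0 := 0
    return float(-(p * np.log(p)).sum())
```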

AAAI Conference 2023 Conference Paper

Human Mobility Modeling during the COVID-19 Pandemic via Deep Graph Diffusion Infomax

  • Yang Liu
  • Yu Rong
  • Zhuoning Guo
  • Nuo Chen
  • Tingyang Xu
  • Fugee Tsung
  • Jia Li

Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have shown effectiveness to slow the transmission of COVID-19 by reducing the contact of people. To support policy-makers, multiple studies have first modelled human mobility via macro indicators (e.g., average daily travel distance) and then study the effectiveness of NPIs. In this work, we focus on mobility modelling and, from a micro perspective, aim to predict locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal loss, such a prediction benefits governments when they design and evaluate them. However, in real-world situations, strict privacy data protection regulations result in severe data sparsity problems (i.e., limited case and location information). To address these challenges and jointly model variables including a geometric graph, a set of diffusions and a set of locations, we propose a model named Deep Graph Diffusion Infomax (DGDI). We show the maximization of DGDI can be bounded by two tractable components: a univariate Mutual Information (MI) between geometric graph and diffusion representation, and a univariate MI between diffusion representation and location representation. To facilitate the research of COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms other competing methods.

ECAI Conference 2023 Conference Paper

Spectral Normalized-Cut Graph Partitioning with Fairness Constraints

  • Jia Li
  • Yanhao Wang 0001
  • Arpit Merchant

Normalized-cut graph partitioning aims to divide the set of nodes in a graph into k disjoint clusters to minimize the fraction of the total edges between any cluster and all other clusters. In this paper, we consider a fair variant of the partitioning problem wherein nodes are characterized by a categorical sensitive attribute (e.g., gender or race) indicating membership to different demographic groups. Our goal is to ensure that each group is approximately proportionally represented in each cluster while minimizing the normalized cut value. To resolve this problem, we propose a two-phase spectral algorithm called FNM. In the first phase, we add an augmented Lagrangian term based on our fairness criteria to the objective function for obtaining a fairer spectral node embedding. Then, in the second phase, we design a rounding scheme to produce k clusters from the fair embedding that effectively trades off fairness and partition quality. Through comprehensive experiments on nine benchmark datasets, we demonstrate the superior performance of FNM compared with three baseline methods.
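
A minimal sketch of the two-phase shape of such an algorithm, with the fairness-specific parts omitted: phase 1 embeds nodes spectrally, phase 2 rounds the embedding. FNM additionally adds the augmented Lagrangian fairness term in phase 1 and a joint rounding scheme in phase 2, so the baseline below is an unconstrained two-way cut, not the paper's algorithm:

```python
import numpy as np

def spectral_bipartition(adj):
    """Plain two-way normalized-cut heuristic (no fairness constraint).
    Phase 1: embed nodes via the Fiedler vector of the normalized
    Laplacian. Phase 2: round the embedding by sign into two clusters.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    vals, vecs = np.linalg.eigh(lap)     # eigenvalues in ascending order
    fiedler = vecs[:, 1]                 # second-smallest eigenvector
    return (fiedler >= 0).astype(int)    # 0/1 cluster assignment
```

On a "dumbbell" of two triangles joined by one edge, the sign rounding recovers the two triangles as clusters.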

TMLR Journal 2023 Journal Article

StarCoder: may the source be with you!

  • Raymond Li
  • Loubna Ben allal
  • Yangtian Zi
  • Niklas Muennighoff
  • Denis Kocetkov
  • Chenghao Mou
  • Marc Marone
  • Christopher Akiki

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

TMLR Journal 2023 Journal Article

The Stack: 3 TB of permissively licensed source code

  • Denis Kocetkov
  • Raymond Li
  • Loubna Ben allal
  • Jia Li
  • Chenghao Mou
  • Yacine Jernite
  • Margaret Mitchell
  • Carlos Muñoz Ferrandis

Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI)--not only for natural language processing but also for code understanding and generation. To stimulate open and responsible research on LLMs for code, we introduce The Stack, a 3.1 TB dataset consisting of permissively licensed source code in 30 programming languages. We describe how we collect the full dataset, construct a permissively licensed subset, present a data governance plan, discuss limitations, and show promising results on text2code benchmarks by training 350M-parameter decoders on different Python subsets. We find that (1) near-deduplicating the data significantly boosts performance across all experiments, and (2) it is possible to match previously reported HumanEval and MBPP performance using only permissively licensed data. We make the dataset available at https://hf.co/BigCode, provide a tool called "Am I in The Stack" for developers to search The Stack for copies of their code (https://hf.co/spaces/bigcode/in-the-stack), and provide a process for code to be removed from the dataset.

AAAI Conference 2023 Conference Paper

Wiener Graph Deconvolutional Network Improves Graph Self-Supervised Learning

  • Jiashun Cheng
  • Man Li
  • Jia Li
  • Fugee Tsung

Graph self-supervised learning (SSL) has been vastly employed to learn representations from unlabeled graphs. Existing methods can be roughly divided into predictive learning and contrastive learning, where the latter attracts more research attention with better empirical performance. We argue, however, that predictive models equipped with a powerful decoder can achieve comparable or even better representation power than contrastive models. In this work, we propose a Wiener Graph Deconvolutional Network (WGDN), an augmentation-adaptive decoder empowered by a graph Wiener filter to perform information reconstruction. Theoretical analysis proves the superior reconstruction ability of the graph Wiener filter. Extensive experimental results on various datasets demonstrate the effectiveness of our approach.

NeurIPS Conference 2022 Conference Paper

Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively

  • Haojie Zhang
  • Ge Li
  • Jia Li
  • Zhongjin Zhang
  • Yuqi Zhu
  • Zhi Jin

Large-scale pre-trained language models have achieved impressive results on a wide range of downstream tasks recently. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for the large-scale pre-trained models during fine-tuning, which adaptively selects a more promising subnetwork to perform staging updates based on gradients of back-propagation. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results with variable pre-trained language models. In addition, DPS brings a large magnitude of improvement in out-of-domain transferring experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and reduce the representation collapse. We release our code at https://github.com/ZhangHaojie077/DPS.

YNIMG Journal 2022 Journal Article

Growth charts of brain morphometry for preschool children

  • Hongxi Zhang
  • Jia Li
  • Xiaoli Su
  • Yang Hu
  • Tianmei Liu
  • Shaoqing Ni
  • Haifeng Li
  • Xi-Nian Zuo

Brain development from 1 to 6 years of age anchors a wide range of functional capabilities and carries early signs of neurodevelopmental disorders. However, quantitative models for depicting brain morphology changes and making individualized inferences are lacking, preventing the identification of early brain atypicality during this period. With a sample size of 285, we characterized the age dependence of the cortical thickness and subcortical volume in neurologically normal children and constructed quantitative growth charts of all brain regions for preschool children. While the cortical thickness of most brain regions decreased with age, the entorhinal and parahippocampal regions displayed an inverted-U shape of age dependence. Compared to the cortical thickness, the normalized volume of subcortical regions exhibited more divergent trends, with some regions increasing, some decreasing, and some displaying inverted-U-shaped trends. The growth curve models for all brain regions demonstrated utilities in identifying brain atypicality. The percentile measures derived from the growth curves facilitate the identification of children with developmental speech and language disorders with an accuracy of 0.875 (area under the receiver operating characteristic curve: 0.943). Our results fill the knowledge gap in brain morphometrics in a critical development period and provide an avenue for individualized brain developmental status evaluation with demonstrated sensitivity. The brain growth charts are shared with the public (http://phi-group.top/resources.html).

EAAI Journal 2022 Journal Article

Mechanical equipment health management method based on improved intuitionistic fuzzy entropy and case reasoning technology

  • Yupeng Gao
  • Ruixin Bao
  • Zhen Pan
  • Guiyang Ma
  • Jia Li
  • Xiuquan Cai
  • Qiqiang Peng

Management becomes more challenging as machinery becomes more widely used. Records of past health-management cases for mechanical devices show that many health problems of the same type occur repeatedly, yet when similar problems arise again, solutions are often not found quickly. Case-based reasoning technology aimed at solving the above problems is widely used in equipment health management, but it suffers from issues such as inadequate data utilization and failure to achieve the required accuracy and reliability. To uncover important information from historical failure case data and help reduce equipment downtime due to failure, this paper proposes a machinery equipment health management method based on improved intuitionistic fuzzy entropy and case reasoning technology. Firstly, the axiomatic definition of traditional intuitionistic fuzzy entropy is optimized and a new intuitionistic fuzzy entropy formula in the framework of the case-based reasoning decision matrix is proposed. Secondly, the weight value of each attribute is obtained by combining the formula of the entropy weight method. Finally, the feature attribute weight values are fused into the case-based reasoning, and the historical cases that are most similar to the target cases are obtained by combining the existing case base. The effectiveness of the method was verified by comparing the example calculation results with those of the traditional method. The method improves the management of machinery equipment operation and maintenance data. Furthermore, it helps enterprises to carry out the management and analysis of machinery equipment health problems.

AAAI Conference 2022 Conference Paper

Retinomorphic Object Detection in Asynchronous Visual Streams

  • Jianing Li
  • Xiao Wang
  • Lin Zhu
  • Jia Li
  • Tiejun Huang
  • Yonghong Tian

Due to high-speed motion blur and challenging illumination, conventional frame-based cameras have encountered an important challenge in object detection tasks. Neuromorphic cameras that output asynchronous visual streams instead of intensity frames, by taking advantage of high temporal resolution and high dynamic range, have brought a new perspective to address the challenge. In this paper, we propose a novel problem setting, retinomorphic object detection, which is the first trial that integrates foveal-like and peripheral-like visual streams. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-Vidar-DVS) with over 215.5k spatio-temporally synchronized labels. Then, we design temporal aggregation representations to preserve the spatio-temporal information from asynchronous visual streams. Finally, we present a novel bio-inspired unifying framework to fuse two sensing modalities via a dynamic interaction mechanism. Our experimental evaluation shows that our approach has significant improvements over the state-of-the-art single-modality methods, especially in high-speed motion and low-light scenarios. We hope that our work will attract further research into this newly identified, yet crucial research direction. Our dataset is available at https://www.pkuml.org/resources/pku-vidar-dvs.html.

JMLR Journal 2021 Journal Article

A Fast Globally Linearly Convergent Algorithm for the Computation of Wasserstein Barycenters

  • Lei Yang
  • Jia Li
  • Defeng Sun
  • Kim-Chuan Toh

We consider the problem of computing a Wasserstein barycenter for a set of discrete probability distributions with finite supports, which finds many applications in areas such as statistics, machine learning and image processing. When the support points of the barycenter are pre-specified, this problem can be modeled as a linear programming (LP) problem whose size can be extremely large. To handle this large-scale LP, we analyse the structure of its dual problem, which is conceivably more tractable and can be reformulated as a well-structured convex problem with 3 kinds of block variables and a coupling linear equality constraint. We then adapt a symmetric Gauss-Seidel based alternating direction method of multipliers (sGS-ADMM) to solve the resulting dual problem and establish its global convergence and global linear convergence rate. As a critical component for efficient computation, we also show how all the subproblems involved can be solved exactly and efficiently. This makes our method suitable for computing a Wasserstein barycenter on a large-scale data set, without introducing an entropy regularization term as is commonly practiced. In addition, our sGS-ADMM can be used as a subroutine in an alternating minimization method to compute a barycenter when its support points are not pre-specified. Numerical results on synthetic data sets and image data sets demonstrate that our method is highly competitive for solving large-scale Wasserstein barycenter problems, in comparison to two existing representative methods and the commercial software Gurobi.

NeurIPS Conference 2021 Conference Paper

Deconvolutional Networks on Graph Data

  • Jia Li
  • Jiajin Li
  • Yang Liu
  • Jianwei Yu
  • Yueting Li
  • Hong Cheng

In this paper, we consider an inverse problem in the graph learning domain -- "given the graph representations smoothed by Graph Convolutional Network (GCN), how can we reconstruct the input graph signal?" We propose Graph Deconvolutional Network (GDN) and motivate the design of GDN via a combination of inverse filters in the spectral domain and de-noising layers in the wavelet domain, as the inverse operation results in a high frequency amplifier and may amplify the noise. We demonstrate the effectiveness of the proposed method on several tasks including graph feature imputation and graph structure generation.

AAAI Conference 2021 Conference Paper

Pyramidal Feature Shrinking for Salient Object Detection

  • Mingcan Ma
  • Changqun Xia
  • Jia Li

Recently, we have witnessed the great progress of salient object detection (SOD), which benefits from the effectiveness of various feature aggregation strategies. However, existing methods usually aggregate the low-level features containing details and the high-level features containing semantics over a large span, which introduces noise into the aggregated features and generates inaccurate saliency maps. In this paper, we propose a pyramidal feature shrinking network (PFSNet), which aims to aggregate adjacent feature nodes in pairs with layer-by-layer shrinkage, so that the aggregated features fuse effective details and semantics and discard interference information. Specifically, a pyramidal shrinking decoder (PSD) is proposed to aggregate adjacent features hierarchically in an asymptotic manner. Unlike other methods that aggregate features with significantly different information, this method only focuses on adjacent feature nodes in each layer and shrinks them to a final unique feature node. Besides, we propose an adjacent fusion module (AFM) to perform mutual spatial enhancement between the adjacent features to dynamically weight the features and adaptively fuse the appropriate information. Moreover, a scale-aware enrichment module (SEM) based on the features extracted from the backbone is utilized to obtain rich scale information and generate diverse initial features with dilated convolutions. Extensive quantitative and qualitative experiments demonstrate that the proposed intuitive framework outperforms 14 state-of-the-art approaches on 5 public datasets.

NeurIPS Conference 2020 Conference Paper

Dirichlet Graph Variational Autoencoder

  • Jia Li
  • Jianwei Yu
  • Jiajin Li
  • Honglei Zhang
  • Kangfei Zhao
  • Yu Rong
  • Hong Cheng
  • Junzhou Huang

Graph Neural Networks (GNN) and Variational Autoencoders (VAEs) have been widely used in modeling and generating graphs with latent factors. However, there is no clear explanation of what these latent factors are and why they perform well. In this work, we present Dirichlet Graph Variational Autoencoder (DGVAE) with graph cluster memberships as latent factors. Our study connects VAEs based graph generation and balanced graph cut, and provides a new way to understand and improve the internal mechanism of VAEs based graph generation. Specifically, we first interpret the reconstruction term of DGVAE as balanced graph cut in a principled way. Furthermore, motivated by the low pass characteristics in balanced graph cut, we propose a new variant of GNN named Heatts to encode the input graph into cluster memberships. Heatts utilizes the Taylor series for fast computation of heat kernels and has better low pass characteristics than Graph Convolutional Networks (GCN). Through experiments on graph generation and graph clustering, we demonstrate the effectiveness of our proposed framework.

AAAI Conference 2020 Conference Paper

Local Search with Dynamic-Threshold Configuration Checking and Incremental Neighborhood Updating for Maximum k-plex Problem

  • Peilin Chen
  • Hai Wan
  • Shaowei Cai
  • Jia Li
  • Haicheng Chen

The Maximum k-plex Problem is an important combinatorial optimization problem with increasingly wide applications. In this paper, we propose a novel strategy, named Dynamic-threshold Configuration Checking (DCC), to reduce the cycling problem of local search. Due to the complicated neighborhood relations, all previous local search algorithms for this problem spend a large amount of time identifying feasible neighbors in each step. To further improve performance on dense and challenging instances, we propose a Double-attributes Incremental Neighborhood Updating (DINU) scheme, which reduces the worst-case time complexity per iteration from O(|V| · ΔG) to O(k · ΔG). Based on the DCC strategy and the DINU scheme, we develop a local search algorithm named DCCplex. According to the experimental results, DCCplex shows promising performance on the DIMACS and BHOSLIB benchmarks as well as on real-world massive graphs. In particular, DCCplex updates the lower bound of the maximum k-plex for most dense and challenging instances.

AAAI Conference 2020 Conference Paper

Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

  • Jia Li
  • Wen Su
  • Zengfu Wang

We rethink a well-known bottom-up approach for multi-person pose estimation and propose an improved one. The improved approach surpasses the baseline significantly thanks to (1) an intuitive yet more sensible representation, which we refer to as body parts, to encode the connection information between keypoints, (2) an improved stacked hourglass network with attention mechanisms, (3) a novel focal L2 loss dedicated to "hard" keypoint and keypoint-association (body part) mining, and (4) a robust greedy keypoint assignment algorithm for grouping the detected keypoints into individual poses. Our approach not only works straightforwardly but also outperforms the baseline by about 15% in average precision and is comparable to the state of the art on the MS-COCO test-dev dataset. The code and pre-trained models are publicly available on our project page.

AAAI Conference 2020 Conference Paper

Ultrafast Video Attention Prediction with Coupled Knowledge Distillation

  • Kui Fu
  • Peipei Shi
  • Yafei Song
  • Shiming Ge
  • Xiangju Lu
  • Jia Li

Large convolutional neural network models have recently demonstrated impressive performance on video attention prediction. Conventionally, these models involve intensive computation and large memory footprints. To address these issues, we design an extremely lightweight network with ultrafast speed, named UVA-Net. The network is built on depthwise convolutions and takes low-resolution images as input. However, this straightforward acceleration method decreases performance dramatically. To this end, we propose a coupled knowledge distillation strategy to augment and train the network effectively. With this strategy, the model can automatically discover and emphasize implicit useful cues contained in the data. Both spatial and temporal knowledge learned by the high-resolution, complex teacher networks can also be distilled and transferred into the proposed low-resolution, lightweight spatiotemporal network. Experimental results show that the performance of our model is comparable to that of 11 state-of-the-art models in video attention prediction, while it costs only a 0.68 MB memory footprint and runs at about 10,106 FPS on a GPU and 404 FPS on a CPU, which is 206 times faster than previous models.

AAAI Conference 2019 Conference Paper

Lifted Proximal Operator Machines

  • Jia Li
  • Cong Fang
  • Zhouchen Lin

We propose a new optimization method for training feedforward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feedforward neural network by adding the proximal operators to the objective function as penalties; hence we call it the lifted proximal operator machine (LPOM). LPOM is block multi-convex in all layer-wise weights and activations. This allows us to use block coordinate descent to update the layer-wise weights and activations. Most notably, we use only the mapping of the activation function itself, rather than its derivative, thus avoiding the gradient vanishing and blow-up issues of gradient-based training methods. Our method is therefore applicable to various non-decreasing Lipschitz-continuous activation functions, which can be saturating and non-differentiable. LPOM does not require more auxiliary variables than the layer-wise activations, thus using roughly the same amount of memory as stochastic gradient descent (SGD). Its parameter tuning is also much simpler. We further prove the convergence of updating the layer-wise weights and activations, and point out that the optimization could be parallelized via asynchronous updates. Experiments on the MNIST and CIFAR-10 datasets testify to the advantages of LPOM.
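The "activation as proximal operator" rewriting has a one-line special case: ReLU is exactly the proximal operator (here, the Euclidean projection) of the indicator of the nonnegative orthant. The tiny numerical check below is an illustrative sketch of that equivalence only, not part of the LPOM training algorithm; the function name and grid resolution are assumptions.

```python
import numpy as np

def prox_nonneg(x, grid=None):
    """Numerically evaluate argmin over u >= 0 of 0.5 * (u - x)**2
    on a discrete grid. The closed form of this proximal operator
    is ReLU: max(0, x)."""
    if grid is None:
        grid = np.linspace(0.0, 5.0, 5001)  # step 0.001 over [0, 5]
    return float(grid[np.argmin(0.5 * (grid - x) ** 2)])
```

For negative inputs the minimizer clamps to 0, and for positive inputs it tracks x to within the grid resolution, matching `max(0, x)`.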

JMLR Journal 2017 Journal Article

Clustering with Hidden Markov Model on Variable Blocks

  • Lin Lin
  • Jia Li

Large-scale data containing multiple important rare clusters, even at moderately high dimensions, pose challenges for existing clustering methods. To address this issue, we propose a new mixture model called Hidden Markov Model on Variable Blocks (HMM-VB) and a new mode search algorithm called Modal Baum-Welch (MBW) for mode-association clustering. HMM-VB leverages prior information about chain-like dependence among groups of variables to achieve the effect of dimension reduction. In case such a dependence structure is unknown or assumed merely for the sake of parsimonious modeling, we develop a recursive search algorithm based on BIC to optimize the formation of ordered variable blocks. The MBW algorithm ensures the feasibility of clustering via mode association, achieving linear complexity in terms of the number of variable blocks despite the exponentially growing number of possible state sequences in HMM-VB. In addition, we provide theoretical investigations about the identifiability of HMM-VB as well as the consistency of our approach to search for the block partition of variables in a special case. Experiments on simulated and real data show that our proposed method outperforms other widely used methods.

AAAI Conference 2017 Short Paper

Structured Prediction in Time Series Data

  • Jia Li

Time series data is common in a wide range of disciplines including finance, biology, sociology, and computer science. Analyzing and modeling time series data is fundamental for studying various problems in those fields. For instance, studying time series physiological data can be used to discriminate patients' abnormal recovery trajectories from normal ones (Hripcsak, Albers, and Perotte 2015). GPS data are useful for studying collective decision making of group-living animals (Strandburg-Peshkin et al. 2015). There are different methods for studying time series data, such as clustering, regression, and anomaly detection. In this proposal, we are interested in structured prediction problems in time series data. Structured prediction focuses on prediction tasks where the outputs are structured and interdependent, in contrast to non-structured prediction, which assumes that each output is independent of the other predicted outputs. Structured prediction is an important problem, as structures inherently exist in time series data. One difficulty for structured prediction is that the number of possible outputs can be exponential, which makes modeling all the potential outputs intractable.

IJCAI Conference 2016 Conference Paper

Adversarial Sequence Tagging

  • Jia Li
  • Kaiser Asif
  • Hong Wang
  • Brian D. Ziebart
  • Tanya Berger-Wolf

Providing sequence taggings that minimize the Hamming loss is a challenging but important task. Directly minimizing this loss over a training sample is generally an NP-hard problem. Instead, existing sequence tagging methods minimize a convex upper bound on the Hamming loss. Unfortunately, this often leads either to inconsistent predictors (e.g., max-margin methods) or to predictions that are mismatched with the Hamming loss (e.g., conditional random fields). We present adversarial sequence tagging, a consistent structured prediction framework for minimizing the Hamming loss by viewing uncertainty pessimistically. Our approach pessimistically approximates the training data, yielding an adversarial game between the sequence tag predictor and the sequence labeler. We demonstrate the benefits of the approach on activity recognition and information extraction/segmentation tasks.

JMLR Journal 2008 Journal Article

Forecasting Web Page Views: Methods and Observations

  • Jia Li
  • Andrew W. Moore

Web sites must forecast Web page views in order to plan computer resource allocation and estimate upcoming revenue and advertising growth. In this paper, we focus on extracting trends and seasonal patterns from page view series, two dominant factors in the variation of such series. We investigate the Holt-Winters procedure and a state space model for making relatively short-term prediction. It is found that Web page views exhibit strong impulsive changes occasionally. The impulses cause large prediction errors long after their occurrences. A method is developed to identify impulses and to alleviate their damage on prediction. We also develop a long-range trend and season extraction method, namely the Elastic Smooth Season Fitting (ESSF) algorithm, to compute scalable and smooth yearly seasons. ESSF derives the yearly season by minimizing the residual sum of squares under smoothness regularization, a quadratic optimization problem. It is shown that for long-term prediction, ESSF improves accuracy significantly over other methods that ignore the yearly seasonality.
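The Holt-Winters procedure the abstract investigates maintains three exponentially smoothed components: level, trend, and season. The sketch below is a minimal additive variant with simplified initialization; the smoothing constants, function name, and initialization scheme are illustrative assumptions, not the paper's tuned settings or its state-space formulation.

```python
def holt_winters_additive(series, period, alpha=0.3, beta=0.1,
                          gamma=0.2, horizon=1):
    """One-pass additive Holt-Winters smoothing and forecasting.
    Assumes len(series) >= max(period, 2); simplified initialization."""
    level = series[0]
    trend = series[1] - series[0]
    season = [series[i] - level for i in range(period)]  # crude seasonal init
    for t, y in enumerate(series):
        s = season[t % period]
        prev_level = level
        # Deseasonalized level update, then trend and seasonal updates.
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y - level) + (1 - gamma) * s
    # h-step-ahead forecast: level + h * trend + matching seasonal term.
    return [level + (h + 1) * trend + season[(len(series) + h) % period]
            for h in range(horizon)]
```

An impulse in the series (the failure mode the abstract highlights) would inflate `level` and `season` at one step and then contaminate forecasts for many subsequent periods, which motivates the paper's impulse-identification step.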

IJCAI Conference 2007 Conference Paper

  • Ding Zhou
  • Levent Bolelli
  • Jia Li
  • Lee Giles
  • Hongyuan Zha

Machine learning for predicting user clicks in Web-based search offers automated explanation of user activity. We address click prediction in the Web search scenario by introducing a method for click prediction based on observations of past queries and the clicked documents. Due to the sparsity of the problem space, commonly encountered when learning for Web search, new approaches to learn the probabilistic relationship between documents and queries are proposed. Two probabilistic models are developed, which differ in the interpretation of the query-document co-occurrences. A novel technique, namely, conditional probability hierarchy, flexibly adjusts the level of granularity in parsing queries, and, as a result, leverages the advantages of both models.

JMLR Journal 2007 Journal Article

A Nonparametric Statistical Approach to Clustering via Mode Identification

  • Jia Li
  • Surajit Ray
  • Bruce G. Lindsay

A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/~jiali/hmac.
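The Modal EM idea — each sample ascends the kernel density estimate to a mode via alternating E/M steps, and points sharing a mode share a cluster — can be sketched as below. This is an illustrative reconstruction with an isotropic Gaussian kernel and naive mode matching by rounding; the paper's MEM/REM algorithms and the HMAC package are more elaborate.

```python
import numpy as np

def modal_em_ascent(x, data, bandwidth=1.0, iters=100, tol=1e-8):
    """Ascend from x to a local mode of a Gaussian kernel density over data.
    E-step: posterior weight of each mixture component (one Gaussian per
    sample). M-step: move x to the weighted mean. A fixed point is a mode."""
    x = np.array(x, dtype=float)
    for _ in range(iters):
        d2 = ((data - x) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)   # E-step weights
        w /= w.sum()
        x_new = (w[:, None] * data).sum(axis=0)  # M-step: weighted mean
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

def mode_cluster(data, bandwidth=1.0):
    """Label each point by the mode it ascends to (rounded for matching)."""
    modes = [tuple(np.round(modal_em_ascent(p, data, bandwidth), 3))
             for p in data]
    labels = {m: i for i, m in enumerate(dict.fromkeys(modes))}
    return [labels[m] for m in modes]
```

Running the hierarchical version described in the abstract amounts to repeating `mode_cluster` with increasing `bandwidth`, so that small-bandwidth modes merge into fewer large-bandwidth modes.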

IJCAI Conference 2005 Conference Paper

Semantic Argument Classification Exploiting Argument Interdependence

  • Zheng Ping Jiang
  • Jia Li
  • Hwee Tou

This paper describes our research on automatic semantic argument classification, using the PropBank data [Kingsbury et al., 2002]. Previous research employed features that were based either on a full parse or a shallow parse of a sentence. These features were mostly based on an individual semantic argument and the relation between the predicate and a semantic argument, but they did not capture the interdependence among all arguments of a predicate. In this paper, we propose the use of the neighboring semantic arguments of a predicate as additional features in determining the class of the current semantic argument. Our experimental results show significant improvement in the accuracy of semantic argument classification after exploiting argument interdependence. Argument classification accuracy on the standard Section 23 test set improves to 90.50%, representing a relative error reduction of 18%.