Author name cluster

Jiacheng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation

  • Jiacheng Li
  • Songhe Feng

Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data, aiming to bridge the gap between source and target distributions. However, in multimodal scenarios, varying degrees of distribution shift across different modalities give rise to a complex coupling effect of unimodal shallow feature shift and cross-modal high-level semantic misalignment, posing a major obstacle to extending existing TTA methods to the multimodal field. To address this challenge, we propose a novel multimodal test-time adaptation (MMTTA) framework, termed as Bridging Modalities via Progressive Re-alignment (BriMPR). BriMPR, consisting of two progressively enhanced modules, tackles the coupling effect with a divide-and-conquer strategy. Specifically, we first decompose MMTTA into multiple unimodal feature alignment sub-problems. By leveraging the strong function approximation ability of prompt tuning, we calibrate the unimodal global feature distributions to their respective source distributions, so as to achieve the initial semantic re-alignment across modalities. Subsequently, we assign the credible pseudo-labels to combinations of masked and complete modalities, and introduce inter-modal instance-wise contrastive learning to further enhance the information interaction among modalities and refine the alignment. Extensive experiments on MMTTA tasks, including both corruption-based and real-world domain shift benchmarks, demonstrate the superiority of our method.

AAAI Conference 2026 Conference Paper

Cooperative Graph Transformer with Structural Consensus for Multi-View Learning

  • Zhiyuan Lai
  • Jiacheng Li
  • Jiayuan Wang
  • Shiping Wang

Multi-view learning aims to effectively integrate data from different sources by exploring the consistency and complementarity across views. Current multi-view methods based on Graph Convolutional Networks (GCNs) primarily focus on local information, making it difficult to capture global dependencies. Furthermore, multi-view data typically lack explicit structural representations, and the topologies constructed via node similarity in existing approaches are prone to noise, while simple fusion strategies are often inadequate for effectively suppressing this noise and for uncovering meaningful structural information. To tackle these issues, this paper proposes CoGFormer, a cooperative graph transformer with structural consensus learning. CoGFormer maps multi-view data into a unified space and jointly models local and global consensus: a denoising structural consensus graph convolutional network refines the consensus graph to enhance local consistency and robustness, while a structure-guided attention mechanism explicitly injects high-order cross-view structural biases to capture global consistency and improve semantic coherence. Experiments on multiple benchmarks demonstrate that CoGFormer outperforms existing state-of-the-art methods, validating its effectiveness.

AAAI Conference 2026 Conference Paper

Inductive Generative Recommendation via Retrieval-based Speculation

  • Yijie Ding
  • Jiacheng Li
  • Julian McAuley
  • Yupeng Hou

Generative recommendation (GR) is an emerging paradigm that tokenizes items into discrete tokens and learns to autoregressively generate the next tokens as predictions. While this token-generation paradigm is expected to surpass traditional transductive methods, potentially generating new items directly based on semantics, we empirically show that GR models predominantly generate items seen during training and struggle to recommend unseen items. In this paper, we propose SpecGR, a plug-and-play framework that enables GR models to recommend new items in an inductive setting. SpecGR uses a drafter model with inductive capability to propose candidate items, which may include both existing items and new items. The GR model then acts as a verifier, accepting or rejecting candidates while retaining its strong ranking capabilities. We further introduce the guided re-drafting technique to make the proposed candidates more aligned with the outputs of generative recommendation models, improving the verification efficiency. We consider two variants for drafting: (1) using an auxiliary drafter model for better flexibility, or (2) leveraging the GR model's own encoder for parameter-efficient self-drafting. Extensive experiments on three real-world datasets demonstrate that SpecGR exhibits both strong inductive recommendation ability and the best overall performance among the compared methods.

ICML Conference 2025 Conference Paper

Dequantified Diffusion-Schrödinger Bridge for Density Ratio Estimation

  • Wei Chen 0165
  • Shigui Li
  • Jiacheng Li
  • Junmei Yang
  • John Paisley
  • Delu Zeng

Density ratio estimation is fundamental to tasks involving f-divergences, yet existing methods often fail under significantly different distributions or inadequately overlapping supports — the density-chasm and the support-chasm problems. Additionally, prior approaches yield divergent time scores near boundaries, leading to instability. We design $\textbf{D}^3\textbf{RE}$, a unified framework for robust, stable and efficient density ratio estimation. We propose the dequantified diffusion bridge interpolant (DDBI), which expands support coverage and stabilizes time scores via diffusion bridges and Gaussian dequantization. Building on DDBI, the proposed dequantified Schrödinger bridge interpolant (DSBI) incorporates optimal transport to solve the Schrödinger bridge problem, enhancing accuracy and efficiency. Our method offers uniform approximation and bounded time scores in theory, and outperforms baselines empirically in mutual information and density estimation tasks.

NeurIPS Conference 2025 Conference Paper

Exploring Landscapes for Better Minima along Valleys

  • Tong Zhao
  • Jiacheng Li
  • Yuanchang Zhou
  • Guangming Tan
  • Weile Jia

Finding lower and better-generalizing minima is crucial for deep learning. However, most existing optimizers stop searching the parameter space once they reach a local minimum. Given the complex geometric properties of the loss landscape, it is difficult to guarantee that such a point is the lowest or provides the best generalization. To address this, we propose an adaptor "E" for gradient-based optimizers. The adapted optimizer tends to continue exploring along landscape valleys (areas with low and nearly identical losses) in order to search for potentially better local minima even after reaching a local minimum. This approach increases the likelihood of finding a lower and flatter local minimum, which is often associated with better generalization. We also provide a proof of convergence for the adapted optimizers in both convex and non-convex scenarios for completeness. Finally, we demonstrate their effectiveness in an important but notoriously difficult training scenario, large-minibatch training, where Lamb is the benchmark optimizer. Our testing results show that the adapted Lamb, ALTO, increases the test accuracy (generalization) of the current state-of-the-art optimizer by an average of 2.5% across a variety of large-batch training tasks. This work potentially opens a new research direction in the design of optimization algorithms.

EAAI Journal 2025 Journal Article

Fault diagnosis of driving gear in battery swapping system based on auditory bionics

  • Hang Yuan
  • Hao Wu
  • Jiacheng Li
  • Kai Zhang
  • Huijuan Zhang
  • Xiaowen You
  • Xianglong You

Rack and pinion drives (RPD) are widely used in battery swapping systems (BSS) for electric heavy trucks (EHT). Because of continuous heavy-load, high-intensity operation and electric erosion, the gears in the RPD are frequently damaged, which causes unexpected consequences such as downtime or safety incidents. The working conditions of the RPD in BSS include uncertain noise and fluctuating, low speed, which pose steep challenges to accurate fault diagnosis. Auditory perception resists interference, is sensitive to low frequencies, and has a saliency mechanism. To leverage these advantages in addressing the above challenges, we propose, as the contribution to artificial intelligence, a complete vibration signal processing scheme based on auditory bionics, including mathematical models of auditory mechanisms. As the engineering application, the proposed scheme is employed for fault diagnosis of the RPD in BSS under these unique working conditions. First, adaptive resampling is used to smooth the speed fluctuation. Then, Gammatone filters are employed to transform vibration signals into cochleograms. Next, based on auditory stream segregation and selective attention mechanisms, effective frequency channels and salient features are extracted from the cochleograms. To improve diagnosis accuracy, binaural features are also extracted. Finally, fault diagnosis is achieved based on (sectional) sparse representation and fusion. The effectiveness of the fault diagnosis scheme is demonstrated using a BSS prototype system.

TMLR Journal 2025 Journal Article

Preference Discerning with LLM-Enhanced Generative Retrieval

  • Fabian Paischer
  • Liu Yang
  • Linfeng Liu
  • Shuai Shao
  • Kaveh Hassani
  • Jiacheng Li
  • Ricky T. Q. Chen
  • Zhang Gabriel Li

In sequential recommendation, models recommend items based on a user's interaction history. To this end, current models usually incorporate information such as item descriptions and user intent or preferences. User preferences are usually not explicitly given in open-source datasets, and thus need to be approximated, for example via large language models (LLMs). Current approaches leverage approximated user preferences only during training and rely solely on the past interaction history for recommendations, limiting their ability to dynamically adapt to changing preferences and potentially reinforcing echo chambers. To address this issue, we propose a new paradigm, namely *preference discerning*, which explicitly conditions a generative recommendation model on user preferences in natural language within its context. To evaluate *preference discerning*, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. Upon evaluating current state-of-the-art methods on our benchmark, we discover that their ability to dynamically adapt to evolving user preferences is limited. To address this, we propose a new method named Mender (**M**ultimodal Prefer**en**ce **D**iscern**er**), which achieves state-of-the-art performance in our benchmark. Our results show that Mender effectively adapts its recommendation guided by human preferences, even if not observed during training, paving the way toward more flexible recommendation models.

NeurIPS Conference 2025 Conference Paper

Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control

  • Danfeng Li
  • Hui Zhang
  • Sheng Wang
  • Jiacheng Li
  • Zuxuan Wu

Despite recent advances in diffusion models, top-tier text-to-image (T2I) models still struggle to achieve precise spatial layout control, i.e., accurately generating entities with specified attributes and locations. Segmentation-mask-to-image (S2I) generation has emerged as a promising solution by incorporating pixel-level spatial guidance and regional text prompts. However, existing S2I methods fail to simultaneously ensure semantic consistency and shape consistency. To address these challenges, we propose Seg2Any, a novel S2I framework built upon advanced multimodal diffusion transformers (e.g., FLUX). First, to achieve both semantic and shape consistency, we decouple segmentation mask conditions into regional semantic and high-frequency shape components. The regional semantic condition is introduced by a Semantic Alignment Attention Mask, ensuring that generated entities adhere to their assigned text prompts. The high-frequency shape condition, representing entity boundaries, is encoded as an Entity Contour Map and then introduced as an additional modality via multi-modal attention to guide image spatial structure. Second, to prevent attribute leakage across entities in multi-entity scenarios, we introduce an Attribute Isolation Attention Mask mechanism, which constrains each entity’s image tokens to attend exclusively to themselves during image self-attention. To support open-set S2I generation, we construct SACap-1M, a large-scale dataset containing 1 million images with 5.9 million segmented entities and detailed regional captions, along with a SACap-Eval benchmark for comprehensive S2I evaluation. Extensive experiments demonstrate that Seg2Any achieves state-of-the-art performance on both open-set and closed-set S2I benchmarks, particularly in fine-grained spatial and attribute control of entities.

TMLR Journal 2025 Journal Article

Unifying Generative and Dense Retrieval for Sequential Recommendation

  • Liu Yang
  • Fabian Paischer
  • Kaveh Hassani
  • Jiacheng Li
  • Shuai Shao
  • Zhang Gabriel Li
  • Yun He
  • Xue Feng

Sequential dense retrieval models utilize advanced sequence learning techniques to compute item and user representations, which are then used to rank relevant items for a user through inner product computation between the user and all item representations. While effective, these approaches incur high memory and computational costs due to the need to store and compare a unique embedding for each item--leading to lower resource efficiency. In contrast, the recently proposed generative retrieval paradigm offers a promising alternative by directly predicting item indices using a generative model trained on semantic IDs that encapsulate items’ semantic information. Despite its potential for large-scale applications, a comprehensive comparison between generative retrieval and sequential dense retrieval under fair conditions is still lacking, leaving open questions regarding performance and resource efficiency trade-offs. To address this, we compare these two approaches under controlled conditions on academic benchmarks and observe performance gaps, with dense retrieval showing stronger ranking performance, while generative retrieval provides greater resource efficiency. Motivated by these observations, we propose LIGER (LeveragIng dense retrieval for GEnerative Retrieval), a hybrid model that combines the strengths of these two widely used approaches. LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences between the two methods, and enhancing cold-start item recommendation in the evaluated datasets. This hybrid approach provides insight into the trade-offs between these approaches and demonstrates improvements in efficiency and effectiveness for recommendation systems in small-scale benchmarks.
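The inner-product ranking step that the abstract above attributes to dense retrieval can be illustrated with a minimal sketch. This is not the LIGER implementation: the function name `rank_items_by_inner_product`, the toy embeddings, and the tie-breaking rule are assumptions made for illustration only. It does show why the approach needs one stored embedding per item, which is the memory cost the abstract mentions.

```python
from typing import Dict, List, Sequence


def rank_items_by_inner_product(
    user_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    top_k: int,
) -> List[str]:
    """Score every item by its inner product with the user vector and return the top_k ids."""
    # One score per stored item embedding: cost grows linearly with the catalogue size.
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    # Sort by descending score; break ties by item id (hypothetical choice) for determinism.
    ranked = sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
    return ranked[:top_k]


# Toy data (hypothetical): three items and one user in a 2-dimensional embedding space.
item_embeddings = {
    "item_a": [1.0, 0.0],
    "item_b": [0.5, 0.5],
    "item_c": [0.0, 1.0],
}
user_embedding = [0.9, 0.1]
top_items = rank_items_by_inner_product(user_embedding, item_embeddings, top_k=2)
print(top_items)
```

Generative retrieval, by contrast, would decode item identifiers token by token instead of comparing against every stored vector.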

JBHI Journal 2024 Journal Article

Generative Listener EEG for Speech Emotion Recognition Using Generative Adversarial Networks With Compressed Sensing

  • Jiang Chang
  • Zhixin Zhang
  • Zelin Wang
  • Jiacheng Li
  • Linsheng Meng
  • Pan Lin

Currently, emotional features in speech emotion recognition are typically extracted from speech. However, recognition accuracy can be influenced by factors such as semantics, language, and cross-speech datasets. Achieving consistent emotional judgment with human listeners is a key challenge for AI to address. Electroencephalography (EEG) signals prove to be an effective means of capturing authentic and meaningful emotional information in humans. This positions EEG as a promising tool for detecting emotional cues conveyed in speech. In this study, we proposed a novel approach named CS-GAN that generates listener EEGs in response to a speaker's speech, specifically aimed at enhancing cross-subject emotion recognition. We utilized generative adversarial networks (GANs) to establish a mapping relationship between speech and EEGs to generate stimulus-induced EEGs. Furthermore, we integrated compressive sensing (CS) theory into the GAN-based EEG generation method, thereby enhancing the fidelity and diversity of the generated EEGs. The generated EEGs were then processed using a CNN-LSTM model to identify the emotional categories conveyed in the speech. By averaging these EEGs, we obtained the event-related potentials (ERPs) to improve the cross-subject capability of the method. The experimental results demonstrate that the EEGs generated by this method outperform real listener EEGs by 9.31% in cross-subject emotion recognition tasks. Furthermore, the ERPs show an improvement of 43.59%, providing evidence for the effectiveness of this method in cross-subject emotion recognition.

IROS Conference 2024 Conference Paper

Novel design of Reconfigurable Tracked Robot with Geometry-Changing Tracks

  • Chice Xuan
  • Jiadong Luy
  • Zhihao Tian
  • Jiacheng Li
  • Mengke Zhang
  • Hanbin Xie
  • Jianxiong Qiu
  • Chao Xu 0001

Tracked robots with reconfigurable mechanisms exhibit great maneuverability due to their adaptability to complex ground conditions. Reconfigurable tracked robots with geometry-changing tracks show further obstacle-crossing capabilities with compact dimensions. However, existing systems face deployment limitations due to either complex transmission mechanisms or unsustainable designs when maintaining the tension in the tracks. To address these challenges, we introduce a novel design of a reconfigurable tracked robot with geometry-changing tracks, which achieves strong terrain traversability with good mechanical properties. We achieve the elliptical trajectory of key planetary wheels through a novel Quad-slider Elliptical Trammel Mechanism (Qs-ETM), allowing the tracks to maintain fixed tension while changing their geometry. Furthermore, the combination of direct drive motors significantly enhances its mechanical properties and agility. A detailed analysis of the kinematic and dynamic characteristics has been conducted and proved with a series of simulations. We built a fully functional prototype of the design and tested it in real-world experiments to validate its advantages. The result shows that our design can reduce the torque required by up to 68.3% and the shear stress of the flipper by up to 67.1%.

ICLR Conference 2023 Conference Paper

Effective passive membership inference attacks in federated learning against overparameterized models

  • Jiacheng Li
  • Ninghui Li 0001
  • Bruno Ribeiro 0001

This work considers the challenge of performing membership inference attacks in a federated learning setting, for image classification, where an adversary can only observe the communication between the central node and a single client (a passive white-box attack). Passive attacks are among the hardest attacks to detect, since they can be performed without modifying the behavior of the central server or its clients, and they assume *no access to private data instances*. The key insight of our method is empirically observing that, near parameters that generalize well in test, the gradients of large overparameterized neural network models statistically behave like high-dimensional independent isotropic random vectors. Using this insight, we devise two attacks that are often little impacted by existing and proposed defenses. Finally, we validated the hypothesis that our attack depends on the overparametrization by showing that increasing the level of overparametrization (without changing the neural network architecture) positively correlates with our attack effectiveness.

AAAI Conference 2023 Conference Paper

PrimeNet: Pre-training for Irregular Multivariate Time Series

  • Ranak Roy Chowdhury
  • Jiacheng Li
  • Xiyuan Zhang
  • Dezhi Hong
  • Rajesh K. Gupta
  • Jingbo Shang

Real-world applications often involve irregular time series, for which the time intervals between successive observations are non-uniform. Irregularity across multiple features in a multi-variate time series further results in a different subset of features at any given time (i.e., asynchronicity). Existing pre-training schemes for time-series, however, often assume regularity of time series and make no special treatment of irregularity. We argue that such irregularity offers insight about domain property of the data—for example, frequency of hospital visits may signal patient health condition—that can guide representation learning. In this work, we propose PrimeNet to learn a self-supervised representation for irregular multivariate time-series. Specifically, we design a time sensitive contrastive learning and data reconstruction task to pre-train a model. Irregular time-series exhibits considerable variations in sampling density over time. Hence, our triplet generation strategy follows the density of the original data points, preserving its native irregularity. Moreover, the sampling density variation over time makes data reconstruction difficult for different regions. Therefore, we design a data masking technique that always masks a constant time duration to accommodate reconstruction for regions of different sampling density. We learn with these tasks using unlabeled data to build a pre-trained model and fine-tune on a downstream task with limited labeled data, in contrast with existing fully supervised approach for irregular time-series, requiring large amounts of labeled data. Experiment results show that PrimeNet significantly outperforms state-of-the-art methods on naturally irregular and asynchronous data from Healthcare and IoT applications for several downstream tasks, including classification, interpolation, and regression.
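The constant-duration masking idea described in the abstract above can be sketched as follows. This is a minimal illustration rather than the PrimeNet code: the function name `mask_constant_duration`, the uniform placement of the window, and the toy timestamps are all assumptions. The point it shows is that a window of fixed length in time covers many observations in densely sampled regions and few in sparse ones, unlike masking a fixed number of points.

```python
import random
from typing import List, Sequence, Tuple


def mask_constant_duration(
    timestamps: Sequence[float],
    duration: float,
    rng: random.Random,
) -> Tuple[List[bool], Tuple[float, float]]:
    """Mask every observation inside one randomly placed window of fixed time length.

    `timestamps` must be sorted in ascending order. Returns a boolean mask
    (True = masked) and the (start, end) of the window.
    """
    first, last = timestamps[0], timestamps[-1]
    # Keep the window inside the observed time span when the span is long enough.
    latest_start = max(first, last - duration)
    start = rng.uniform(first, latest_start)
    end = start + duration
    mask = [start <= t <= end for t in timestamps]
    return mask, (start, end)


# Toy irregular series (hypothetical): dense sampling early, sparse sampling later.
timestamps = [0.0, 0.1, 0.2, 5.0, 9.0, 10.0]
mask, (start, end) = mask_constant_duration(timestamps, duration=2.0, rng=random.Random(0))
print(f"window=({start:.2f}, {end:.2f}) masked={sum(mask)} of {len(mask)} points")
```

Repeated calls with different seeds mask a varying number of points but always the same time duration.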

IJCAI Conference 2023 Conference Paper

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference

  • Boren Hu
  • Yun Zhu
  • Jiacheng Li
  • Siliang Tang

Dynamic early exiting has been proven to improve the inference speed of pre-trained language models like BERT. However, all samples must go through all consecutive layers before early exiting, and more complex samples usually go through more layers, so redundant computation remains. In this paper, we propose a novel dynamic early exiting mechanism combined with layer skipping for BERT inference, named SmartBERT, which adds a skipping gate and an exiting operator into each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. Besides, we propose cross-layer contrastive learning and combine it into our training phases to boost the intermediate layers and classifiers, which is beneficial for early exiting. To keep the usage of skipping gates consistent between training and inference phases, we propose a hard weight mechanism during the training phase. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves 2-3× computation reduction with minimal accuracy drops compared with BERT, and our method outperforms previous methods in both efficiency and accuracy. Moreover, on some complex datasets, we prove that early exiting based on entropy hardly works, and the skipping mechanism is essential for reducing computation.
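For readers unfamiliar with the entropy-based exit criterion that the abstract above refers to, a minimal sketch follows. It illustrates only the generic baseline (exit at the first layer whose classifier output has entropy below a threshold), not SmartBERT's skipping gate or exiting operator. The function names, the threshold value, and the per-layer logits are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(per_layer_logits: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit_layer_index, predicted_class) for the first sufficiently confident layer."""
    probs: List[float] = []
    for layer_index, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            break  # confident enough: stop before running deeper layers
    # If no layer was confident, this falls through to the last layer's prediction.
    return layer_index, probs.index(max(probs))


# Hypothetical classifier outputs after layers 0, 1 and 2 of a 2-class model.
layer_logits = [[0.1, 0.0], [3.0, -3.0], [5.0, -5.0]]
exit_layer, predicted = early_exit(layer_logits, threshold=0.1)
print(exit_layer, predicted)
```

Note that a sample still passes through every layer up to its exit point under this criterion, which is the redundancy the abstract's layer-skipping mechanism targets.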

EAAI Journal 2023 Journal Article

Unmanned aerial vehicle remote sensing image registration based on an improved oriented FAST and rotated BRIEF-random sample consensus algorithm

  • Fuzhen Zhu
  • Huiling Li
  • Jiacheng Li
  • Bing Zhu
  • Siwen Lei

Unmanned Aerial Vehicle (UAV) remote sensing image registration is the key step of remote sensing image stitching, image fusion and multi-frame image super-resolution. Its speed and accuracy determine the effect of remote sensing image applications, such as object detection and environment monitoring. To meet the speed and accuracy requirements of UAV remote sensing image registration, an improved Oriented FAST and Rotated BRIEF-Random Sample Consensus (ORB-RANSAC) algorithm is proposed. Firstly, images to be registered are divided into non-overlapping sub-images, and then a simplified image pyramid is constructed for these sub-images to get scale invariance. Secondly, the traditional FAST corner detection algorithm is improved by setting an adaptive corner detection threshold, so that more feature points are detected. Meanwhile, the traditional quadtree algorithm is improved to remove redundant feature points and keep the remaining high-quality feature points. Thirdly, coarse matching of feature points is done by bidirectional matching combined with the cosine similarity method. Finally, the improved RANSAC algorithm is used for fine matching of feature points to eliminate mismatches and calculate the transformation matrix. Experimental results show that, compared with the traditional ORB algorithm, the number of feature points detected is significantly increased and their distribution is more uniform, and the correct matching rate increases by 58.10% when the image scale changes. Compared with the state-of-the-art UAV remote sensing image registration algorithm, the correct matching rate and mutual information of our method are increased by 0.68% and 1.91% respectively, and matching time and root mean square error are reduced by 3.89% and 11.2% respectively.
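The coarse matching step named in the abstract above (bidirectional matching combined with cosine similarity) can be sketched as below. This is an illustrative reading, not the paper's implementation: the function names, the `min_similarity` threshold, and the real-valued toy descriptors are assumptions. A pair is kept only when each descriptor is the other's best match in both directions.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def cosine_similarity(a: Descriptor, b: Descriptor) -> float:
    """Cosine of the angle between two descriptors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: Descriptor, candidates: Sequence[Descriptor]) -> int:
    """Index of the candidate most similar to the query."""
    return max(range(len(candidates)), key=lambda j: cosine_similarity(query, candidates[j]))


def bidirectional_matches(
    desc_a: Sequence[Descriptor],
    desc_b: Sequence[Descriptor],
    min_similarity: float,
) -> List[Tuple[int, int]]:
    """Keep (i, j) only if j is i's best match, i is j's best match, and they are similar enough."""
    matches = []
    for i, descriptor in enumerate(desc_a):
        j = best_match(descriptor, desc_b)
        mutual = best_match(desc_b[j], desc_a) == i
        if mutual and cosine_similarity(descriptor, desc_b[j]) >= min_similarity:
            matches.append((i, j))
    return matches


# Toy descriptors (hypothetical): image A has an ambiguous third feature.
image_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
image_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0]]
coarse = bidirectional_matches(image_a, image_b, min_similarity=0.8)
print(coarse)
```

The ambiguous third feature of image A is dropped because its best candidate in image B prefers a different partner. The surviving pairs would then go to a RANSAC-style stage for fine matching.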

AAAI Conference 2020 Short Paper

Who Are Controlled by The Same User? Multiple Identities Deception Detection via Social Interaction Activity (Student Abstract)

  • Jiacheng Li
  • Chunyuan Yuan
  • Wei Zhou
  • Jingli Wang
  • Songlin Hu

Social media has become a preferred place for sharing information. However, some users may create multiple accounts and manipulate them to deceive legitimate users. Most previous studies use methods based on verbal or behavioral features to solve this problem, but these methods are designed only for particular platforms, which limits their generality. In this paper, to support multiple platforms, we construct an interaction tree for each account based on its social interactions, which are a common characteristic of social platforms. Then we propose a new method to calculate the social interaction entropy of each account and detect the accounts that are controlled by the same user. Experimental results on two real-world datasets show that the method has robust superiority over state-of-the-art methods.