Author name cluster

Yu Lei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

AAAI Conference 2026 Conference Paper

Do Large Language Models Think like the Brain? Sentence-Level Evidences from Layer-Wise Embeddings and fMRI

Yu Lei
Xingyang Ge
Yi Zhang
Yiming Yang
Bolei Ma

Understanding whether large language models (LLMs) and the human brain converge on similar computational principles remains a fundamental and important question in cognitive neuroscience and AI. Do the brain-like patterns observed in LLMs emerge simply from scaling, or do they reflect deeper alignment with the architecture of human language processing? This study focuses on the sentence-level neural mechanisms of language models, systematically investigating how layer-wise representations in LLMs align with the dynamic neural responses during human sentence comprehension. By comparing hierarchical embeddings from 14 publicly available LLMs with fMRI data collected from participants, who were exposed to a naturalistic narrative story, we constructed sentence-level neural prediction models to identify the model layers most significantly correlated with brain region activations. Results show that improvements in model performance drive the evolution of representational architectures toward brain-like hierarchies, particularly achieving stronger functional and anatomical correspondence at higher semantic abstraction levels. These findings advance our understanding of the computational parallels between LLMs and the human brain, highlighting the potential of LLMs as models for human language processing.


EAAI Journal 2026 Journal Article

Knowledge-data driven digital twin platform: intelligent prediction and control of tunnel face stability during large-diameter slurry shield construction

Xianguo Wu
Yu Lei
Feiming Su
Tiejun Li
Yang Liu

Safety is particularly important in large-diameter slurry shield (LDSS) construction. To ensure safe excavation in LDSS projects, this paper proposes a digital twin (DT) platform integrated with a knowledge-data driven method to achieve prediction and control of tunnel face stability (TFS) during LDSS construction. The DT platform enables acquisition of physical data and expert knowledge, facilitating bidirectional synchronization between physical entities and virtual counterparts. The DT platform utilizes Bayesian Optimization (BO), Graph Convolutional Network (GCN), Bidirectional Long Short-Term Memory (BiLSTM) networks, and SHapley Additive exPlanations (SHAP), driven by physical data together with expert knowledge, to achieve knowledge-data driven prediction and control of TFS during LDSS construction. A case study of Wuhan Metro Line 12 construction demonstrates the platform's effectiveness, with key findings revealing: (1) The DT platform achieves accurate TFS prediction across six geological types, with an average R² of 0.935 and a root mean square error (RMSE) of 0.239. (2) Key construction parameters are identified through the knowledge-data driven method, including air chamber pressure, grouting pressure, advance rate, cutterhead rotation speed, and cutterhead torque. (3) The DT platform enables control of the TFS during LDSS construction by maintaining the slurry pressure (SP) within an optimal range. The DT platform developed in this study fills TFS research gaps in LDSS construction and provides valuable references for similar complex projects.


AAAI Conference 2026 Conference Paper

QueryCraft: Transformer-Guided Query Initialization for Enhanced Human-Object Interaction Detection

Yuxiao Wang
Wolin Liang
Yu Lei
Weiying Xue
Nan Zhuang
Qi Liu

Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions in images. Although DETR-based methods have recently emerged as the mainstream framework for HOI detection, they still suffer from a key limitation: randomly initialized queries lack explicit semantics, leading to suboptimal detection performance. To address this challenge, we propose QueryCraft, a novel plug-and-play HOI detection framework that incorporates semantic priors and guided feature learning through transformer-based query initialization. Central to our approach is ACTOR (Action-aware Cross-modal TransfORmer), a cross-modal Transformer encoder that jointly attends to visual regions and textual prompts to extract action-relevant features. Rather than merely aligning modalities, ACTOR leverages language-guided attention to infer interaction semantics and produce semantically meaningful query representations. To further enhance object-level query quality, we introduce a Perceptual Distilled Query Decoder (PDQD), which distills object category awareness from a pre-trained detector to initialize the object queries. This dual-branch query initialization enables the model to generate more interpretable and effective queries for HOI detection. Extensive experiments on HICO-Det and V-COCO benchmarks demonstrate that our method achieves state-of-the-art performance and strong generalization.


AAAI Conference 2026 Conference Paper

What-Meets-Where: Unified Learning of Action and Contact Localization in Images

Yuxiao Wang
Yu Lei
Wolin Liang
Weiying Xue
Zhenao Wei
Nan Zhuang
Qi Liu

People control their bodies to establish contact with the environment. To comprehensively understand actions across diverse visual contexts, it is essential to simultaneously consider what action is occurring and where it is happening. Current methodologies, however, often inadequately capture this duality, typically failing to jointly model both action semantics and their spatial contextualization within scenes. To bridge this gap, we introduce a novel vision task that simultaneously predicts high-level action semantics and fine-grained body-part contact regions. Our proposed framework, PaIR-Net, comprises three key components: the Contact Prior Aware Module (CPAM) for identifying contact-relevant body parts, the Prior-Guided Concat Segmenter (PGCS) for pixel-wise contact segmentation, and the Interaction Inference Module (IIM) responsible for integrating global interaction relationships. To facilitate this task, we present PaIR (Part-aware Interaction Representation), a comprehensive dataset containing 13,979 images that encompass 654 actions, 80 object categories, and 17 body parts. Experimental evaluation demonstrates that PaIR-Net significantly outperforms baseline approaches, while ablation studies confirm the efficacy of each architectural component.


AAAI Conference 2025 Conference Paper

Correcting Large Language Model Behavior via Influence Function

Han Zhang
Zhuo Zhang
Yi Zhang
Yuanzhao Zhai
Hanyang Peng
Yu Lei
Yue Yu
Hui Wang

Recent advancements in AI alignment techniques have significantly improved the alignment of large language models (LLMs) with static human preferences. However, the dynamic nature of human preferences can render some prior training data outdated or even erroneous, ultimately causing LLMs to deviate from contemporary human preferences and societal norms. Existing methodologies, either curation of new data for continual alignment or manual correction of outdated data for re-alignment, demand costly human resources. To address this, we propose a novel approach, LLM BehAvior Correction with INfluence FunCtion REcall and Post-Training (LANCET), which needs no human involvement. LANCET consists of two phases: (1) using a new method, LinFAC, to efficiently identify the training data that significantly impact undesirable model outputs, and (2) applying a novel Influence-driven Bregman Optimization (IBO) technique to adjust the model’s outputs based on these influence distributions. Our experiments show that LANCET effectively and efficiently corrects inappropriate behaviors of LLMs while preserving model utility. Furthermore, LANCET exhibits stronger generalization ability than all baselines under out-of-distribution harmful prompts, offering better interpretability and compatibility with real-world applications of LLMs.


AAAI Conference 2025 Conference Paper

Precision-Enhanced Human-Object Contact Detection via Depth-Aware Perspective Interaction and Object Texture Restoration

Yuxiao Wang
Wenpeng Neng
Zhenao Wei
Yu Lei
Weiying Xue
Nan Zhuang
Yanwu Xu
Xinyu Jiang

Human-object contact (HOT) detection is designed to accurately identify the areas where humans and objects come into contact. Current methods often fail to account for scenarios where objects frequently block the view, resulting in inaccurate identification of contact areas. To tackle this problem, we propose a perspective interaction HOT detector called PIHOT, which utilizes a depth map generation model to provide depth information of humans and objects relative to the camera, thereby preventing false interaction detection. Furthermore, we use mask dilatation and object restoration techniques to restore the texture details in covered areas, improve the boundaries between objects, and enhance the perception of humans interacting with objects. Moreover, a spatial awareness perception is intended to concentrate on the characteristic features close to the points of contact. The experimental results show that the PIHOT algorithm achieves state-of-the-art performance on three benchmark datasets for HOT detection tasks. Compared to the most recent DHOT, our method achieves an average improvement of 13%, 27.5%, 16%, and 18.5% on SC-Acc., C-Acc., mIoU, and wIoU metrics, respectively.


EAAI Journal 2025 Journal Article

Reference-based image super-resolution of hyperspectral and red-green-blue image for determination of wheat kernel quality using deep learning networks

Shizhuang Weng
Qiaoqiao Zhang
Kaixuan Han
Meijing Pan
Yujian Tan
Qun Chen
Feihong Wu
Cong Wang

During cultivation and harvest, wheat kernel quality is highly susceptible to various factors, such as disease, mildew, atrophy and impurities, and detection of kernel quality is essential to avoid hazard proliferation, facilitate product grading, and ensure food safety. With its abundant image and spectral characteristics, hyperspectral imaging (HSI) has achieved impressive results in kernel quality analysis, but its low spatial resolution limits its detection accuracy. In this study, reference-based image super-resolution (RefSR) of HSI and Red-Green-Blue images was adopted to improve resolution and determine wheat kernel quality using deep learning networks. Firstly, RefSR was conducted by an improved transformer network with dual-branch feature extraction and a weighted fusion operation, which achieved excellent RefSR with significant resolution improvement, a peak signal-to-noise ratio of 35.521 and a structural similarity index of 0.97, outperforming the existing state-of-the-art networks. Then, the reflectance images (RIs) of effective wavelengths (EWs) from generated HSI images were combined with the residual network with a spatial, channel attention and multi-scale residual to determine wheat kernel quality. Precise analysis was achieved, with accuracies in the calibration, validation and prediction sets of 100.00%, 95.26% and 92.78%, respectively. RefSR provides a novel and efficient approach for obtaining HSI images of high spatial resolution and facilitates the application of HSI in the analysis of crop kernels. RIs of several sporadic EWs can be easily acquired and processed, enabling rapid in-field kernel detection. Therefore, the proposed method provides efficient, accurate and applicable determination of wheat kernel quality.


ICLR Conference 2024 Conference Paper

CPPO: Continual Learning for Reinforcement Learning with Human Feedback

Han Zhang 0025
Yu Lei
Lin Gui 0003
Min Yang 0007
Yulan He 0001
Hui Wang 0030
Ruifeng Xu 0001

The approach of Reinforcement Learning from Human Feedback (RLHF) is widely used for enhancing pre-trained Language Models (LM), enabling them to better align with human preferences. Existing RLHF-based LMs, however, require complete retraining whenever new queries or feedback are introduced, as human preferences may differ across different domains or topics. LM retraining is often impracticable in most real-world scenarios, due to the substantial time and computational costs involved, as well as data privacy concerns. To address this limitation, we propose Continual Proximal Policy Optimization (CPPO), a novel method that is able to continually align LM with dynamic human preferences. Specifically, CPPO adopts a weighting strategy to decide which samples should be utilized for enhancing policy learning and which should be used for solidifying past experiences. This seeks a good trade-off between policy learning and knowledge retention. Our experimental results show that CPPO outperforms strong Continuous learning (CL) baselines when it comes to consistently aligning with human preferences. Furthermore, compared to PPO, CPPO offers more efficient and stable learning in non-continual scenarios.


AAAI Conference 2024 Conference Paper

FocalDreamer: Text-Driven 3D Editing via Focal-Fusion Assembly

Yuhan Li
Yishun Dou
Yue Shi
Yu Lei
Xuanhong Chen
Yi Zhang
Peng Zhou
Bingbing Ni

While text-3D editing has made significant strides in leveraging score distillation sampling, emerging approaches still fall short in delivering separable, precise and consistent outcomes that are vital to content creation. In response, we introduce FocalDreamer, a framework that merges base shape with editable parts according to text prompts for fine-grained editing within desired regions. Specifically, equipped with geometry union and dual-path rendering, FocalDreamer assembles independent 3D parts into a complete object, tailored for convenient instance reuse and part-wise control. We propose geometric focal loss and style consistency regularization, which encourage focal fusion and congruent overall appearance. Furthermore, FocalDreamer generates high-fidelity geometry and PBR textures which are compatible with widely-used graphics engines. Extensive experiments have highlighted the superior editing capabilities of FocalDreamer in both quantitative and qualitative evaluations.


NeurIPS Conference 2022 Conference Paper

A Lower Bound of Hash Codes' Performance

Xiaosu Zhu
Jingkuan Song
Yu Lei
Lianli Gao
Hengtao Shen

As a crucial approach for compact representation learning, hashing has achieved great success in effectiveness and efficiency. Numerous heuristic Hamming space metric learning objectives are designed to obtain high-quality hash codes. Nevertheless, a theoretical analysis of criteria for learning good hash codes remains largely unexploited. In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance. Promoting these two characteristics could lift the bound and improve hash learning. We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization. Extensive experiments reveal the effectiveness of the proposed method. By testing on a series of hash-models, we obtain performance improvements among all of them, with an up to 26.5% increase in mean Average Precision and an up to 20.5% increase in accuracy. Our code is publicly available at https://github.com/VL-Group/LBHash.


IJCAI Conference 2020 Conference Paper

An Interactive Multi-Task Learning Framework for Next POI Recommendation with Uncertain Check-ins

Lu Zhang
Zhu Sun
Jie Zhang
Yu Lei
Chen Li
Ziqing Wu
Horst Kloeden
Felix Klanner

Studies on next point-of-interest (POI) recommendation mainly seek to learn users' transition patterns with certain historical check-ins. However, in reality, users' movements are typically uncertain (i.e., fuzzy and incomplete), where most existing methods suffer from the transition pattern vanishing issue. To ease this issue, we propose a novel interactive multi-task learning (iMTL) framework to better exploit the interplay between activity and location preference. Specifically, iMTL introduces: (1) a temporal-aware activity encoder equipped with fuzzy characterization over uncertain check-ins to unveil the latent activity transition patterns; (2) a spatial-aware location preference encoder to capture the latent location transition patterns; and (3) a task-specific decoder to make use of the learned latent transition patterns and enhance both activity and location prediction tasks in an interactive manner. Extensive experiments on three real-world datasets show the superiority of iMTL.


YNIMG Journal 2015 Journal Article

Nature of functional links in valuation networks differentiates impulsive behaviors between abstinent heroin-dependent subjects and nondrug-using subjects

Tianye Zhai
Yongcong Shao
Gang Chen
Enmao Ye
Lin Ma
Lubin Wang
Yu Lei
Guangyu Chen

Advanced neuroimaging studies have identified brain correlates of pathological impulsivity in a variety of neuropsychiatric disorders. However, whether and how these spatially separate and functionally integrated neural correlates collectively contribute to aberrant impulsive behaviors remains unclear. Building on recent progress in neuroeconomics toward determining a biological account of human behaviors, we employed resting-state functional MRI to characterize the nature of the links between these neural correlates and to investigate their impact on impulsivity. We demonstrated that, through functional connectivity with the ventral medial prefrontal cortex, the δ-network (regions of the executive control system, such as the dorsolateral prefrontal cortex) and the β-network (regions of the reward system involved in the mesocorticolimbic pathway) jointly influence impulsivity as measured by Barratt impulsiveness scale scores. In control nondrug-using subjects, the functional link between the β- and δ-networks is balanced, and the δ-network competitively controls impulsivity. However, in abstinent heroin-dependent subjects, the link is imbalanced, with stronger β-network connectivity and weaker δ-network connectivity. The imbalanced link is associated with impulsivity, indicating that the β- and δ-networks may mutually reinforce each other in abstinent heroin-dependent subjects. These findings of an aberrant link between the β- and δ-networks in abstinent heroin-dependent subjects may shed light on the mechanism of aberrant behaviors of drug addiction and may serve as an endophenotype to mark individual subjects' self-control capacity.


IJCAI Conference 2013 Conference Paper

Social Collaborative Filtering by Trust

Bo Yang
Yu Lei
Dayou Liu
Jiming Liu

The main task of a recommender system is to accurately and proactively provide users with information or services they are likely to be interested in. Collaborative filtering is one of the most widely adopted recommender algorithms, but it suffers from data sparsity and cold start issues that severely degrade the quality of recommendations. To address these issues, this article proposes a novel method that aims to improve the performance of collaborative filtering recommendation by carefully integrating two sources of sparse information: the conventional rating data given by users and the social trust network among the same users. It is a model-based method that adopts a matrix factorization technique to map users into low-dimensional latent feature spaces in terms of their trust relationships, aiming to reflect users’ reciprocal influence on their own opinions more reasonably. Validation against a real-world dataset shows that the proposed method performs much better than state-of-the-art recommendation algorithms for social collaborative filtering by trust.


IJCAI Conference 2013 Conference Paper

Social Trust Prediction Using Rank-k Matrix Recovery

Jin Huang
Feiping Nie
Heng Huang
Yu Lei
Chris Ding

Trust prediction, which explores the unobserved relationships between online community users, is an emerging and important research topic in social network analysis and many web applications. As in other social-based recommender systems, trust relationships between users can also be modeled in the form of matrices. Recent studies show that users generally establish friendships due to a few latent factors; it is therefore reasonable to assume that the trust matrices are low-rank. As a result, many recommendation system strategies can be applied here. In particular, trace norm minimization, which uses a matrix’s trace norm to approximate its rank, is especially appealing. However, recent articles cast doubt on the validity of the trace norm approximation. In this paper, instead of using trace norm minimization, we propose a new robust rank-k matrix completion method, which explicitly seeks a matrix with exact rank. Moreover, our method is robust to noise or corrupted observations. We optimize the new objective function in an alternating manner, based on a combination of ancillary variables and the Augmented Lagrangian Multiplier (ALM) method. We perform experiments on three real-world data sets, and all empirical results demonstrate the effectiveness of our method.
