Arrow Research search

Author name cluster

Jian Lu

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

15 papers
1 author row

Possible papers

15

AAAI Conference 2026 Conference Paper

MoFu: Scale-Aware Modulation and Fourier Fusion for Multi-Subject Video Generation

  • Run Ling
  • Ke Cao
  • Jian Lu
  • Ao Ma
  • Haowei Liu
  • Runze He
  • Changwei Wang
  • Rongtao Xu

Multi-subject video generation aims to synthesize videos from textual prompts and multiple reference images, ensuring that each subject preserves natural scale and visual fidelity. However, current methods face two challenges: scale inconsistency, where variations in subject size lead to unnatural generation, and permutation sensitivity, where the order of reference inputs causes subject distortion. In this paper, we propose MoFu, a unified framework that tackles both challenges. For scale inconsistency, we introduce Scale-Aware Modulation (SMO), an LLM-guided module that extracts implicit scale cues from the prompt and modulates features to ensure consistent subject sizes. To address permutation sensitivity, we present a simple yet effective Fourier Fusion strategy that processes the frequency information of reference features via the Fast Fourier Transform to produce a unified representation. In addition, we design a Scale-Permutation Stability Loss to jointly encourage scale-consistent and permutation-invariant generation. To further evaluate these challenges, we establish a dedicated benchmark with controlled variations in subject scale and reference permutation. Extensive experiments demonstrate that MoFu significantly outperforms existing methods in preserving natural scale, subject fidelity, and overall visual quality.
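
The abstract does not specify the exact Fourier Fusion operator, but the permutation-invariance idea can be sketched minimally: transform each reference feature to the frequency domain, average the spectra (averaging is order-invariant by construction), and transform back. This is an illustrative toy version, not the paper's implementation.

```python
import numpy as np

def fourier_fuse(features):
    """Fuse per-reference feature vectors into one representation.

    Hypothetical sketch: FFT each feature, average the spectra
    (a symmetric, order-invariant reduction), then inverse FFT.
    """
    spectra = [np.fft.fft(f) for f in features]
    fused = np.mean(spectra, axis=0)
    return np.fft.ifft(fused).real

refs = [np.array([1.0, 2.0, 3.0, 4.0]),
        np.array([4.0, 3.0, 2.0, 1.0]),
        np.array([0.5, 0.5, 0.5, 0.5])]

a = fourier_fuse(refs)
b = fourier_fuse(refs[::-1])   # same references, reversed order
assert np.allclose(a, b)       # fusion is permutation-invariant
```

Any symmetric reduction over the reference set would give the same invariance; operating in the frequency domain is what distinguishes the Fourier Fusion idea described above.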

AAAI Conference 2026 Conference Paper

Point-SRA: Self-Representation Alignment for 3D Representation Learning

  • Lintong Wei
  • Jian Lu
  • Haozhe Cheng
  • Jihua Zhu
  • Kaibing Zhang

Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods with fixed mask ratios neglect multi-level representational correlations and intrinsic geometric structures, while relying on point-wise reconstruction assumptions that conflict with the diversity of point clouds. To address these issues, we propose a 3D representation learning method, termed Point-SRA, which aligns representations through self-distillation and probabilistic modeling. Specifically, we assign different masking ratios to the MAE to capture complementary geometric and semantic information, while the MeanFlow Transformer (MFT) leverages cross-modal conditional embeddings to enable diverse probabilistic reconstruction. Our analysis further reveals that representations at different time steps in MFT also exhibit complementarity. Therefore, a Dual Self-Representation Alignment mechanism is proposed at both the MAE and MFT levels. Finally, we design a Flow-Conditioned Fine-Tuning Architecture to fully exploit the point cloud distribution learned via MeanFlow. Point-SRA outperforms Point-MAE by 5.37% on ScanObjectNN. On intracranial aneurysm segmentation, it reaches 96.07% mean IoU for arteries and 86.87% for aneurysms. For 3D object detection, Point-SRA achieves 47.3% AP@50, surpassing MaskPoint by 5.12%.

AAAI Conference 2026 Conference Paper

RAGAR: Retrieval Augmented Personalized Image Generation Guided by Recommendation

  • Run Ling
  • Wenji Wang
  • Yuting Liu
  • Guibing Guo
  • Haowei Liu
  • Jian Lu
  • Quanwei Zhang
  • Yexing Xu

Personalized image generation is crucial for improving the user experience, as it renders reference images into preferred ones according to user visual preferences. Although effective, existing methods face two main issues. First, existing methods treat all items in the user's historical sequence equally when extracting user preferences, overlooking the varying semantic similarities between historical items and the reference item. Disproportionately high weights for low-similarity items distort user visual preferences for the reference item. Second, existing methods heavily rely on consistency between generated and reference images to optimize generation, which leads to underfitting user preferences and hinders personalization. To address these issues, we propose Retrieval Augmented Personalized Image GenerAtion guided by Recommendation (RAGAR). Our approach uses a retrieval mechanism to assign different weights to historical items according to their similarities to the reference item, thereby extracting users' visual preferences for the reference item more precisely. We then introduce a novel ranking task based on a multi-modal ranking model to optimize the personalization of the generated images, rather than relying on consistency alone. Extensive experiments and human evaluations on three real-world datasets demonstrate that RAGAR achieves significant improvements in both personalization and semantic metrics compared to five baselines.
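
The retrieval-weighting idea described above can be sketched with a softmax over cosine similarities between the reference item's embedding and each historical item's embedding, so that more similar history items receive larger weights. This is a minimal illustration under assumed embeddings, not RAGAR's actual retrieval module.

```python
import numpy as np

def history_weights(reference, history, temperature=1.0):
    """Weight historical items by cosine similarity to the reference item.

    Hypothetical sketch: a softmax over similarities gives low-similarity
    items small weights, avoiding the distortion the abstract describes.
    """
    ref = reference / np.linalg.norm(reference)
    hist = history / np.linalg.norm(history, axis=1, keepdims=True)
    sims = hist @ ref                     # cosine similarity per history item
    logits = sims / temperature
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

reference = np.array([1.0, 0.0])
history = np.array([[0.9, 0.1],   # similar to the reference
                    [0.0, 1.0]])  # orthogonal to the reference
w = history_weights(reference, history)
assert w[0] > w[1]                # similar item gets the larger weight
assert np.isclose(w.sum(), 1.0)
```

The `temperature` parameter (an assumption here) controls how sharply the weighting concentrates on the most similar items.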

AAAI Conference 2026 Conference Paper

RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers

  • Ke Cao
  • Jing Wang
  • Ao Ma
  • Jiasong Feng
  • Xuanhua He
  • Run Ling
  • Haowei Liu
  • Jian Lu

The Diffusion Transformer plays a pivotal role in advancing text-to-image and text-to-video generation, owing primarily to its inherent scalability. However, existing controlled diffusion transformer methods incur significant parameter and computational overheads and suffer from inefficient resource allocation due to their failure to account for the varying relevance of control information across different transformer layers. To address this, we propose the Relevance-Guided Efficient Controllable Generation framework, RelaCtrl, enabling efficient and resource-optimized integration of control signals into the Diffusion Transformer. First, we evaluate the relevance of each layer in the Diffusion Transformer to the control information by assessing the ControlNet Relevance Score, which measures the impact of skipping each control layer on both the quality of generation and the control effectiveness during inference. Based on the strength of the relevance, we then tailor the positioning, parameter scale, and modeling capacity of the control layers to reduce unnecessary parameters and redundant computations. Additionally, to further improve efficiency, we replace the self-attention and FFN in the commonly used copy block with the carefully designed Two-Dimensional Shuffle Mixer (TDSM), enabling efficient implementation of both the token mixer and channel mixer. Both qualitative and quantitative experimental results demonstrate that our approach achieves superior performance with only 15% of the parameters and computational complexity compared to PixArt-delta.

AAAI Conference 2025 Conference Paper

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training

  • Xiuli Bi
  • Jian Lu
  • Bo Liu
  • Xiaodong Cun
  • Yong Zhang
  • Weisheng Li
  • Bin Xiao

Benefiting from large-scale pre-training on text-video pairs, current text-to-video (T2V) diffusion models can generate high-quality videos from a text description. In addition, given some reference images or videos, a parameter-efficient fine-tuning method such as LoRA can generate high-quality customized concepts, e.g., a specific subject or the motion from a reference video. However, combining multiple concepts trained from different references into a single network produces obvious artifacts. To this end, we propose CustomTTT, which can jointly customize the appearance and motion of a given video with ease. In detail, we first analyze the prompt's influence in the current video diffusion model and find that LoRAs are only needed in specific layers for appearance and motion customization. Moreover, since each LoRA is trained individually, we propose a novel test-time training technique that updates parameters after combination using the trained customized models. We conduct detailed experiments to verify the effectiveness of the proposed methods. Our method outperforms several state-of-the-art works in both qualitative and quantitative evaluations.

EAAI Journal 2025 Journal Article

Deep Forest with SHapley additive explanations on detailed risky driving behavior data for freeway crash risk prediction

  • Xiaochi Ma
  • Zongxin Huo
  • Jian Lu
  • Yiik Diew Wong

Freeway crash risk prediction is a critical component of traffic safety management, yet existing crash risk models often fail to capture complex driving behaviors and lack interpretability. This study introduces a novel freeway crash risk prediction framework based on the Deep Forest (DF) algorithm, using detailed risky driving behavior data. The DF model integrates multi-grained scanning and cascade forest layers, enabling it to capture the complex relationships among risky driving behavior features. SHapley Additive Explanations (SHAP) are applied to interpret the model's predictions, including both SHAP summary and interaction results. Additionally, ablation studies are conducted to evaluate the contributions of key components, such as multi-grained scanning and the cascade structure, to the model's performance. The experimental results demonstrate that the DF model outperforms traditional machine learning models: it achieves an area under the receiver operating characteristic curve of 0.825, with a balanced Sensitivity of 0.75 and Specificity of 0.816, surpassing the other models. The ablation studies show that removing multi-grained scanning, cascade layers, or the completely random tree forest leads to performance declines, confirming the importance of each component. The SHAP analysis highlights that sharp acceleration and braking behaviors have the most significant impact on crash risk, offering clear, interpretable insights into how driving behaviors contribute to risk. Overall, the DF model's superior performance and SHAP-based interpretability provide a powerful tool for traffic safety management. These findings emphasize the value of incorporating both driving behavior intensity and model interpretability into crash risk prediction, with practical applications for reducing crash rates.
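
The Sensitivity and Specificity figures reported above follow directly from a binary confusion matrix. A minimal sketch of those two definitions (with made-up labels, not the paper's data):

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true-positive rate) and specificity
    (true-negative rate) from binary labels and predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels: 4 crash cases (1), 5 non-crash cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
assert np.isclose(sens, 0.75)  # 3 of 4 crash cases recovered
assert np.isclose(spec, 0.8)   # 4 of 5 non-crash cases recovered
```

A "balanced" operating point, as described in the abstract, is one where the classification threshold is chosen so that these two rates are close to each other.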

EAAI Journal 2025 Journal Article

Multi-step control method of traffic flow data quality based on spatiotemporal similarity at video frame rate

  • Yue Chen
  • Jian Lu

Traffic flow data is a quantitative description of traffic operation status. It lays a data foundation for optimizing traffic management, improving travel efficiency, and ensuring traffic safety, and its quality is crucial to the construction and operation of intelligent transportation systems. At present, traffic flow data mostly comes from vehicle detectors, but their long sampling periods and sparse sampling points limit the effectiveness of quality control. To solve these problems, we propose a multi-step automatic control method for traffic flow data quality based on spatiotemporal similarity. In this framework, we divide the quality control process into preliminary repair and combinatorial optimization by exploiting the spatiotemporal advantages of cross-sectional traffic flow data at video frame rate. First, the preliminary repair is carried out automatically according to the characteristics of the missing data. For missing data at the section level, we propose a combined repair method based on the preprocessing of continuous missing data and segmented low-order interpolation. For missing data at the road-network level, we propose an adaptive-weight exponential smoothing method based on temporal similarity. Then, a multi-sectional optimization model based on spatiotemporal similarity is constructed to further optimize the preliminary repair results. The experimental results show that the proposed preliminary repair method is superior to other baseline models, and that the combined optimization model built on the preliminarily repaired data greatly improves the repair effect, is suitable for different missing rates and types, and is competitive in traffic flow data quality control.
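
A "segmented low-order interpolation" repair step can be sketched in its simplest form as piecewise linear interpolation over the gaps in a flow series. This is a toy illustration of the idea only; the paper's actual method also preprocesses continuous missing runs and applies further spatiotemporal optimization.

```python
import numpy as np

def repair_missing(series):
    """Fill missing (NaN) flow values by piecewise linear interpolation,
    a low-order scheme in the spirit of the preliminary repair step."""
    series = np.array(series, dtype=float)   # copy, so the input is untouched
    idx = np.arange(series.size)
    mask = np.isnan(series)
    # Interpolate each missing index from the nearest observed neighbors.
    series[mask] = np.interp(idx[mask], idx[~mask], series[~mask])
    return series

raw = [10.0, np.nan, 14.0, np.nan, np.nan, 20.0]
fixed = repair_missing(raw)
assert np.allclose(fixed, [10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
```

Low-order schemes like this avoid the oscillation that high-order polynomial fits can introduce over long gaps, which is presumably why the paper segments the series before interpolating.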

JBHI Journal 2023 Journal Article

Development of Prognostic Biomarkers by TMB-Guided WSI Analysis: A Two-Step Approach

  • Xiangyu Liu
  • Zhenyu Liu
  • Ye Yan
  • Kai Wang
  • Aodi Wang
  • Xiongjun Ye
  • Liwei Wang
  • Wei Wei

The rapid development of computational pathology has brought new opportunities for prognosis prediction using histopathological images. However, the existing deep learning frameworks lack exploration of the relationship between images and other prognostic information, resulting in poor interpretability. Tumor mutation burden (TMB) is a promising biomarker for predicting the survival outcomes of cancer patients, but its measurement is costly. Its heterogeneity may be reflected in histopathological images. Here, we report a two-step framework for prognostic prediction using whole-slide images (WSIs). First, the framework adopts a deep residual network to encode the phenotype of WSIs and classifies patient-level TMB by the deep features after aggregation and dimensionality reduction. Then, the patients' prognosis is stratified by the TMB-related information obtained during the classification model development. Deep learning feature extraction and TMB classification model construction are performed on an in-house dataset of 295 Haematoxylin & Eosin stained WSIs of clear cell renal cell carcinoma (ccRCC). The development and evaluation of prognostic biomarkers are performed on The Cancer Genome Atlas-Kidney ccRCC (TCGA-KIRC) project with 304 WSIs. Our framework achieves good performance for TMB classification with an area under the receiver operating characteristic curve (AUC) of 0.813 on the validation set. Through survival analysis, our proposed prognostic biomarkers can achieve significant stratification of patients' overall survival (P < 0.05) and outperform the original TMB signature in risk stratification of patients with advanced disease. The results indicate the feasibility of mining TMB-related information from WSI to achieve stepwise prognosis prediction.

NeurIPS Conference 2023 Conference Paper

Neuro-symbolic Learning Yielding Logical Constraints

  • Zenan Li
  • Yunpeng Huang
  • Zhaoyu Li
  • Yuan Yao
  • Jingwei Xu
  • Taolue Chen
  • Xiaoxing Ma
  • Jian Lu

Neuro-symbolic systems combine the abilities of neural perception and logical reasoning. However, end-to-end learning of neuro-symbolic systems is still an unsolved challenge. This paper proposes a natural framework that fuses neural network training, symbol grounding, and logical constraint synthesis into a coherent and efficient end-to-end learning process. The capability of this framework comes from the improved interactions between the neural and the symbolic parts of the system in both the training and inference stages. Technically, to bridge the gap between the continuous neural network and the discrete logical constraint, we introduce a difference-of-convex programming technique to relax the logical constraints while maintaining their precision. We also employ cardinality constraints as the language for logical constraint learning and incorporate a trust region method to avoid the degeneracy of logical constraint in learning. Both theoretical analyses and empirical evaluations substantiate the effectiveness of the proposed framework.

JBHI Journal 2023 Journal Article

Semi-Supervised Medical Image Segmentation With Voxel Stability and Reliability Constraints

  • Yang Zhao
  • Ke Lu
  • Jian Xue
  • Shuhua Wang
  • Jian Lu

Semi-supervised learning is becoming an effective solution in medical image segmentation because annotations are costly and tedious to acquire. Methods based on the teacher-student model use consistency regularization and uncertainty estimation and have shown good potential in dealing with limited annotated data. Nevertheless, the existing teacher-student model is seriously limited by the exponential moving average algorithm, which leads to the optimization trap. Moreover, the classic uncertainty estimation method calculates the global uncertainty for images but does not consider local region-level uncertainty, which is unsuitable for medical images with blurry regions. In this article, the Voxel Stability and Reliability Constraint (VSRC) model is proposed to address these issues. Specifically, the Voxel Stability Constraint (VSC) strategy is introduced to optimize parameters and exchange effective knowledge between two independent initialized models, which can break through the performance bottleneck and avoid model collapse. Moreover, a new uncertainty estimation strategy, the Voxel Reliability Constraint (VRC), is proposed for use in our semi-supervised model to consider the uncertainty at the local region level. We further extend our model to auxiliary tasks and propose a task-level consistency regularization with uncertainty estimation. Extensive experiments on two 3D medical image datasets demonstrate that our method outperforms other state-of-the-art semi-supervised medical image segmentation methods under limited supervision.

EAAI Journal 2021 Journal Article

PTANet: Triple Attention Network for point cloud semantic segmentation

  • Haozhe Cheng
  • Jian Lu
  • Maoxin Luo
  • Wei Liu
  • Kaibing Zhang

For 3D point cloud semantic segmentation, mining more informative features to enrich the contextual representation is regarded as the key to better segmentation performance. Unfortunately, existing point cloud segmentation networks lack a comprehensive way of using contextual information from both global and local perspectives, and thus fail to fully exploit the contextual representation, which prevents fine-grained objects from being accurately recognized. Therefore, this paper proposes a neural network dubbed PTANet that effectively enriches the contextual representation to improve segmentation accuracy. PTANet has two simple and effective parts: the Triple Attention Block and the Density Scale Learning Strategy. The Triple Attention Block consists of three sub-modules: (1) a position attention module that updates feature maps by modeling the interdependency between the spatial positions of points; (2) a channel attention module that recalibrates the original features according to the correlation weights between feature-map channels, enriching the contextual representation globally; and (3) a local region attention module that computes interdependence weights between local neighbors to further complement local feature information. In addition, to alleviate the adverse effect of the non-uniform distribution of point clouds on inference, the Density Scale Learning Strategy applies kernel density estimation with an adaptive bandwidth to fit a density scale for each point. In particular, the density scale, weighted into the feature maps, also supplies density information for local features. Experiments verify the effectiveness of PTANet: it obtains 86.1% mIoU on ShapeNet, 62.4% mIoU on ScanNetV2, and 87.9% OA on S3DIS.
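
The per-point density scale described above can be sketched with plain Gaussian kernel density estimation: a point surrounded by many close neighbors gets a high density, while an isolated point gets a low one. This toy version uses a fixed bandwidth for clarity, whereas the paper uses an adaptive one.

```python
import numpy as np

def density_scale(points, bandwidth=0.5):
    """Estimate a per-point density via Gaussian kernel density estimation.

    Illustrative sketch only: fixed bandwidth, dense O(n^2) pairwise
    distances; the paper's adaptive-bandwidth scheme is not reproduced.
    """
    diffs = points[:, None, :] - points[None, :, :]       # pairwise offsets
    sq_dists = np.sum(diffs ** 2, axis=-1)                # squared distances
    kernels = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # Gaussian kernel
    return kernels.mean(axis=1)                           # density per point

# A dense cluster plus one isolated point: the outlier gets lower density.
pts = np.array([[0.0, 0.0, 0.0],
                [0.1, 0.0, 0.0],
                [0.0, 0.1, 0.0],
                [5.0, 5.0, 5.0]])
d = density_scale(pts)
assert d.argmin() == 3   # the isolated point has the lowest density
```

Weighting features by such a scale, as the strategy above describes, lets the network compensate for sparsely sampled regions of the cloud.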

AAAI Conference 2019 Conference Paper

An Integral Tag Recommendation Model for Textual Content

  • Shijie Tang
  • Yuan Yao
  • Suwei Zhang
  • Feng Xu
  • Tianxiao Gu
  • Hanghang Tong
  • Xiaohui Yan
  • Jian Lu

Recommending suitable tags for online textual content is a key building block for better content organization and consumption. In this paper, we identify three pillars that impact the accuracy of tag recommendation: (1) sequential text modeling, meaning that the intrinsic sequential ordering as well as different areas of text might have an important implication on the corresponding tag(s), (2) tag correlation, meaning that the tags for a certain piece of textual content are often semantically correlated with each other, and (3) content-tag overlapping, meaning that the vocabularies of content and tags overlap. However, none of the existing methods consider all three aspects, leading to suboptimal tag recommendation. In this paper, we propose an integral model to encode all three aspects in a coherent encoder-decoder framework. In particular, (1) the encoder models the semantics of the textual content via Recurrent Neural Networks with an attention mechanism, (2) the decoder tackles the tag correlation with a prediction path, and (3) a shared embedding layer and an indicator function across the encoder-decoder address the content-tag overlapping. Experimental results on three real-world datasets demonstrate that the proposed method significantly outperforms the existing methods in terms of recommendation accuracy.

IJCAI Conference 2019 Conference Paper

Commit Message Generation for Source Code Changes

  • Shengbin Xu
  • Yuan Yao
  • Feng Xu
  • Tianxiao Gu
  • Hanghang Tong
  • Jian Lu

Commit messages, which summarize the source code changes in natural language, are essential for program comprehension and software evolution understanding. Unfortunately, due to the lack of direct motivation, commit messages are sometimes neglected by developers, making it necessary to generate such messages automatically. State-of-the-art methods adopt learning-based approaches, such as neural machine translation models, for the commit message generation problem. However, they tend to ignore the code structure information and suffer from the out-of-vocabulary issue. In this paper, we propose CoDiSum to address these two limitations. In particular, we first extract both code structure and code semantics from the source code changes, and then jointly model these two sources of information so as to better learn the representations of the code changes. Moreover, we augment the model with a copying mechanism to further mitigate the out-of-vocabulary issue. Experimental evaluations on real data demonstrate that the proposed approach significantly outperforms the state-of-the-art in terms of accurately generating the commit messages.

AAAI Conference 2019 Conference Paper

Hashtag Recommendation for Photo Sharing Services

  • Suwei Zhang
  • Yuan Yao
  • Feng Xu
  • Hanghang Tong
  • Xiaohui Yan
  • Jian Lu

Hashtags can greatly facilitate content navigation and improve user engagement in social media. Meaningful as it might be, recommending hashtags for photo sharing services such as Instagram and Pinterest remains a daunting task for two reasons. On the endogenous side, posts in photo sharing services often contain both images and text, which are likely to be correlated with each other. Therefore, it is crucial to coherently model both image and text as well as the interaction between them. On the exogenous side, hashtags are generated by users, and different users might come up with different tags for similar posts due to their different preferences and/or community effects. Therefore, it is highly desirable to characterize users' tagging habits. In this paper, we propose an integral and effective hashtag recommendation approach for photo sharing services. In particular, the proposed approach considers both the endogenous and exogenous effects with a content modeling module and a habit modeling module, respectively. For the content modeling module, we adopt the parallel co-attention mechanism to coherently model both image and text as well as the interaction between them; for the habit modeling module, we introduce an external memory unit to characterize the historical tagging habit of each user. The overall hashtag recommendations are generated on the basis of both the post features from the content modeling module and the habit influences from the habit modeling module. We evaluate the proposed approach on real Instagram data. The experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods in terms of recommendation accuracy, and that both content modeling and habit modeling contribute significantly to the overall recommendation accuracy.

IJCAI Conference 2015 Conference Paper

Ice-Breaking: Mitigating Cold-Start Recommendation Problem by Rating Comparison

  • Jingwei Xu
  • Yuan Yao
  • Hanghang Tong
  • Xianping Tao
  • Jian Lu

Recommender systems have become an indispensable component of many e-commerce sites. One major challenge that largely remains open is the cold-start problem, which can be viewed as an ice barrier that separates cold-start users/items from warm ones. In this paper, we propose a novel rating comparison strategy (RAPARE) to break this ice barrier. The centerpiece of RAPARE is a fine-grained calibration of the latent profiles of cold-start users/items that explores the differences between cold-start and warm users/items. We instantiate our RAPARE strategy on the prevalent method in recommender systems, i.e., matrix factorization based collaborative filtering. Experimental evaluations on two real data sets validate the superiority of our approach over the existing methods in cold-start scenarios.