Author name cluster

Hao Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

AAAI Conference 2026 Conference Paper

Driving with Advice: Large Model as Motion Advisor for Joint Planning

Junyin Wang
Jinlei Yu
Hao Lin
Huikai Liu
Wenqian Zhu
Shengwu Xiong

We address the challenge of integrating high-level semantic reasoning with low-level trajectory planning in end-to-end autonomous driving, where most existing frameworks decouple perception, decision-making, and control, leading to limited interpretability and poor instruction compliance. To bridge this gap, we propose Driving with Advice, a novel closed-loop framework that treats a vision-language model (VLM) as a motion advisor to provide interpretable, language-mediated guidance for trajectory generation. Our approach introduces three key innovations: (1) Semantic-Intentional Pretraining (SIP), which injects driving rationale into a compact VLM via machine-generated question-answering pairs; (2) a discrete action space grounded in directional and speed primitives, enabling structured and interpretable policy learning; and (3) an advice-following diffusion policy refined via Group Relative Policy Optimization under a multi-objective reward that ensures safety, comfort, and alignment with semantic intent. We evaluate our method on the NAVSIM benchmark in a closed-loop setting, achieving a state-of-the-art Predictive Driver Model Score (PDMS) of 91.5, outperforming strong baselines in safety (NC: 99.2). The results demonstrate that leveraging language as a cognitive interface between perception and control enhances both generalization and behavioral transparency, advancing the paradigm of language-conditioned driving.
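
The group-relative step in Group Relative Policy Optimization, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes a group of trajectories sampled for the same scene, each already scored by a scalar reward. The function name, the use of the population standard deviation, and the `eps` term are illustrative assumptions, not details from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample relative to the other samples in its group.

    `rewards` holds one scalar reward per sampled trajectory (hypothetical
    values; the paper's multi-objective reward is not reproduced here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to form the baseline.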

AAAI Conference 2026 Conference Paper

Not Just What’s There: Enabling CLIP to Comprehend Negated Visual Descriptions Without Fine-Tuning

Junhao Xiao
Zhiyu Wu
Hao Lin
Yi Chen
Yahui Liu
Xiaoran Zhao
Zixu Wang
Zejiang He

Vision-Language Models (VLMs) like CLIP struggle to understand negation, often embedding affirmatives and negatives similarly (e.g., matching "no dog" with dog images). Existing methods refine negation understanding via fine-tuning CLIP’s text encoder, risking overfitting. In this work, we propose CLIPGlasses, a plug-and-play framework that enhances CLIP’s ability to comprehend negated visual descriptions. CLIPGlasses adopts a dual-stage design: a Lens module disentangles negated semantics from text embeddings, and a Frame module predicts context-aware repulsion strength, which is integrated into the modified similarity computation to penalize alignment with negated semantics, thereby reducing false positive matches. Experiments show that CLIP equipped with CLIPGlasses achieves competitive in-domain performance and outperforms state-of-the-art methods in cross-domain generalization. Its superiority is especially evident under low-resource conditions, indicating stronger robustness across domains.
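
The idea of a similarity score that penalizes alignment with negated semantics can be sketched as follows. This is only an illustration under stated assumptions: in the paper the negated embedding and the repulsion strength come from learned Lens and Frame modules, whereas here both are supplied by hand, and the subtraction form and the name `penalized_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def penalized_similarity(image, text, negated, repulsion):
    """Base image-text similarity minus a weighted match to the negated concept.

    `negated` and `repulsion` are hand-supplied stand-ins (assumptions) for
    what the learned modules would produce.
    """
    return cosine(image, text) - repulsion * cosine(image, negated)


# An image that matches the negated concept is pushed down in the ranking.
score_with_dog = penalized_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], 0.5)
score_without_dog = penalized_similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], 0.5)
```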

AAAI Conference 2025 Conference Paper

Bridging Traffic State and Trajectory for Dynamic Road Network and Trajectory Representation Learning

Chengkai Han
Jingyuan Wang
Yongyao Wang
Xie Yu
Hao Lin
Chao Li
Junjie Wu

Effective urban traffic management is vital for sustainable city development, relying on intelligent systems with machine learning tasks such as traffic flow prediction and travel time estimation. Traditional approaches usually focus on static road network and trajectory representation learning, and overlook the dynamic nature of traffic states and trajectories, which is crucial for downstream tasks. To address this gap, we propose TRACK, a novel framework to bridge traffic state and trajectory data for dynamic road network and trajectory representation learning. TRACK leverages graph attention networks (GAT) to encode static and spatial road segment features, and introduces a transformer-based model for trajectory representation learning. By incorporating transition probabilities from trajectory data into GAT attention weights, TRACK captures dynamic spatial features of road segments. Meanwhile, TRACK designs a traffic transformer encoder to capture the spatial-temporal dynamics of road segments from traffic state data. To further enhance dynamic representations, TRACK proposes a co-attentional transformer encoder and a trajectory-traffic state matching task. Extensive experiments on real-life urban traffic datasets demonstrate the superiority of TRACK over state-of-the-art baselines. Case studies confirm TRACK’s ability to capture spatial-temporal dynamics effectively.
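
The transition probabilities from trajectory data that the abstract above refers to can be estimated by counting consecutive road-segment pairs. The sketch below assumes each trajectory is a list of road-segment IDs. How TRACK injects these probabilities into the GAT attention weights is not described in the abstract and is not shown here.

```python
from collections import Counter, defaultdict


def transition_probabilities(trajectories):
    """Estimate P(next segment | current segment) from observed trajectories."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        # Count each observed move between consecutive segments
        for src, dst in zip(trajectory, trajectory[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(nexts.values()) for dst, n in nexts.items()}
        for src, nexts in counts.items()
    }


# Hypothetical segment IDs, for illustration only
probs = transition_probabilities([["a", "b", "c"], ["a", "b", "d"], ["a", "c"]])
```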

ICML Conference 2025 Conference Paper

RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer

Haotian Ni
Yake Wei
Hang Liu
Gong Chen
Chong Peng
Hao Lin
Di Hu 0001

Multimodal learning faces challenges in effectively fusing information from diverse modalities, especially when modality quality varies across samples. Dynamic fusion strategies, such as the attention mechanism in Transformers, aim to address this challenge by adaptively emphasizing modalities based on the characteristics of the input data. However, through a number of carefully designed experiments, we surprisingly observed that the dynamic adaptability of widely used self-attention models diminishes. The model tends to prefer one modality regardless of data characteristics. This bias triggers a self-reinforcing cycle that progressively overemphasizes the favored modality, widening the distribution gap in attention keys across modalities and deactivating the attention mechanism’s dynamic properties. To revive adaptability, we propose a simple yet effective method, Rolling Query (RollingQ), which balances attention allocation by rotating the query to break the self-reinforcing cycle and mitigate the key distribution gap. Extensive experiments on various multimodal scenarios validate the effectiveness of RollingQ and show that restoring cooperation dynamics is pivotal for enhancing the broader capabilities of widely deployed multimodal Transformers. The source code is available at https://github.com/GeWu-Lab/RollingQ_ICML2025.

TCS Journal 2024 Journal Article

Hardness of Entropic Module-LWE

Hao Lin
Mingqiang Wang
Jincheng Zhuang
Yang Wang

ICML Conference 2023 Conference Paper

Surrogate Module Learning: Reduce the Gradient Error Accumulation in Training Spiking Neural Networks

Shikuang Deng
Hao Lin
Yuhang Li 0001
Shi Gu

Spiking neural networks (SNNs) provide an alternative to conventional artificial neural networks, with energy-saving and high-efficiency characteristics after hardware implementation. However, due to their non-differentiable activation function and the temporally delayed accumulation in outputs, the direct training of SNNs is extraordinarily difficult, even when adopting a surrogate gradient to mimic backpropagation. For SNN training, this non-differentiability causes an intrinsic gradient error that is magnified through layerwise backpropagation, especially through multiple layers. In this paper, we propose a novel approach to reducing gradient error from a new perspective called surrogate module learning (SML). Surrogate module learning constructs a shortcut path to back-propagate a more accurate gradient to a certain SNN part using the surrogate modules. Then, we develop a new loss function for concurrently training the network and enhancing the surrogate modules’ surrogate capacity. We demonstrate that when the outputs of surrogate modules are close to the SNN output, the fraction of the gradient error drops significantly. Our method consistently and significantly enhances the performance of SNNs on all experimental datasets, including CIFAR-10/100, ImageNet, and ES-ImageNet. For example, for the spiking ResNet-34 architecture on ImageNet, we increased the SNN accuracy by 3.46%.
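
The surrogate gradient that the abstract above takes as its starting point can be sketched for a single neuron. This illustrates the general surrogate-gradient technique, not the proposed surrogate module learning. The rectangular window, its `width`, and the threshold value are illustrative assumptions.

```python
def spike(membrane, threshold=1.0):
    """Forward pass: a non-differentiable step that fires at the threshold."""
    return 1.0 if membrane >= threshold else 0.0


def surrogate_grad(membrane, threshold=1.0, width=1.0):
    """Backward pass: rectangular stand-in for the step function's derivative.

    The true derivative is zero almost everywhere, so training substitutes a
    window of height 1/width centered on the threshold (an assumed shape).
    """
    return 1.0 / width if abs(membrane - threshold) < width / 2 else 0.0


fired = spike(1.2)
grad_near = surrogate_grad(1.2)  # inside the window around the threshold
grad_far = surrogate_grad(2.0)   # outside the window, no gradient flows
```

Because the backward pass uses a function that differs from the true derivative, each layer contributes some error to the gradient, which is the accumulation the abstract describes.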

AAAI Conference 2017 Conference Paper

Collaborative Company Profiling: Insights from an Employee’s Perspective

Hao Lin
Hengshu Zhu
Yuan Zuo
Chen Zhu
Junjie Wu
Hui Xiong

Company profiling is an analytical process to build an in-depth understanding of a company’s fundamental characteristics. It serves as an effective way to gain vital information about the target company and acquire business intelligence. Traditional approaches for company profiling rely heavily on the availability of rich finance information about the company, such as finance reports and SEC filings, which may not be readily available for many private companies. However, the rapid prevalence of online employment services enables a new paradigm: to obtain a variety of company information from employees’ online ratings and comments. This, in turn, raises the challenge of developing company profiles from an employee’s perspective. To this end, in this paper, we propose a method named Company Profiling based Collaborative Topic Regression (CPCTR), for learning the latent structural patterns of companies. By formulating a joint optimization framework, CPCTR can collaboratively model both textual (e.g., reviews) and numerical information (e.g., salaries and ratings). Indeed, with the identified patterns, including the positive/negative opinions and the latent variable that influences salary, we can effectively carry out opinion analysis and salary prediction. Extensive experiments were conducted on a real-world data set to validate the effectiveness of CPCTR. The results show that our method provides a comprehensive understanding of company characteristics and delivers a more effective prediction of salaries than other baselines.

AIIM Journal 2017 Journal Article

Identify and analysis crotonylation sites in histone by using support vector machines

Wang-Ren Qiu
Bi-Qian Sun
Hua Tang
Jian Huang
Hao Lin

IS Journal 2015 Journal Article

Noninvasive and Continuous Blood Pressure Monitoring Using Wearable Body Sensor Networks

Hao Lin
Wenyao Xu
Nan Guan
Dong Ji
Yangjie Wei
Wang Yi

Hypertension is a major health risk that influences the quality of life for many people. The importance of monitoring hypertension in a continuous and noninvasive manner increases as more people experience raised blood pressure (BP). The authors present a smartphone-centric body sensor network to measure the pulse transit time (PTT) in real time. Their robust method for calculating BP uses PTT information and considers the baroreflex mechanism, which reflects the relationship between BP and the heart rate. To evaluate the performance of their proposed method, they collected 300 groups of data from six subjects before and after exercise. Experimental results show that their proposed method can estimate BP values in real time with good precision.
