Arrow Research search

Author name cluster

Lijuan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers (44)

AAAI Conference 2026 Conference Paper

Towards Zero-Shot Diabetic Retinopathy Grading: Learning Generalized Knowledge via Prompt-Driven Matching and Emulating

  • Huan Wang
  • Haoran Li
  • Yuxin Lin
  • Huaming Chen
  • Jun Yan
  • Lijuan Wang
  • Jiahua Shi
  • Qihao Xu

As one of the primary causes of visual impairment, Diabetic Retinopathy (DR) requires accurate and robust grading to facilitate timely diagnosis and intervention. Different from conventional DR grading methods that utilize single-view images, recent clinical studies have revealed that multi-view fundus images can significantly enhance DR grading performance by expanding the field of view (FOV). However, there is a long-tailed distribution problem in fundus image analysis, i.e., a high prevalence of mild DR grades and a low prevalence of rare ones (e.g., cases of high severity), which presents a significant challenge to developing a unified model capable of detecting rare or unseen DR grades not encountered during training. In this paper, we propose ProME-DR, a Prompt-driven zero-shot DR grading framework, which leverages prompt Matching and Emulating to recognize unseen DR categories and views beyond the training set. ProME-DR disentangles the training process into two stages to learn generalized knowledge for novel DR disease grading. Initially, ProME-DR leverages two sets of prompt units to capture semantic and inter-view consistency knowledge in a split-and-mask manner, gathering instance-level DR visual clues. Subsequently, it constructs a concept-aware emulator to generate context prompt units, linking extensible knowledge learned from previously seen DR attributes for zero-shot DR grading. Extensive experiments conducted on eight datasets and various scenarios confirm the superiority of ProME-DR.

EAAI Journal 2025 Journal Article

An adaptive traffic signal control scheme with Proximal Policy Optimization based on deep reinforcement learning for a single intersection

  • Lijuan Wang
  • Guoshan Zhang
  • Qiaoli Yang
  • Tianyang Han

Adaptive traffic signal control (ATSC) is an important means to alleviate traffic congestion and improve the quality of road traffic. Although deep reinforcement learning (DRL) technology has shown great potential in solving traffic signal control (TSC) problems, the state representation, reward design, and action interval time still need further study, and the advantages of policy learning have not been fully applied to TSC. To address these issues, we propose a DRL-based traffic signal control scheme with Proximal Policy Optimization (PPO-TSC). We use the waiting time of vehicles and the queue length of lanes, which represent the spatiotemporal characteristics of traffic flow, to design simplified traffic state feature vectors, and define a reward function consistent with the state. Additionally, we compare and analyze the performance indexes obtained by various methods using action intervals of 5s, 10s, and 15s. The algorithm is implemented on the Actor-Critic architecture, using advantage estimation and the clip mechanism to constrain the range of gradient updates. We validate the proposed scheme at a single intersection in Simulation of Urban MObility (SUMO) under two different traffic demand patterns: flat traffic and peak traffic. The experimental results show that the proposed method is significantly better than the compared methods. Specifically, PPO-TSC demonstrates a reduction of 24% in average travel time (ATT), a decrease of 45% in average time loss (ATL), and an increase of 16% in average speed (AS) compared with existing methods under peak traffic conditions.
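
The clip mechanism mentioned in this abstract is the standard PPO clipped surrogate objective. A minimal sketch of that objective (generic PPO, not the authors' released code; all names are illustrative):

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate, as used in PPO-style TSC schemes.

    new_log_probs / old_log_probs: log pi(a|s) under the current and
    behavior policies; advantages: advantage estimates A(s, a).
    """
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the clipped surrogate, i.e., minimize its negation.
    return -torch.min(unclipped, clipped).mean()
```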

ICML Conference 2025 Conference Paper

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

  • Yunzhuo Hao
  • Jiawei Gu
  • Huichen Will Wang
  • Linjie Li
  • Zhengyuan Yang
  • Lijuan Wang
  • Yu Cheng 0001

The ability to organically reason over and with both text and images is a pillar of human intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such multimodal reasoning remains under-explored. Existing benchmarks often emphasize text-dominant reasoning or rely on shallow visual cues, failing to adequately assess integrated visual and textual reasoning. We introduce EMMA (Enhanced MultiModal reAsoning), a benchmark targeting organic multimodal reasoning across mathematics, physics, chemistry, and coding. EMMA tasks demand advanced cross-modal reasoning that cannot be addressed by reasoning independently in each modality, offering an enhanced test suite for MLLMs’ reasoning capabilities. Our evaluation of state-of-the-art MLLMs on EMMA reveals significant limitations in handling complex multimodal and multi-step reasoning tasks; even advanced techniques such as Chain-of-Thought prompting and test-time compute scaling underperform. These findings underscore the need for improved multimodal architectures and training paradigms to close the gap between human and model reasoning in multimodality.

ICLR Conference 2025 Conference Paper

CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

  • Khyathi Raghavi Chandu
  • Linjie Li
  • Anas Awadalla
  • Ximing Lu
  • Jae Sung Park
  • Jack Hessel
  • Lijuan Wang
  • Yejin Choi 0001

The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, which is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics. Despite the recent rapid progress in vision-language models (VLMs), evaluations on our benchmark show that they perform poorly in uncertain scenarios. Further experiments demonstrate that supervised fine-tuning with CertainlyUncertain enhances the performance of VLMs, and reduces the calibration error. These improvements extend beyond our benchmark to existing refusal-oriented datasets and show positive results on reducing hallucinations, while maintaining performance on standard VQA benchmarks. Our work underscores the importance of addressing uncertainty in vision-language AI systems to improve their reliability and trustworthiness in real-world applications.
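
The abstract does not spell out the formula for confidence-weighted accuracy; one plausible instantiation that rewards calibrated confidence (a schematic assumption, not the paper's definition) is:

```python
def confidence_weighted_accuracy(preds, labels, confidences):
    """Illustrative confidence-weighted accuracy: correct answers earn
    credit proportional to the model's confidence, wrong answers earn
    credit proportional to (1 - confidence), so well-calibrated
    uncertainty is rewarded.  The paper's exact formulation may differ.
    """
    assert len(preds) == len(labels) == len(confidences)
    total = sum(c if p == y else 1.0 - c
                for p, y, c in zip(preds, labels, confidences))
    return total / len(preds)
```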

ICLR Conference 2025 Conference Paper

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

  • Kaizhi Zheng
  • Xiaotong Chen
  • Xuehai He
  • Jing Gu
  • Linjie Li
  • Zhengyuan Yang
  • Kevin Lin
  • Jianfeng Wang

Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming. However, recent approaches to language-guided 3D scene editing either require manual interventions or focus only on appearance modifications without supporting comprehensive scene layout changes. In response, we propose EditRoom, a unified framework capable of executing a variety of layout edits through natural language commands, without requiring manual intervention. Specifically, EditRoom leverages Large Language Models (LLMs) for command planning and generates target scenes using a diffusion-based method, enabling six types of edits: rotate, translate, scale, replace, add, and remove. To address the lack of data for language-guided 3D scene editing, we have developed an automatic pipeline to augment existing 3D scene synthesis datasets and introduced EditRoom-DB, a large-scale dataset with 83k editing pairs, for training and evaluation. Our experiments demonstrate that our approach consistently outperforms other baselines across all metrics, indicating higher accuracy and coherence in language-guided scene layout editing.

ICLR Conference 2025 Conference Paper

GenXD: Generating Any 3D and 4D Scenes

  • Yuyang Zhao
  • Chung-Ching Lin
  • Kevin Lin
  • Zhiwen Yan
  • Linjie Li
  • Zhengyuan Yang
  • Jianfeng Wang
  • Gim Hee Lee

Recent developments in 2D visual generation have been remarkably successful. However, 3D and 4D generation remain challenging in real-world applications due to the lack of large-scale 4D data and effective model design. In this paper, we propose to jointly investigate general 3D and 4D generation by leveraging camera and object movements commonly observed in daily life. Due to the lack of real-world 4D data in the community, we first propose a data curation pipeline to obtain camera poses and object motion strength from videos. Based on this pipeline, we introduce a large-scale real-world 4D scene dataset: CamVid-30K. By leveraging all the 3D and 4D data, we develop our framework, GenXD, which allows us to produce any 3D or 4D scene. We propose multiview-temporal modules, which disentangle camera and object movements, to seamlessly learn from both 3D and 4D data. Additionally, GenXD employs masked latent conditions to support a variety of conditioning views. GenXD can generate videos that follow the camera trajectory as well as consistent 3D views that can be lifted into 3D representations. We perform extensive evaluations across various real-world and synthetic datasets, demonstrating GenXD's effectiveness and versatility compared to previous methods in 3D and 4D generation.

ICLR Conference 2025 Conference Paper

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

  • Peng Xia 0005
  • Siwei Han
  • Shi Qiu 0016
  • Yiyang Zhou
  • Zhaoyang Wang 0004
  • Wenhao Zheng
  • Zhaorun Chen
  • Chenhang Cui

Interleaved multimodal comprehension and generation, enabling models to produce and interpret both images and text in arbitrary sequences, have become a pivotal area in multimodal learning. Despite significant advancements, the evaluation of this capability remains insufficient. Existing benchmarks suffer from limitations in data scale, scope, and evaluation depth, while current evaluation metrics are often costly or biased, lacking in reliability for practical applications. To address these challenges, we introduce MMIE, a large-scale knowledge-intensive benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs). MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts. It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies. Moreover, we propose a reliable automated evaluation metric, leveraging a scoring model fine-tuned with human-annotated data and systematic evaluation criteria, aimed at reducing bias and improving evaluation accuracy. Extensive experiments demonstrate the effectiveness of our benchmark and metrics in providing a comprehensive evaluation of interleaved LVLMs. Specifically, we evaluate eight LVLMs, revealing that even the best models show significant room for improvement, with most achieving only moderate results. We believe MMIE will drive further advancements in the development of interleaved LVLMs.

ICLR Conference 2025 Conference Paper

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

  • Xuehai He
  • Weixi Feng
  • Kaizhi Zheng
  • Yujie Lu
  • Wanrong Zhu
  • Jiachen Li
  • Yue Fan
  • Jianfeng Wang

Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models"---interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding. MMWorld distinguishes itself from previous video understanding benchmarks with two unique advantages: (1) multi-discipline, covering various disciplines that often require domain expertise for comprehensive understanding; (2) multi-faceted reasoning, including explanation, counterfactual thinking, future prediction, etc. MMWorld consists of a human-annotated dataset to evaluate MLLMs with questions about the whole videos and a synthetic dataset to analyze MLLMs within a single modality of perception. Together, MMWorld encompasses 1,910 videos across seven broad disciplines and 69 subdisciplines, complete with 6,627 question-answer pairs and associated captions. The evaluation includes 4 proprietary and 11 open-source MLLMs, which struggle on MMWorld (e.g., GPT-4o performs the best with only 62.5% accuracy), showing large room for improvement. Further ablation studies reveal other interesting findings such as models' different skill sets from humans. We hope MMWorld can serve as an essential step towards world model evaluation in videos.

NeurIPS Conference 2025 Conference Paper

Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

  • Minheng Ni
  • Zhengyuan Yang
  • Linjie Li
  • Chung-Ching Lin
  • Kevin Lin
  • Wangmeng Zuo
  • Lijuan Wang

Recent advances in large language models have significantly improved textual reasoning through the effective use of Chain-of-Thought (CoT) and reinforcement learning. However, extending these successes to vision-language tasks remains challenging due to inherent limitations in text-only CoT, such as visual hallucinations and insufficient multimodal integration. In this paper, we introduce Point-RFT, a multimodal reasoning framework explicitly designed to leverage visually grounded CoT reasoning for visual document understanding. Our approach consists of two stages: First, we conduct format finetuning using a curated dataset of 71K diverse visual reasoning problems, each annotated with detailed, step-by-step rationales explicitly grounded to corresponding visual elements. Second, we employ reinforcement finetuning targeting visual document understanding. On ChartQA, our approach improves accuracy from 70.88% (format-finetuned baseline) to 90.04%, surpassing the 83.92% accuracy achieved by reinforcement finetuning relying solely on text-based CoT. The result shows that our grounded CoT is more effective for multimodal reasoning compared with the text-only CoT. Moreover, Point-RFT exhibits superior generalization capability across several out-of-domain visual document reasoning benchmarks, including CharXiv, PlotQA, IconQA, TabMWP, etc., and highlights its potential in complex real-world scenarios.

ICLR Conference 2025 Conference Paper

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

  • Yining Hong
  • Beide Liu
  • Maxine Wu
  • Yuanhao Zhai 0001
  • Kai-Wei Chang 0001
  • Linjie Li
  • Kevin Lin
  • Chung-Ching Lin

Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning by pre-training on vast amounts of data, overlooking the fast learning phase crucial for episodic memory storage. This oversight leads to inconsistencies across temporally distant frames when generating longer videos, as these frames fall beyond the model's context window. To this end, we introduce SlowFast-VGen, a novel dual-speed learning system for action-driven long video generation. Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module. Specifically, the fast learning process updates its temporal LoRA parameters based on local inputs and outputs, thereby efficiently storing episodic memory in its parameters. We further propose a slow-fast learning loop algorithm that seamlessly integrates the inner fast learning loop into the outer slow learning loop, enabling the recall of prior multi-episode experiences for context-aware skill learning. To facilitate the slow learning of an approximate world model, we collect a large-scale dataset of 200k videos with language action annotations, covering a wide range of scenarios. Extensive experiments show that SlowFast-VGen outperforms baselines across various metrics for action-driven video generation, achieving an FVD score of 514 compared to 782, and maintaining consistency in longer videos, with an average of 0.37 scene cuts versus 0.89. The slow-fast learning loop algorithm significantly enhances performance on long-horizon planning tasks as well.

NeurIPS Conference 2025 Conference Paper

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

  • Xiyao Wang
  • Zhengyuan Yang
  • Chao Feng
  • Hongjin Lu
  • Linjie Li
  • Chung-Ching Lin
  • Kevin Lin
  • Furong Huang

We introduce ThinkLite-VL, a family of visual reasoning models that achieve state-of-the-art (SoTA) performance using an order of magnitude fewer training samples, relying purely on reinforcement fine-tuning (RFT) self-improvement without any knowledge distillation. Our central insight is that sample difficulty critically influences RFT effectiveness: appropriately challenging examples can drive substantial reasoning improvements, even in low-data regimes. However, quantifying sample difficulty in a reliable and scalable manner remains non-trivial. To address this, we repurpose Monte Carlo Tree Search (MCTS) to measure sample difficulty via the number of reasoning iterations a vision-language model (VLM) requires to solve each instance. This MCTS-based selection procedure identifies samples that induce deeper reasoning while remaining solvable, allowing us to filter a high-quality subset from 70k open-source examples spanning math, natural image understanding, and chart comprehension. Using this approach, we select just 11k challenging samples for RFT on Qwen2.5-VL-7B-Instruct and 7.5k samples for Qwen2.5-VL-72B-Instruct. The resulting models, ThinkLite-VL-7B and ThinkLite-VL-72B, significantly outperform their respective base models across eight visual reasoning benchmarks. In particular, ThinkLite-VL-7B improves the average performance of Qwen2.5-VL-7B-Instruct by 7% and surpasses all existing 7B-level models, as well as much larger models such as GPT-4o, O1, and Qwen2.5-VL-72B, achieving a new SoTA score of 75.1 on MathVista. ThinkLite-VL-72B further advances the SoTA frontier, achieving an accuracy of 79.7 on MathVista and an average benchmark improvement of 4.42 over the open-source SoTA. These results demonstrate that MCTS-guided difficulty filtering provides a scalable and effective path toward data-efficient self-improvement in multimodal reasoning.
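
A minimal sketch of the MCTS-guided selection idea, where difficulty is the number of reasoning iterations a VLM needs before solving a sample (the thresholds and the difficulty_fn interface are assumptions, not the paper's code):

```python
def select_hard_but_solvable(samples, difficulty_fn, min_iters=4, max_keep=11_000):
    """Keep samples the VLM can solve, but only after many MCTS iterations.

    difficulty_fn(sample) -> number of MCTS reasoning iterations needed
    to solve the sample, or None if it was never solved.
    """
    scored = []
    for s in samples:
        iters = difficulty_fn(s)
        if iters is not None and iters >= min_iters:  # solvable, yet hard
            scored.append((iters, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # hardest first
    return [s for _, s in scored[:max_keep]]
```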

ICLR Conference 2025 Conference Paper

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

  • Zichen Miao
  • Zhengyuan Yang
  • Kevin Lin
  • Ze Wang 0008
  • Zicheng Liu 0001
  • Lijuan Wang
  • Qiang Qiu 0001

Recent advancements in timestep-distilled diffusion models have enabled high-quality image generation that rivals non-distilled multi-step models, but with significantly fewer inference steps. While such models are attractive for applications due to the low inference cost and latency, fine-tuning them with a naive diffusion objective would result in degraded and blurry outputs. An intuitive alternative is to repeat the diffusion distillation process with a fine-tuned teacher model, which produces good results but is cumbersome and computationally intensive: the distillation training usually requires orders of magnitude more compute than fine-tuning for specific image styles. In this paper, we present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current time-step distilled model, and increases the relative likelihood margin between the training images and reference images. This enables the model to retain its few-step generation ability, while allowing for fine-tuning of its output distribution. We also demonstrate that PSO is a generalized formulation that can be flexibly extended to both offline-sampled and online-sampled pairwise data, covering various popular objectives for diffusion model preference optimization. We evaluate PSO in both preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data. PSO also demonstrates effectiveness in style transfer and concept customization by directly tuning timestep-distilled diffusion models.
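
A schematic of the pairwise margin idea: push up the likelihood of training images relative to reference images sampled from the current distilled model. This is a DPO-style surrogate written from the abstract's description, not the paper's exact objective:

```python
import torch.nn.functional as F

def pairwise_sample_loss(logp_train, logp_ref, beta=0.1):
    """Increase the relative log-likelihood margin between training
    images and reference images from the current timestep-distilled
    model (illustrative; beta and the sigmoid link are assumptions)."""
    margin = logp_train - logp_ref
    return -F.logsigmoid(beta * margin).mean()
```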

NeurIPS Conference 2025 Conference Paper

VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents

  • Kangrui Wang
  • Pingyue Zhang
  • Zihan Wang
  • Yaning Gao
  • Linjie Li
  • Qineng Wang
  • Hanyang Chen
  • Yiping Lu

A major challenge in training VLM agents, compared to LLM agents, is that states shift from simple texts to complex visual observations, which introduces partial observability and demands robust world modeling. We ask: can VLM agents build internal world models through explicit visual state reasoning? In this work, we architecturally enforce and reward the VLM agent’s reasoning process via reinforcement learning (RL), formulating the problem as a Partially Observable Markov Decision Process (POMDP). By studying five reasoning strategies, we demonstrate that structuring the agent’s reasoning into StateEstimation (“what is the current state?”) and TransitionModeling (“what is next?”) is critical. Investigating how agents should ground visual states and represent these internal beliefs, we reveal that the optimal representations are task-dependent: Natural Language excels at capturing semantic relationships for general tasks, while Structured formats are essential for high-precision manipulation. These insights motivate our approach to reward shaping and credit assignment. We leverage a WorldModeling Reward to densely reward the agent’s turn-by-turn state predictions, while our Bi-Level General Advantage Estimation (Bi-Level GAE) enables turn-aware credit assignment. Through such world model reasoning, we enable a 3B model to achieve a score of 0.82 on a set of five diverse agent tasks, nearly a 3× improvement over its untrained counterpart (0.21), surpassing proprietary reasoning models like GPT-5 (0.75), Gemini 2.5 Pro (0.67), and Claude 4.5 (0.62). All experiments are supported by our VAGEN framework, a scalable system for training and analyzing multi-turn VLM agents across diverse visual environments.

NeurIPS Conference 2025 Conference Paper

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

  • Xiyao Wang
  • Zhengyuan Yang
  • Chao Feng
  • Yuhang Zhou
  • Xiaoyu Liu
  • Yongyuan Liang
  • Ming Li
  • Ziyi Zang

Reinforcement learning (RL) has shown great effectiveness for fine-tuning large language models (LLMs) using tasks that are challenging yet easily verifiable, such as math reasoning or code generation. However, extending this success to visual perception in vision–language models (VLMs) has been impeded by the scarcity of vision-centric tasks that are simultaneously challenging and unambiguously verifiable. To this end, we introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Starting from a 200-word caption, we inject a single, subtle visual description error—altering a few words on objects, attributes, counts, or spatial relations—and task the model to pinpoint the corrupted span given the image and the modified caption. This formulation preserves the full perceptual difficulty while providing a binary, exact-match reward that is easy to compute and unambiguous. Models trained with the ViCrit task exhibit substantial gains across a variety of VL benchmarks. Crucially, the improvements transfer beyond natural-image training data to abstract image reasoning and visual math, showing promise of learning to perceive rather than merely memorizing seen objects. To facilitate evaluation, we further introduce ViCrit-Bench, a category-balanced diagnostic benchmark that systematically probes perception errors across diverse image domains and error types. Together, our results demonstrate that fine-grained hallucination criticism is an effective and generalizable objective for enhancing visual perception in VLMs.
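
The binary, exact-match reward described here is simple to express; a minimal sketch (the span normalization is an assumption):

```python
def vicrit_reward(predicted_span: str, corrupted_span: str) -> float:
    """Return 1.0 iff the model pinpoints exactly the injected
    hallucinated span in the modified caption, else 0.0."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())  # case/whitespace-insensitive match
    return 1.0 if norm(predicted_span) == norm(corrupted_span) else 0.0
```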

IJCAI Conference 2024 Conference Paper

Bring Metric Functions into Diffusion Models

  • Jie An
  • Zhengyuan Yang
  • Jianfeng Wang
  • Linjie Li
  • Zicheng Liu
  • Lijuan Wang
  • Jiebo Luo

We introduce a Cascaded Diffusion Model (Cas-DM) that improves a Denoising Diffusion Probabilistic Model (DDPM) by effectively incorporating additional metric functions in training. Metric functions such as the LPIPS loss have been proven highly effective in consistency models derived from score matching. However, for the diffusion counterparts, the methodology and efficacy of adding extra metric functions remain unclear. One major challenge is the mismatch between the noise predicted by a DDPM at each step and the desired clean image that the metric function works well on. To address this problem, we propose Cas-DM, a network architecture that cascades two network modules to effectively apply metric functions to the diffusion model training. The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function. The second cascaded module learns to predict the clean image, thereby facilitating the metric function computation. Experimental results show that the proposed diffusion model backbone enables the effective use of the LPIPS loss, improving the image quality (FID, sFID) of diffusion models on various established benchmarks.
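
A schematic of the two-branch objective the abstract describes: a standard noise-prediction MSE on the first module, plus a metric loss (e.g., LPIPS) on the clean image predicted by the cascaded module. The weighting and interfaces are assumptions:

```python
import torch.nn.functional as F

def cas_dm_losses(noise_pred, noise_true, x0_pred, x0_true, lpips_fn, w=1.0):
    """Illustrative Cas-DM-style objective (not the authors' code)."""
    ddpm_loss = F.mse_loss(noise_pred, noise_true)   # first module: unaffected by the metric
    metric_loss = lpips_fn(x0_pred, x0_true).mean()  # second module: clean-image branch
    return ddpm_loss + w * metric_loss
```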

ICML Conference 2024 Conference Paper

Completing Visual Objects via Bridging Generation and Segmentation

  • Xiang Li 0106
  • Yinpeng Chen
  • Chung-Ching Lin
  • Hao Chen 0102
  • Kai Hu 0010
  • Rita Singh
  • Bhiksha Raj
  • Lijuan Wang

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components. Our method, named MaskComp, delineates the completion process through iterative stages of generation and segmentation. In each iteration, the object mask is provided as an additional condition to boost image generation, and, in return, the generated images can lead to a more accurate mask by fusing the segmentation of images. We demonstrate that the combination of one generation and one segmentation stage effectively functions as a mask denoiser. Through alternation between the generation and segmentation stages, the partial object mask is progressively refined, providing precise shape guidance and yielding superior object completion results. Our experiments demonstrate the superiority of MaskComp over existing approaches, e.g., ControlNet and Stable Diffusion, establishing it as an effective solution for object completion.

NeurIPS Conference 2024 Conference Paper

Interfacing Foundation Models' Embeddings

  • Xueyan Zou
  • Linjie Li
  • Jianfeng Wang
  • Jianwei Yang
  • Mingyu Ding
  • Junyi Wei
  • Zhengyuan Yang
  • Feng Li

Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in Fig. 1, a lightweight transformer interface without tuning any foundation model weights is enough for segmentation, grounding, and retrieval in an interleaved manner. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning retrieval, segmentation, etc., under the same architecture and weights. (2) Interleavable. With the benefit of multi-task multi-modal training, the proposed interface creates an interleaved shared embedding space. (3) Extendable. The proposed interface is adaptive to new tasks and new models. In light of the interleaved embedding space, we introduce FIND-Bench, which adds new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval. Ours is the first work to align foundation models' embeddings for interleaved understanding. Meanwhile, our approach achieves state-of-the-art performance on FIND-Bench and competitive performance on standard retrieval and segmentation settings.

NeurIPS Conference 2024 Conference Paper

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

  • Alex Jinpeng Wang
  • Linjie Li
  • Yiqi Lin
  • Min Li
  • Lijuan Wang
  • Mike Zheng Shou

Training models with longer in-context lengths is a significant challenge for multimodal machine learning due to substantial GPU memory and computational costs. This exploratory study does not present state-of-the-art models; rather, it introduces an innovative method designed to increase in-context text length in multi-modality large language models (MLLMs) efficiently. We present a method that processes long in-context text using visual tokens. This technique significantly reduces GPU memory usage and floating point operations (FLOPs). For instance, our method expands the pre-training in-context length from 256 to 2048 tokens with fewer FLOPs for a 56 billion parameter MoE model. Experimental results demonstrate that the method enhances OCR capabilities and delivers superior performance on common downstream benchmarks for in-context few-shot evaluation. Additionally, it proves effective for long context inference, achieving results comparable to full text input while maintaining computational efficiency.

ICLR Conference 2024 Conference Paper

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

  • Fuxiao Liu
  • Kevin Lin
  • Linjie Li
  • Jianfeng Wang
  • Yaser Yacoob
  • Lijuan Wang

Despite the promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating inconsistent descriptions with respect to the associated image and human instructions. This paper addresses this issue by introducing the first large and diverse visual instruction tuning dataset, named Large-scale Robust Visual (LRV)-Instruction. Our dataset comprises 400k visual instructions generated by GPT4, covering 16 vision-and-language tasks with open-ended instructions and answers. Unlike existing studies that primarily focus on positive instruction samples, we design LRV-Instruction to include both positive and negative instructions for more robust visual instruction tuning. Our negative instructions are designed at three semantic levels: (i) Nonexistent Object Manipulation, (ii) Existent Object Manipulation and (iii) Knowledge Manipulation. To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts. GAVIE does not require human-annotated groundtruth answers and can adapt to diverse instruction formats. We conduct comprehensive experiments to investigate the hallucination of LMMs. Our results demonstrate existing LMMs exhibit significant hallucinations when presented with our negative instructions, particularly Existent Object and Knowledge Manipulation instructions. Moreover, we successfully mitigate hallucination by finetuning MiniGPT4 and mPLUG-Owl on LRV-Instruction while improving performance on several public datasets compared to state-of-the-art methods. Additionally, we observed that a balanced ratio of positive and negative instances in the training data leads to a more robust model. Code and data will be released upon publication.

ICML Conference 2024 Conference Paper

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

  • Weihao Yu 0001
  • Zhengyuan Yang
  • Linjie Li
  • Jianfeng Wang
  • Kevin Lin
  • Zicheng Liu 0001
  • Xinchao Wang
  • Lijuan Wang

We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models.
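
A minimal sketch of the LLM-based evaluator idea for open-ended outputs (the prompt template, scale, and evaluator interface are illustrative assumptions, not MM-Vet's released grader):

```python
def llm_score(evaluator, question, ground_truth, model_output):
    """Ask an LLM to grade an open-ended answer on [0, 1], yielding a
    unified scoring metric across question types and answer styles.

    evaluator: an assumed callable that maps a prompt string to the
    LLM's text completion.
    """
    prompt = (
        "Compare the model answer with the ground truth and output only "
        "a correctness score between 0.0 and 1.0.\n"
        f"Question: {question}\nGround truth: {ground_truth}\n"
        f"Model answer: {model_output}\nScore:"
    )
    return float(evaluator(prompt).strip())
```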

NeurIPS Conference 2024 Conference Paper

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

  • Yuanhao Zhai
  • Kevin Lin
  • Zhengyuan Yang
  • Linjie Li
  • Jianfeng Wang
  • Chung-Ching Lin
  • David Doermann
  • Junsong Yuan

Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, directly applying these techniques to video models results in unsatisfactory frame quality. This issue arises from the limited frame appearance quality in public video datasets, affecting the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while enabling the student model to improve frame appearance using abundant high-quality image data. To this end, we propose motion consistency models (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM involves a video consistency model that distills motion from the video teacher model, and an image discriminator that boosts frame appearance to match high-quality image data. However, directly combining these components leads to two significant challenges: a conflict in frame learning objectives, where video distillation learns from low-quality video frames while the image discriminator targets high-quality images, and training-inference discrepancies due to the differing quality of video samples used during training and inference. To address these challenges, we introduce disentangled motion distillation and mixed trajectory distillation. The former applies the distillation objective solely to the motion representation, while the latter mitigates training-inference discrepancies by mixing distillation trajectories from both the low- and high-quality video domains. Extensive experiments show that our MCM achieves state-of-the-art video diffusion distillation performance. Additionally, our method can enhance frame quality in video diffusion models, producing frames with high aesthetic value or specific styles.

AAAI Conference 2024 Conference Paper

ORES: Open-Vocabulary Responsible Visual Synthesis

  • Minheng Ni
  • Chenfei Wu
  • Xiaodong Wang
  • Shengming Yin
  • Lijuan Wang
  • Zicheng Liu
  • Nan Duan

Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concept that needs to be avoided for responsible visual synthesis tends to be diverse, depending on the region, context, and usage scenarios. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images that avoid forbidden concepts while following the user's query as much as possible. To enable evaluation on ORES, we provide a publicly available dataset, baseline models, and benchmark. Experimental results demonstrate the effectiveness of our method in reducing risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available at https://github.com/kodenii/ORES.

ICML Conference 2024 Conference Paper

StrokeNUWA - Tokenizing Strokes for Vector Graphic Synthesis

  • Zecheng Tang
  • Chenfei Wu
  • Zekai Zhang
  • Minheng Ni
  • Shengming Yin
  • Yu Liu
  • Zhengyuan Yang
  • Lijuan Wang

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, which disrupts the model’s ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring "stroke" tokens as a better visual representation on vector graphics, one that is inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94× speedup in inference over prior methods, with an exceptional SVG code compression ratio of 6.9%.

YNICL Journal 2024 Journal Article

Temporal evolution of microstructural integrity in cerebellar peduncles in Parkinson’s disease: Stage-specific patterns and dopaminergic correlates

  • Chentao He
  • Rui Yang
  • Siming Rong
  • Piao Zhang
  • Xi Chen
  • Qi Qi
  • Ziqi Gao
  • Yan Li

BACKGROUND: Previous research revealed differences in cerebellar white matter integrity by disease stages, indicating a compensatory role in Parkinson's disease (PD). However, the temporal evolution of cerebellar white matter microstructure in patients with PD (PwPD) remains unclear. OBJECTIVE: To unravel temporal evolution of cerebellar white matter and its dopaminergic correlates in PD. METHODS: We recruited 124 PwPD from the PPMI study. The participants were divided into two subsets: Subset 1 (n = 41) had three MRI scans (baseline, 2 years, and 4 years), and Subset 2 (n = 106) had at least two MRI scans at baseline, 1 year, and/or 2 years. Free water-corrected diffusion metrics were used to measure the microstructural integrity in cerebellar peduncles (CP), the main white matter tracts connecting to and from the cerebellum. The ACAPULCO processing pipeline was used to assess cerebellar lobules volumes. Linear mixed-effect models were used to study longitudinal changes. We also examined the relationships between microstructural integrity in CP, striatal dopamine transporter specific binding ratio (SBR), and clinical symptoms. RESULTS: Microstructural changes in CP showed a non-linear pattern in PwPD. Free water-corrected fractional anisotropy (FAt) increased in the first two years but declined from 2 to 4 years, while free water-corrected mean diffusivity exhibited the opposite trend. The initial increased FAt in CP correlated with cerebellar regional volume atrophy, striatal dopaminergic SBR decline, and worsening clinical symptoms, but this correlation varied across disease stages. CONCLUSIONS: Our findings suggest a non-linear evolution of microstructural integrity in CP throughout the course of PD, indicating the adaptive structural reorganization of the cerebellum simultaneously with progressive striatal dopaminergic degeneration in PD.

NeurIPS Conference 2024 Conference Paper

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

  • Kevin Q. Lin
  • Linjie Li
  • Difei Gao
  • Qinchen Wu
  • Mingyi Yan
  • Zhengyuan Yang
  • Lijuan Wang
  • Mike Z. Shou

Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as “Insert a new slide.” In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). VideoGUI evaluates GUI assistants through a hierarchical process, allowing for identification of the specific levels at which they may fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle-level planning: generate sequences of precise action narrations based on visual state (i.e., screenshot) and goals; (iii) atomic action execution: perform specific actions such as accurately clicking designated elements. For each level, we design evaluation metrics across individual dimensions to provide clear signals, such as individual performance in clicking, dragging, typing, and scrolling for atomic action execution. Our evaluation on VideoGUI reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks, especially for high-level planning. The data and code are available at https://github.com/showlab/videogui.

IJCAI Conference 2023 Conference Paper

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

  • Xiaodong Wang
  • Chenfei Wu
  • Shengming Yin
  • Minheng Ni
  • Jianfeng Wang
  • Linjie Li
  • Zhengyuan Yang
  • Fan Yang

3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill those missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training and inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle rendering. The constructed training samples are closely aligned to the testing instances, without the need for data annotation. To make full use of the masked images, we designed a Masked Enhanced Block (MEB), which can be easily plugged into the UNet and enhance the semantic conditions. Towards real-world animation, we present a novel task: out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.

YNIMG Journal 2023 Journal Article

Optimization of structural connectomes and scaled patterns of structural-functional decoupling in Parkinson's disease

  • Song'an Shang
  • Lijuan Wang
  • Yao Xu
  • Hongying Zhang
  • Lanlan Chen
  • Weiqiang Dou
  • Xindao Yin
  • Jing Ye

Parkinson's disease (PD) is manifested with disrupted topology of the structural connection network (SCN) and the functional connection network (FCN). However, the SCN and its interactions with the FCN remain to be further investigated. This multimodality study attempted to precisely characterize the SCN using diffusion kurtosis imaging (DKI) and further identify the neuropathological pattern of SCN-FCN decoupling, underscoring the neurodegeneration of PD. Diffusion-weighted imaging and resting-state functional imaging were available for network constructions among sixty-nine patients with PD and seventy demographically matched healthy control (HC) participants. The classification performance and topological prosperities of both the SCN and the FCN were analyzed, followed by quantification of the SCN-FCN couplings across scales. The SCN constructed by kurtosis metrics achieved optimal classification performance (area under the curve 0.89, accuracy 80.55 %, sensitivity 78.40 %, and specificity 80.65 %). Along with diverse alterations of structural and functional network topology, the PD group exhibited decoupling across scales including: reduced global coupling; increased nodal coupling within the sensorimotor network (SMN) and subcortical network (SN); higher intramodular coupling within the SMN and SN and lower intramodular coupling of the default mode network (DMN); decreased coupling between the modules of DMN-fronto-parietal network and DMN-visual network, but increased coupling between the SMN-SN module. Several associations between the coupling coefficient and topological properties of the SCN, as well as between network values and clinical scores, were observed. These findings validated the clinical implementation of DKI for structural network construction with better differentiation ability and characterized the SCN-FCN decoupling as supplementary insight into the pathological process underlying PD.

ICLR Conference 2023 Conference Paper

Prompting GPT-3 To Be Reliable

  • Chenglei Si
  • Zhe Gan
  • Zhengyuan Yang
  • Shuohang Wang
  • Jianfeng Wang
  • Jordan L. Boyd-Graber
  • Lijuan Wang

Large language models (LLMs) show impressive abilities via few-shot prompting. Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world language applications. However, the crucial problem of how to improve the reliability of GPT-3 is still under-explored. While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality. Our core contribution is to establish simple and effective prompts that improve GPT-3’s reliability as it: 1) generalizes out-of-distribution, 2) balances demographic distribution and uses natural language instructions to reduce social biases, 3) calibrates output probabilities, and 4) updates the LLM’s factual knowledge and reasoning chains. With appropriate prompts, GPT-3 is more reliable than smaller-scale supervised models on all these facets. We release all processed datasets, evaluation scripts, and model predictions. Our systematic empirical study not only sheds new light on the reliability of prompting LLMs, but more importantly, our prompting strategies can help practitioners more reliably use LLMs like GPT-3.

NeurIPS Conference 2023 Conference Paper

Segment Everything Everywhere All at Once

  • Xueyan Zou
  • Jianwei Yang
  • Hao Zhang
  • Feng Li
  • Linjie Li
  • Jianfeng Wang
  • Lijuan Wang
  • Jianfeng Gao

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image. In SEEM, we propose a novel and versatile decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles, and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of two prompt types required for various segmentation tasks, as shown in Fig. 1; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain segmentation history through mask-guided cross-attention from the decoder to image features; iv) Semantic awareness. We use a text encoder to encode text queries and mask labels into the same semantic space for open-vocabulary segmentation. We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks. The results demonstrate that SEEM exhibits robust generalization to unseen user intents as it learns to compose prompts of different types in a unified representation space. Our approach achieves competitive performance on interactive segmentation, generic segmentation, referring segmentation, and video object segmentation on 9 datasets with a minimum of 1/100 supervision in a single set of weights.

TMLR Journal 2022 Journal Article

Adversarial Feature Augmentation and Normalization for Visual Recognition

  • Tianlong Chen
  • Yu Cheng
  • Zhe Gan
  • Jianfeng Wang
  • Lijuan Wang
  • Jingjing Liu
  • Zhangyang Wang

Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings, instead of relying on computationally-expensive pixel-level perturbations. We propose Adversarial Feature Augmentation and Normalization (A-FAN), which (i) first augments visual recognition models with adversarial features that integrate flexible scales of perturbation strengths, and (ii) then extracts adversarial feature statistics from batch normalization and re-injects them into clean features through feature normalization. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks, including ResNets and EfficientNets for classification, Faster-RCNN for detection, and Deeplab V3+ for segmentation. Extensive experiments show that A-FAN yields consistent generalization improvement over strong baselines across various datasets for classification, detection, and segmentation tasks, such as CIFAR-10, CIFAR-100, ImageNet, Pascal VOC2007, Pascal VOC2012, COCO2017, and Cityscapes. Comprehensive ablation studies and detailed analyses also demonstrate that adding perturbations to specific modules and layers of classification/detection/segmentation backbones yields optimal performance. Code and pre-trained models are available at https://github.com/VITA-Group/CV_A-FAN.
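
A one-step sketch of adversarial augmentation on intermediate features (an FGSM-style perturbation written from the abstract's description; A-FAN itself uses flexible perturbation strengths and re-injects batch-normalization statistics, which this sketch omits):

```python
import torch

def adversarial_feature(features, loss_fn, epsilon=0.1):
    """Perturb intermediate feature embeddings along the loss gradient
    (illustrative; not the released A-FAN implementation)."""
    features = features.detach().requires_grad_(True)
    loss = loss_fn(features)                      # task loss on features
    grad, = torch.autograd.grad(loss, features)   # gradient w.r.t. features
    return (features + epsilon * grad.sign()).detach()
```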

YNIMG Journal 2022 Journal Article

An awareness-dependent mapping of saliency in the human visual system

  • Lijuan Wang
  • Ling Huang
  • Mengsha Li
  • Xiaotong Wang
  • Shiyu Wang
  • Yuefa Lin
  • Xilin Zhang

The allocation of exogenously cued spatial attention is governed by a saliency map. Yet, how salience is mapped when multiple salient stimuli are present simultaneously, and how this mapping interacts with awareness remains unclear. These questions were addressed here using either visible or invisible displays presenting two foreground stimuli (whose bars were oriented differently from the bars in the otherwise uniform background): a high salience target and a distractor of varied, lesser salience. Interference, or not, by the distractor with the effective salience of the target served to index a graded or non-graded nature of salience mapping, respectively. The invisible and visible displays were empirically validated by a two-alternative forced choice test (detecting the quadrant of the target) demonstrating subjects' performance at or above chance level, respectively. By combining psychophysics, fMRI, and effective connectivity analysis, we found a graded distribution of salience with awareness, changing to a non-graded distribution without awareness. Crucially, we further revealed that the graded distribution was contingent upon feedback from the posterior intraparietal sulcus (pIPS, especially from the right pIPS), whereas the non-graded distribution was innate to V1. Together, this awareness-dependent mapping of saliency reconciles several previous, seemingly contradictory findings regarding the nature of the saliency map.

AAAI Conference 2022 Conference Paper

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

  • Zhengyuan Yang
  • Zhe Gan
  • Jianfeng Wang
  • Xiaowei Hu
  • Yumao Lu
  • Zicheng Liu
  • Lijuan Wang

Knowledge-based visual question answering (VQA) involves answering questions that require external knowledge not present in the image. Existing methods first retrieve knowledge from external resources, then reason over the selected knowledge, the input image, and question for answer prediction. However, this two-step approach could lead to mismatches that potentially limit the VQA performance. For example, the retrieved knowledge might be noisy and irrelevant to the question, and the re-embedded knowledge features during reasoning might deviate from their original meanings in the knowledge base (KB). To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT-3 via the use of Image Captions, for knowledge-based VQA. Inspired by GPT-3’s power in knowledge retrieval and question answering, instead of using structured KBs as in previous work, we treat GPT-3 as an implicit and unstructured KB that can jointly acquire and process relevant knowledge. Specifically, we first convert the image into captions (or tags) that GPT-3 can understand, then adapt GPT-3 to solve the VQA task in a few-shot manner by just providing a few in-context VQA examples. We further boost performance by carefully investigating: (i) what text formats best describe the image content, and (ii) how in-context examples can be better selected and used. PICa unlocks the first use of GPT-3 for multimodal tasks. By using only 16 examples, PICa surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset. We also benchmark PICa on VQAv2, where PICa also shows a decent few-shot performance.
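
A minimal sketch of PICa-style prompt assembly, representing each image only by its caption (the template wording is illustrative, not the paper's exact prompt):

```python
def build_pica_prompt(context_examples, caption, question):
    """context_examples: list of (caption, question, answer) triples
    used as in-context shots; the test image appears as its caption."""
    header = "Please answer the question according to the context.\n\n"
    shots = "".join(
        f"Context: {c}\nQuestion: {q}\nAnswer: {a}\n\n"
        for c, q, a in context_examples
    )
    return header + shots + f"Context: {caption}\nQuestion: {question}\nAnswer:"
```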

NeurIPS Conference 2022 Conference Paper

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

  • Zi-Yi Dou
  • Aishwarya Kamath
  • Zhe Gan
  • Pengchuan Zhang
  • Jianfeng Wang
  • Linjie Li
  • Zicheng Liu
  • Ce Liu

Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only aim to tackle VL tasks such as image-text retrieval, visual question answering (VQA) and image captioning that test high-level understanding of images, or only target region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both these types of tasks. Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones to better capture multimodal interactions. In addition, unlike previous work that is either only pre-trained on image-text data or on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both these kinds of data efficiently: (i) coarse-grained pre-training based on image-text data; followed by (ii) fine-grained pre-training based on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection. Using deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods using orders of magnitude more data. Code is released at https://github.com/microsoft/FIBER.

TMLR Journal 2022 Journal Article

GIT: A Generative Image-to-text Transformer for Vision and Language

  • Jianfeng Wang
  • Zhengyuan Yang
  • Xiaowei Hu
  • Linjie Li
  • Kevin Lin
  • Zhe Gan
  • Zicheng Liu
  • Ce Liu

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering. While generative models provide a consistent network architecture between pre-training and fine-tuning, existing work typically contains complex structures (uni/multi-modal encoder/decoder) and depends on external modules such as object detectors/taggers and optical character recognition (OCR). In GIT, we simplify the architecture to one image encoder and one text decoder under a single language modeling task. We also scale up the pre-training data and the model size to boost the model performance. Without bells and whistles, our GIT establishes new state-of-the-art results on numerous challenging benchmarks by a large margin. For instance, our model surpasses human performance for the first time on TextCaps (138.2 vs. 125.5 in CIDEr). Furthermore, we present a new scheme of generation-based image classification and scene text recognition, achieving decent performance on standard benchmarks.
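
The sketch below mirrors the abstract’s minimal recipe, one image encoder plus one causal text decoder trained with a plain language-modeling loss, but it is a simplified stand-in: it feeds image tokens to the decoder through cross-attention, whereas GIT itself concatenates image and text tokens in a single transformer. All sizes and modules here are illustrative.

```python
import torch
import torch.nn as nn

# Toy "encoder + causal decoder + LM loss" setup. Not the released GIT model.

class TinyGIT(nn.Module):
    def __init__(self, vocab_size=1000, dim=256):
        super().__init__()
        self.patch_proj = nn.Linear(768, dim)      # stand-in for the image encoder output
        self.token_embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, image_feats, caption_ids):
        memory = self.patch_proj(image_feats)      # image tokens condition the decoder
        tgt = self.token_embed(caption_ids)
        L = tgt.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(hidden)                # next-token logits

imgs = torch.randn(2, 49, 768)
caps = torch.randint(0, 1000, (2, 12))
logits = TinyGIT()(imgs, caps[:, :-1])            # teacher forcing
loss = nn.functional.cross_entropy(logits.reshape(-1, 1000), caps[:, 1:].reshape(-1))
```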

NeurIPS Conference 2022 Conference Paper

GLIPv2: Unifying Localization and Vision-Language Understanding

  • Haotian Zhang
  • Pengchuan Zhang
  • Xiaowei Hu
  • Yen-Chun Chen
  • Liunian Li
  • Xiyang Dai
  • Lijuan Wang
  • Lu Yuan

We present GLIPv2, a grounded VL understanding model that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). GLIPv2 elegantly unifies localization pre-training and Vision-Language Pre-training (VLP) with three pre-training tasks: phrase grounding as a VL reformulation of the detection task, region-word contrastive learning as a novel region-word-level contrastive task, and masked language modeling. This unification not only simplifies the previous multi-stage VLP procedure but also achieves mutual benefits between localization and understanding tasks. Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near-SoTA performance on various localization and understanding tasks. The model also shows (1) strong zero-shot and few-shot adaptation performance on open-vocabulary object detection tasks and (2) superior grounding capability on VL understanding tasks.
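
A minimal sketch of the region-word matching that underlies grounding-as-detection: region features are scored against word features of a text prompt, so classification reduces to picking the best-matching word. The shapes and temperature below are illustrative assumptions, not the GLIPv2 code.

```python
import torch
import torch.nn.functional as F

# Detection reformulated as region-word alignment: instead of a fixed label
# set, each region proposal is scored against the words of a text prompt.

regions = F.normalize(torch.randn(100, 256), dim=-1)   # proposal features from an image
words = F.normalize(torch.randn(12, 256), dim=-1)      # token features of "person. bicycle. dog."
logits = regions @ words.t() / 0.07                    # region-word similarity (temperature assumed)
best_word = logits.argmax(dim=-1)                      # each region's best-matching word
```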

NeurIPS Conference 2022 Conference Paper

K-LITE: Learning Transferable Visual Models with External Knowledge

  • Sheng Shen
  • Chunyuan Li
  • Xiaowei Hu
  • Yujia Xie
  • Jianwei Yang
  • Pengchuan Zhang
  • Zhe Gan
  • Lijuan Wang

The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, based on the broad concept coverage achieved through the large-scale data collection process. Alternatively, we argue that learning with external knowledge about images is a promising approach that leverages a much more structured source of supervision and offers sample efficiency. In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge for building transferable visual systems: in training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts; in evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is released at https://github.com/microsoft/klite.
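
A minimal sketch of the knowledge-augmentation step, assuming WordNet glosses via nltk (the 'wordnet' corpus must be downloaded first) and a hypothetical prompt template; the actual K-LITE templates and knowledge sources are richer than this.

```python
# Enrich a concept name with its WordNet gloss before text encoding.
# Requires: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet

def enrich(concept):
    """Append the first WordNet definition of `concept` to a simple prompt."""
    synsets = wordnet.synsets(concept.replace(" ", "_"))
    gloss = synsets[0].definition() if synsets else ""
    base = f"a photo of a {concept}"
    return f"{base}, which is {gloss}" if gloss else base

print(enrich("mastiff"))
# e.g. "a photo of a mastiff, which is an old breed of powerful deep-chested ..."
```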

NeurIPS Conference 2022 Conference Paper

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

  • Jian Liang
  • Chenfei Wu
  • Xiaowei Hu
  • Zhe Gan
  • Jianfeng Wang
  • Lijuan Wang
  • Zicheng Liu
  • Yuejian Fang

Infinite visual synthesis aims to generate high-resolution images, long-duration videos, and even visual content of unlimited size. Some recent work tried to solve this task by first dividing data into processable patches and then training models on them without considering the dependencies between patches. However, since these methods fail to model global dependencies between patches, the quality and consistency of the generation can be limited. To address this issue, we propose NUWA-Infinity, a patch-level “render-and-optimize” strategy for infinite visual synthesis. Given a large image or a long video, NUWA-Infinity first splits it into non-overlapping patches and uses the ordered patch chain as a complete training instance; a rendering model then autoregressively predicts each patch based on its contexts. Once a patch is predicted, it is optimized immediately and its hidden states are saved as contexts for the next “render-and-optimize” step. This brings two advantages: (i) the autoregressive rendering process with information transfer between contexts provides implicit modeling of the global probability distribution; (ii) the timely optimization process alleviates the optimization stress of the model and helps convergence. Based on the above designs, NUWA-Infinity shows strong synthesis ability on high-resolution images and long-duration videos. The homepage link is https://nuwa-infinity.microsoft.com.
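
A minimal sketch of a render-and-optimize loop under stated assumptions: a tiny GRU stands in for the rendering model, each patch in the ordered chain is predicted from cached context, the loss is applied immediately, and detached states carry over as context for the next step. None of this is the NUWA-Infinity architecture.

```python
import torch
import torch.nn as nn

# Toy "render-and-optimize": predict the next patch from cached context,
# optimize right away, then detach the hidden state so gradients stay local.

class PatchRenderer(nn.Module):
    def __init__(self, patch_dim=64, hidden=128):
        super().__init__()
        self.rnn = nn.GRUCell(patch_dim, hidden)
        self.out = nn.Linear(hidden, patch_dim)

    def forward(self, prev_patch, h):
        h = self.rnn(prev_patch, h)   # fold the previous patch into the context
        return self.out(h), h         # render the next patch

model = PatchRenderer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
patches = torch.randn(10, 1, 64)      # ordered, non-overlapping patch chain
h, prev = torch.zeros(1, 128), torch.zeros(1, 64)
for patch in patches:
    pred, h = model(prev, h)
    loss = nn.functional.mse_loss(pred, patch)
    loss.backward(); opt.step(); opt.zero_grad()   # optimize this patch immediately
    h, prev = h.detach(), patch                    # saved states become next context
```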

AAAI Conference 2022 Conference Paper

OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning

  • Sheng Liu
  • Kevin Lin
  • Lijuan Wang
  • Junsong Yuan
  • Zicheng Liu

We introduce the task of open-vocabulary visual instance search (OVIS). Given an arbitrary textual search query, OVIS aims to return a ranked list of visual instances, i.e., image patches, that satisfy the search intent from an image database. The term “open vocabulary” means that there are neither restrictions on the visual instances to be searched nor restrictions on the words that can be used to compose the textual search query. We propose to address this search challenge via visual-semantic aligned representation learning (ViSA). ViSA leverages a massive amount of image-caption pairs as weak image-level (not instance-level) supervision to learn a rich cross-modal semantic space where the representations of visual instances (not images) and those of textual queries are aligned, thus allowing us to measure the similarity between any visual instance and an arbitrary textual query. To evaluate the performance of ViSA, we build two datasets named OVIS40 and OVIS1400 and also introduce a pipeline for error analysis. Through extensive experiments on the two datasets, we demonstrate ViSA’s ability to search for visual instances in images not available during training, given a wide range of textual queries including those composed of uncommon words. Experimental results show that ViSA achieves an mAP@50 of 27.8% on OVIS40 and a recall@30 of 21.3% on OVIS1400 under the most challenging settings.
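
At query time, the kind of search described here reduces to ranking pre-extracted instance embeddings by similarity to an embedded text query; a minimal sketch with stand-in random embeddings follows. The encoders and database sizes are assumptions, not part of the paper.

```python
import torch
import torch.nn.functional as F

# Open-vocabulary instance search as nearest-neighbor ranking in a shared
# visual-semantic space. Embeddings are random stand-ins for encoder outputs.

instance_emb = F.normalize(torch.randn(10000, 512), dim=-1)  # database of patch embeddings
query_emb = F.normalize(torch.randn(512), dim=0)             # encoded textual query
scores = instance_emb @ query_emb                            # cosine similarity to the query
top_scores, top_idx = scores.topk(30)                        # ranked list, e.g. for recall@30
```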

AAAI Conference 2022 Conference Paper

Playing Lottery Tickets with Vision and Language

  • Zhe Gan
  • Yen-Chun Chen
  • Linjie Li
  • Tianlong Chen
  • Yu Cheng
  • Shuohang Wang
  • Jingjing Liu
  • Lijuan Wang

Large-scale pre-training has recently revolutionized vision-and-language (VL) research. Models such as LXMERT and UNITER have significantly lifted the state of the art over a wide range of VL tasks. However, the large number of parameters in such models hinders their application in practice. In parallel, work on the lottery ticket hypothesis (LTH) has shown that deep neural networks contain small matching subnetworks that can achieve performance on par with, or even better than, the dense networks when trained in isolation. In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained VL models. We use UNITER as the main testbed (and also test LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows. (i) It is difficult to find subnetworks that strictly match the performance of the full model. However, we can find “relaxed” winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy. (ii) Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60%/70% sparsity transfer universally, matching 98%/96% of the full accuracy on average over all the tasks. (iii) Besides UNITER, other models such as LXMERT and ViLT can also play lottery tickets. However, the highest sparsity we can achieve for ViLT is far lower than for LXMERT and UNITER (30% vs. 70%). (iv) LTH also remains relevant when using other training methods (e.g., adversarial training).
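
For readers unfamiliar with the lottery ticket procedure, a minimal sketch of one-shot magnitude pruning follows: keep the largest-magnitude weights, zero the rest, and retrain the masked network. The 60% sparsity mirrors a level mentioned in the abstract; everything else is illustrative.

```python
import torch

# One-shot magnitude pruning: build a binary mask that keeps the largest
# weights. During retraining the mask is re-applied after every update.

def magnitude_mask(weight, sparsity=0.6):
    k = int(weight.numel() * sparsity)                    # number of weights to prune
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

w = torch.randn(512, 512)
mask = magnitude_mask(w, 0.6)
w_pruned = w * mask   # the masked network is then retrained from its original init
```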

ICLR Conference 2021 Conference Paper

SEED: Self-supervised Distillation For Visual Representation

  • Zhiyuan Fang
  • Jianfeng Wang
  • Lijuan Wang
  • Lei Zhang 0001
  • Yezhou Yang
  • Zicheng Liu 0001

This paper is concerned with self-supervised learning for small models. The problem is motivated by our empirical studies that while the widely used contrastive self-supervised learning method has shown great progress on large model training, it does not work well for small models. To address this problem, we propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), where we leverage a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion. Instead of directly learning from unlabeled data, we train a student encoder to mimic the similarity score distribution inferred by a teacher over a set of instances. We show that SEED dramatically boosts the performance of small networks on downstream tasks. Compared with self-supervised baselines, SEED improves the top-1 accuracy from 42.2% to 67.6% on EfficientNet-B0 and from 36.3% to 68.2% on MobileNet-v3-Large on the ImageNet-1k dataset.
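
A minimal sketch of the distillation objective the abstract describes: the student is trained to match the teacher’s similarity-score distribution over a queue of instance embeddings. The temperatures, queue size, and the KL formulation below are illustrative assumptions, not the released SEED code.

```python
import torch
import torch.nn.functional as F

# Student mimics the teacher's similarity distribution over a maintained
# queue of instance embeddings (temperatures are assumed values).

def seed_loss(student_emb, teacher_emb, queue, t_s=0.2, t_t=0.07):
    s = F.normalize(student_emb, dim=-1) @ queue.t() / t_s   # student similarity scores
    t = F.normalize(teacher_emb, dim=-1) @ queue.t() / t_t   # teacher similarity scores
    return F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
                    reduction="batchmean")

queue = F.normalize(torch.randn(4096, 128), dim=-1)          # instance queue
loss = seed_loss(torch.randn(32, 128), torch.randn(32, 128), queue)
```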

NeurIPS Conference 2021 Conference Paper

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

  • Linjie Li
  • Jie Lei
  • Zhe Gan
  • Licheng Yu
  • Yen-Chun Chen
  • Rohit Pillai
  • Yu Cheng
  • Luowei Zhou

Most existing video-and-language (VidL) research focuses on a single dataset, or on multiple datasets of a single task. In reality, a truly useful VidL system is expected to generalize easily to diverse tasks, domains, and datasets. To facilitate the evaluation of such systems, we introduce the Video-And-Language Understanding Evaluation (VALUE) benchmark, an assemblage of 11 VidL datasets over 3 popular tasks: (i) text-to-video retrieval; (ii) video question answering; and (iii) video captioning. The VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels. Rather than focusing on single-channel videos with visual information only, VALUE promotes models that leverage information from both video frames and their associated subtitles, as well as models that share knowledge across multiple tasks. We evaluate various baseline methods with and without large-scale VidL pre-training, and systematically investigate the impact of video input channels, fusion methods, and different video representations. We also study the transferability between tasks, and conduct multi-task learning under different settings. The significant gap between our best model and human performance calls for future study of advanced VidL models. VALUE is available at https://value-benchmark.github.io/.

AAAI Conference 2021 Conference Paper

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

  • Xiaowei Hu
  • Xi Yin
  • Kevin Lin
  • Lei Zhang
  • Jianfeng Gao
  • Lijuan Wang
  • Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps). In this challenge, no additional image-caption training data, other than COCO Captions, is allowed for model training. Thus, conventional Vision-Language Pre-training (VLP) methods cannot be applied. This paper presents VIsual VOcabulary pre-training (VIVO) that performs pre-training in the absence of caption annotations. By breaking the dependency on paired image-caption training data in VLP, VIVO can leverage large amounts of paired image-tag data to learn a visual vocabulary. This is done by pre-training a multi-layer Transformer model that learns to align image-level tags with their corresponding image region features. To address the unordered nature of image tags, VIVO uses a Hungarian matching loss with masked tag prediction to conduct pre-training. We validate the effectiveness of VIVO by fine-tuning the pre-trained model for image captioning. In addition, we perform an analysis of the visual-text alignment inferred by our model. The results show that our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects. Our single model has achieved new state-of-the-art results on nocaps and surpassed the human CIDEr score.
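
A minimal sketch of a Hungarian matching loss for unordered tag prediction, in the spirit of the abstract: predictions are assigned to ground-truth tags by the minimum-cost permutation before the loss is computed. The shapes and cost definition are illustrative assumptions, not the VIVO implementation.

```python
import torch
from scipy.optimize import linear_sum_assignment

# Unordered tags: find the assignment of predictions to ground-truth tags that
# minimizes total negative log-likelihood, then apply the loss on that matching.

def hungarian_tag_loss(logits, tag_ids):
    # logits: (num_preds, vocab); tag_ids: (num_tags,) unordered ground truth
    log_probs = logits.log_softmax(dim=-1)
    cost = -log_probs[:, tag_ids]                         # (num_preds, num_tags)
    rows, cols = linear_sum_assignment(cost.detach().numpy())
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    return -log_probs[rows, tag_ids[cols]].mean()

loss = hungarian_tag_loss(torch.randn(5, 1000), torch.tensor([3, 87, 421]))
```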

AAAI Conference 2020 Conference Paper

Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection

  • Yuchao Gu
  • Lijuan Wang
  • Ziqin Wang
  • Yun Liu
  • Ming-Ming Cheng
  • Shao-Ping Lu

Spatiotemporal information is essential for video salient object detection (VSOD) because object motion is highly attractive to human attention. Previous VSOD methods usually use Long Short-Term Memory (LSTM) or 3D ConvNet (C3D), which can only encode motion information through step-by-step propagation in the temporal domain. Recently, the non-local mechanism was proposed to capture long-range dependencies directly. However, it is not straightforward to apply the non-local mechanism to VSOD, because (i) it fails to capture motion cues and tends to learn motion-independent global contexts; and (ii) its computation and memory costs are prohibitive for video dense prediction tasks such as VSOD. To address the above problems, we design a Constrained Self-Attention (CSA) operation to capture motion cues, based on the prior that objects always move in a continuous trajectory. We group a set of CSA operations in Pyramid structures (PCSA) to capture objects at various scales and speeds. Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenging datasets. Our code is available at https://github.com/guyuchao/PyramidCSA.
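
A minimal (and deliberately naive) sketch of attention constrained to a local spatio-temporal window, reflecting the prior that objects move along continuous trajectories. The loops, window size, and shapes are illustrative; the released PCSA is implemented far more efficiently.

```python
import torch
import torch.nn.functional as F

# Each query position attends only to a small spatial window in neighboring
# frames, rather than to all positions as in full non-local attention.

def constrained_attention(query, keys, values, window=3):
    # query: (H, W, C); keys/values: (T, H, W, C) from neighboring frames
    H, W, C = query.shape
    out = torch.zeros_like(query)
    r = window // 2
    for i in range(H):
        for j in range(W):
            ks = keys[:, max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].reshape(-1, C)
            vs = values[:, max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].reshape(-1, C)
            attn = F.softmax(ks @ query[i, j] / C ** 0.5, dim=0)
            out[i, j] = attn @ vs
    return out

out = constrained_attention(torch.randn(8, 8, 16),
                            torch.randn(4, 8, 8, 16),
                            torch.randn(4, 8, 8, 16))
```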

IROS Conference 2012 Conference Paper

Modeling and simulation of friction forces during needle insertion using Local Constraint Method

  • Lijuan Wang
  • Zhongkui Wang
  • Shinichi Hirai

In modern clinical practice, accurately steering needle-like tools during insertion into soft tissue is difficult, mainly due to the tissue's non-linear deformation and the complicated combination of forces between the tissue and the tool. In this paper, the interaction between tissue deformation and friction forces is discussed. We consider the relative velocity and the contact length as the main factors of friction force during tissue deformation. A friction model suitable for dynamic needle insertion simulation is built within a Finite Element (FE) framework. A Local Constraint Method (LCM) is proposed to calculate the tissue deformation and apply the friction forces to the tissue frame while avoiding remeshing. In our approach, a series of equivalent constraints and forces is generated by decomposing them within Local Regions (LRs) onto nodal points. Simulations of dynamic needle insertion based on this method have been conducted for validation.
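
As a loose illustration of a friction law driven by the two factors the abstract names, contact length and relative velocity, the sketch below combines a Coulomb-like term with a viscous term. The coefficients, units, and functional form are assumptions for illustration, not the paper's model.

```python
import numpy as np

# Toy needle-shaft friction: magnitude grows with contact length and with
# relative velocity; the force opposes the direction of relative motion.
# mu_c and b are made-up coefficients, not values from the paper.

def shaft_friction(contact_length_mm, rel_velocity_mm_s, mu_c=0.02, b=0.005):
    direction = -np.sign(rel_velocity_mm_s)
    magnitude = mu_c * contact_length_mm + b * contact_length_mm * abs(rel_velocity_mm_s)
    return direction * magnitude

print(shaft_friction(30.0, 5.0))   # friction force opposing insertion, per these assumptions
```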