Arrow Research search

Author name cluster

Man Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers


IJCAI Conference 2025 Conference Paper

BGM: Demand Prediction for Expanding Bike-Sharing Systems with Dynamic Graph Modeling

  • Yixuan Zhao
  • Hongkai Wen
  • Xingchen Zhang
  • Man Luo

Accurate demand prediction is crucial for the equitable and sustainable expansion of bike-sharing systems, which help reduce urban congestion, promote low-carbon mobility, and improve transportation access in underserved areas. However, expanding these systems presents societal challenges, particularly in ensuring fair resource distribution and operational efficiency. A major hurdle is the difficulty of demand prediction at new stations, which lack historical usage data and are heavily influenced by the existing network. Additionally, new stations dynamically reshape demand patterns across time and space, complicating efforts to balance supply and accessibility in evolving urban environments. Existing methods model relationships between new and existing stations but often assume static patterns, overlooking how new stations reshape demand dynamics over time and space. To tackle these challenges, we propose a novel demand prediction framework for expanding bike-sharing systems, namely BGM, which leverages dynamic graph modeling to capture the evolving inter-station correlations while accounting for spatial and temporal heterogeneity. Specifically, we develop a knowledge transfer approach that learns the embedding transformation between existing and new stations through a learnable orthogonal mapping matrix. We further design a gated selecting vector-based feature fusion mechanism to integrate the transferred embeddings and the intrinsic features of stations for precise predictions. Experiments on real-world bike-sharing data demonstrate that BGM outperforms existing methods.
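The two core components named in the abstract, an orthogonal transfer map and gated fusion of transferred and intrinsic features, can be sketched numerically. This is a minimal illustration with assumed dimensions and random values, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

# Learnable orthogonal mapping: in training this would be optimized under
# an orthogonality constraint; here we simply draw one via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

existing_emb = rng.standard_normal(d)  # embedding of an existing station
intrinsic = rng.standard_normal(d)     # intrinsic features of a new station

# Transfer knowledge from the existing station; an orthogonal map
# preserves norms, so no embedding information is scaled away.
transferred = Q @ existing_emb

# Gated fusion: a (here random, in practice learned) selecting vector
# decides per dimension how much transferred vs. intrinsic signal to keep.
gate = 1.0 / (1.0 + np.exp(-rng.standard_normal(d)))  # sigmoid
fused = gate * transferred + (1.0 - gate) * intrinsic
print(fused.shape)  # (8,)
```

The norm-preservation property is what makes an orthogonal matrix a natural choice for transferring embeddings without distorting their scale.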

NeurIPS Conference 2025 Conference Paper

Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps

  • Haibo Jin
  • Peiyan Zhang
  • Man Luo
  • Haohan Wang

Large Language Models (LLMs) have shown remarkable progress across domains, yet their ability to perform inductive reasoning—inferring latent rules from sparse examples—remains limited. It is often assumed that chain-of-thought (CoT) prompting, as used in Large Reasoning Models (LRMs), enhances such reasoning. We investigate this assumption by creating four controlled, diagnostic game-based tasks—chess, Texas Hold’em, dice games, and blackjack—with hidden human-defined rules. We find that CoT reasoning can degrade inductive performance, with LRMs often underperforming their non-reasoning counterparts. To explain this, we present a theoretical framework that reveals how reasoning steps can amplify error through three failure modes: incorrect sub-task decomposition, incorrect sub-task solving, and incorrect final answer summarization. Based on our theoretical and empirical analysis, we introduce structured interventions that adapt CoT generation according to our identified failure types. These interventions improve inductive accuracy without retraining. Our findings suggest that effective CoT reasoning depends not only on taking more steps but also on ensuring those steps are well-structured.
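The error-amplification argument is easy to see with back-of-the-envelope arithmetic. The probabilities below are assumed for illustration, not taken from the paper:

```python
# If a CoT answer is correct only when decomposition, per-step solving,
# and final summarization all succeed, errors compound multiplicatively.
p_decompose = 0.9   # assumed success probabilities, illustration only
p_solve = 0.9
p_summarize = 0.9
p_cot = p_decompose * p_solve * p_summarize

p_direct = 0.8      # assumed accuracy of a direct (non-reasoning) answer

print(round(p_cot, 3))   # 0.729: the multi-step chain underperforms
print(p_cot < p_direct)  # True
```

Even when each individual step is quite reliable, a chain of fallible steps can end up less accurate than a single direct answer, which matches the abstract's observation that CoT sometimes hurts.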

IJCAI Conference 2025 Conference Paper

High-Fidelity Road Network Generation with Latent Diffusion Models

  • Jinming Wang
  • Hongkai Wen
  • Geyong Min
  • Man Luo

Road networks are the veins of modern cities. Yet, maintaining up-to-date and accurate road network information is a persistent challenge, especially in areas with rapid urban changes or limited surveying resources. Crowdsourced trajectories, e.g., from GPS records collected by mobile devices and vehicles, have emerged as a powerful data source for continuously mapping urban areas. However, the inherent noise, irregular and often sparse sampling rates, and the vast variability in movement patterns make the problem of road network generation from trajectories a non-trivial task. Existing methods often approach this from an appearance-based perspective: they typically render trajectories as 2D density maps and then employ heuristic algorithms to extract road networks, leading to inevitable information loss and thus poor performance, especially when trajectories are sparse or ambiguities are present, e.g., flyovers. In this paper, we propose a novel approach, called GraphWalker, to generate high-fidelity road network graphs from raw trajectories in an end-to-end manner. We achieve this by designing a bespoke latent diffusion transformer T2W-DiT, which treats input trajectories as generation conditions, and gradually denoises samples from a latent space to obtain the corresponding walks on the underlying road network graph, then assembles them together as the final road network. Extensive experiments on multiple datasets demonstrate that the proposed GraphWalker can effectively generate high-quality road networks from noisy and sparse trajectories, showcasing significant improvements over the state of the art.
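The conditional iterative-denoising pattern the abstract describes can be sketched generically. A toy denoiser stands in for the trained T2W-DiT, and the vectors here are illustrative latents, not real trajectory or walk embeddings:

```python
import numpy as np

rng = np.random.default_rng(2)

def denoise_step(x_t, t, condition):
    """Stand-in for a trained conditional denoiser such as T2W-DiT:
    here it simply nudges the latent toward the conditioning signal."""
    return x_t + 0.3 * (condition - x_t)

# Condition: a (toy) trajectory embedding; start from pure noise.
condition = np.array([1.0, -2.0, 0.5])
x = rng.standard_normal(3)
for t in reversed(range(20)):  # gradual denoising, coarse to fine
    x = denoise_step(x, t, condition)

# After enough steps the latent converges to the conditioned target,
# which in the real system would decode into a walk on the road graph.
print(np.allclose(x, condition, atol=1e-2))  # True
```

The point of the sketch is the control flow, sampling from noise and repeatedly denoising under a condition, rather than the denoiser itself, which in the paper is a learned transformer.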

ICML Conference 2025 Conference Paper

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

  • Peiyan Zhang
  • Haibo Jin
  • Leyang Hu
  • Xinnuo Li
  • Liying Kang
  • Man Luo
  • Yangqiu Song
  • Haohan Wang

Recent advancements in large language models (LLMs) have significantly enhanced the ability of LLM-based systems to perform complex tasks through natural language processing and tool interaction. However, optimizing these LLM-based systems for specific tasks remains challenging, often requiring manual interventions like prompt engineering and hyperparameter tuning. Existing automatic optimization methods, such as textual feedback-based techniques (e.g., TextGrad), tend to focus on immediate feedback, analogous to using immediate derivatives in traditional numerical gradient descent. However, relying solely on such feedback can be limited when the adjustments made in response to this feedback are either too small or fluctuate irregularly, potentially slowing down or even stalling the optimization process. In this paper, we introduce $\textbf{REVOLVE}$, an optimization method that tracks how $\textbf{R}$esponses $\textbf{EVOLVE}$ across iterations in LLM systems. By focusing on the evolution of responses over time, REVOLVE enables more stable and effective optimization by making thoughtful, progressive adjustments at each step. Experiments across three tasks demonstrate the adaptability and efficiency of our proposal. Beyond its practical contributions, REVOLVE highlights a promising direction, where the rich knowledge from established optimization principles can be leveraged to enhance LLM systems, which paves the way for further advancements in this hybrid domain. Code is available at: https://llm-revolve.netlify.app.
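The abstract's numerical analogy, immediate feedback as plain gradient descent versus response evolution as history-aware updates, can be made concrete with a momentum-style toy example. This illustrates the analogy only, not REVOLVE's actual textual mechanism:

```python
# Numerical analogy only: immediate textual feedback resembles plain
# gradient descent, while tracking how responses evolve across
# iterations resembles momentum, which accumulates a history of
# adjustments and smooths small or oscillating updates.
def grad(x):
    return 2 * x  # derivative of the toy loss f(x) = x**2

x_plain, x_mom, velocity = 5.0, 5.0, 0.0
lr, beta = 0.1, 0.9
for _ in range(50):
    x_plain -= lr * grad(x_plain)             # immediate feedback only
    velocity = beta * velocity + grad(x_mom)  # feedback history
    x_mom -= lr * velocity

print(round(x_plain, 6), round(x_mom, 6))  # both approach the optimum 0
```

On this smooth toy loss plain descent already works well; the history term pays off precisely in the regimes the abstract names, where per-step adjustments are tiny or fluctuate irregularly.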

ICML Conference 2025 Conference Paper

SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs

  • Xin Su 0008
  • Man Luo
  • Kris W. Pan
  • Tien Pei Chou
  • Vasudev Lal
  • Phillip Howard

Multimodal retrieval-augmented generation (RAG) plays a crucial role in domains such as knowledge-based visual question answering (KB-VQA), where models should effectively integrate additional knowledge to generate a response. However, existing vision and language models (VLMs) are not inherently designed for context-augmented generation, limiting their effectiveness in such tasks. While synthetic data generation has recently gained attention for training large VLMs, its application for context-augmented generation remains underexplored. To address this gap, we introduce SKVQA, a large-scale synthetic multimodal dataset containing over 2 million visual question-answer pairs, each associated with external knowledge sources to determine the final answer. Compared to previous datasets, SKVQA exhibits 11$\times$ more unique questions, greater domain diversity, and a broader spectrum of image sources. Through human evaluations, we confirm the high quality of the generated question-answer pairs and their contextual relevance. Extensive experiments show that SKVQA serves both as a challenging benchmark for knowledge-based VQA and as an effective training resource for adapting generative multimodal models to context-augmented generation. Our results further indicate that models trained on SKVQA demonstrate enhanced generalization in both context-aware VQA and multimodal RAG settings.
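A context-augmented training instance of the kind SKVQA provides pairs a question with an external knowledge source. One might represent such an instance as follows; the field names, paths, and content are hypothetical, not the dataset's actual schema:

```python
# Hypothetical shape of one context-augmented VQA instance: the answer
# is determined by the external knowledge source, not the image alone.
example = {
    "image": "images/00001.jpg",  # hypothetical path
    "question": "Which dynasty built this palace?",
    "context": "The Forbidden City was constructed from 1406 to 1420 "
               "under the Ming dynasty.",
    "answer": "Ming",
}

def build_prompt(ex):
    # Context-augmented generation: the model conditions on retrieved
    # knowledge in addition to the image and question.
    return f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer:"

print(build_prompt(example))
```

Training a VLM on prompts of this shape is what adapts it to consult the supplied context rather than rely solely on parametric knowledge.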

TMLR Journal 2024 Journal Article

In-context Learning with Retrieved Demonstrations for Language Models: A Survey

  • Man Luo
  • Xin Xu
  • Yue Liu
  • Panupong Pasupat
  • Mehran Kazemi

Large language models have demonstrated remarkable few-shot in-context learning (ICL) capabilities, adapting to new tasks from a few demonstrations. However, the efficacy of ICL is highly dependent on the selection of these demonstrations. Recent developments have introduced retrieval-based in-context learning (RetICL), which dynamically retrieves demonstrations tailored to each input query. This approach leverages existing databases and retrieval systems, enhancing efficiency and scalability while mitigating biases inherent in manual example selection. Given the promising results and growing interest in RetICL, we present a comprehensive survey of this field. Our review encompasses: design choices for ICL demonstration retrieval models, retrieval training procedures, inference strategies, and current applications of RetICL. Finally, we explore future directions for this emerging technology.
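The core RetICL step, retrieving demonstrations per query instead of fixing them manually, reduces to a nearest-neighbor search over demonstration embeddings. A minimal sketch with cosine similarity; the embeddings and the similarity choice are assumed for illustration:

```python
import numpy as np

def retrieve_demonstrations(query_vec, demo_vecs, k=2):
    """Return indices of the k demonstrations most similar to the
    query under cosine similarity, most similar first."""
    q = query_vec / np.linalg.norm(query_vec)
    d = demo_vecs / np.linalg.norm(demo_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k]

# Toy embeddings: demos 0 and 2 point roughly the same way as the query.
demos = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.05])
print(retrieve_demonstrations(query, demos))  # [0 2]
```

The retrieved demonstrations are then placed in the prompt ahead of the query, so each input gets tailored in-context examples.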

AAAI Conference 2022 Conference Paper

Improving Biomedical Information Retrieval with Neural Retrievers

  • Man Luo
  • Arindam Mitra
  • Tejas Gokhale
  • Chitta Baral

Information retrieval (IR) is essential in search engines and dialogue systems as well as natural language processing tasks such as open-domain question answering. IR serves an important function in the biomedical domain, where content and sources of scientific knowledge may evolve rapidly. Although neural retrievers have surpassed traditional IR approaches such as TF-IDF and BM25 in standard open-domain question answering tasks, they are still found lacking in the biomedical domain. In this paper, we seek to improve IR using neural retrievers (NR) in the biomedical domain, and achieve this goal using a three-pronged approach. First, to tackle the relative lack of data in the biomedical domain, we propose a template-based question generation method that can be leveraged to train neural retriever models. Second, we develop two novel pre-training tasks that are closely aligned to the downstream task of information retrieval. Third, we introduce the “Poly-DPR” model which encodes each context into multiple context vectors. Extensive experiments and analysis on the BioASQ challenge suggest that our proposed method leads to large gains over existing neural approaches and beats BM25 in the small-corpus setting. We show that BM25 and our method can complement each other, and a simple hybrid model leads to further gains in the large corpus setting.
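The multi-vector idea behind Poly-DPR, encoding each context into several vectors rather than one, can be sketched with a max-over-vectors score. This is a simplified stand-in, not the paper's exact scoring function:

```python
import numpy as np

def poly_score(query_vec, ctx_vecs):
    """Score a context by its best-matching vector: with several
    vectors per context, different facets of a long passage can each
    get a chance to match the query (simplified sketch)."""
    return float(np.max(ctx_vecs @ query_vec))

query = np.array([1.0, 0.0, 0.0])
# Each context is encoded into multiple vectors (rows).
ctx_a = np.array([[0.1, 0.9, 0.0], [0.95, 0.0, 0.1]])  # one facet matches
ctx_b = np.array([[0.2, 0.2, 0.9], [0.0, 1.0, 0.0]])
print(poly_score(query, ctx_a) > poly_score(query, ctx_b))  # True
```

A single-vector encoder would have to compress both facets of `ctx_a` into one point; the max over multiple vectors lets the well-matching facet dominate.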

AAAI Conference 2022 Conference Paper

Privacy-Preserving Face Recognition in the Frequency Domain

  • Yinggui Wang
  • Jian Liu
  • Man Luo
  • Le Yang
  • Li Wang

Some applications require performing face recognition (FR) on third-party servers, which could be accessed by attackers with malicious intents to compromise the privacy of users’ face information. This paper advocates a practical privacy-preserving frequency-domain FR scheme without key management. The new scheme first collects the components with the same frequency from different blocks of a face image to form component channels. Only some of the channels are retained and fed into the analysis network that performs an interpretable privacy-accuracy trade-off analysis to identify channels important for face image visualization but not crucial for maintaining high FR accuracy. For this purpose, the loss function of the analysis network consists of the empirical FR error loss and a face visualization penalty term, and the network is trained in an end-to-end manner. We find that with the developed analysis network, more than 94% of the image energy can be dropped while the face recognition accuracy stays almost undegraded. In order to further protect the remaining frequency components, we propose a fast masking method. Effectiveness of the new scheme in removing the visual information of face images while maintaining their distinguishability is validated over several large face datasets. Results show that the proposed scheme achieves a recognition performance and inference time comparable to ArcFace operating on original face images directly.
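The channel construction the abstract describes, gathering same-frequency components from different blocks, together with the energy bookkeeping behind the 94% figure, can be sketched as follows. Random coefficients stand in for an actual block transform, and the real channel selection is learned by the analysis network, not energy-ranked:

```python
import numpy as np

rng = np.random.default_rng(1)
B = 8         # block size
n_blocks = 16 # blocks of one face image
# Random values stand in for the frequency coefficients of each block
# (in the real scheme these come from a block transform of the image).
blocks = rng.standard_normal((n_blocks, B, B))

# Collect the coefficient at the same frequency position from every
# block into one "component channel": B*B channels, one value per block.
channels = blocks.reshape(n_blocks, B * B).T

# Rank channels by energy; drop the high-energy ones (which carry most
# of the visual appearance) and keep a small low-energy subset.
energy = (channels ** 2).sum(axis=1)
order = np.argsort(-energy)
n_drop = int(0.94 * len(order))
kept = channels[order[n_drop:]]
dropped_share = energy[order[:n_drop]].sum() / energy.sum()

print(channels.shape, kept.shape, dropped_share > 0.9)
```

The kept low-energy channels are the ones a recognizer can still use, while the dropped high-energy channels are what a human (or attacker) would need to reconstruct a recognizable face.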

IJCAI Conference 2020 Conference Paper

Rebalancing Expanding EV Sharing Systems with Deep Reinforcement Learning

  • Man Luo
  • Wenzhe Zhang
  • Tianyou Song
  • Kun Li
  • Hongming Zhu
  • Bowen Du
  • Hongkai Wen

Electric Vehicle (EV) sharing systems have recently experienced unprecedented growth across the world. One of the key challenges in their operation is vehicle rebalancing, i.e., repositioning the EVs across stations to better satisfy future user demand. This is particularly challenging in the shared EV context, because i) the range of EVs is limited while charging time is substantial, which constrains the rebalancing options; and ii) as a new mobility trend, most of the current EV sharing systems are still continuously expanding their station networks, i.e., the targets for rebalancing can change over time. To tackle these challenges, in this paper we model the rebalancing task as a Multi-Agent Reinforcement Learning (MARL) problem, which directly takes the range and charging properties of the EVs into account. We propose a novel approach of policy optimization with action cascading, which isolates the non-stationarity locally, and use two connected networks to solve the formulated MARL. We evaluate the proposed approach using a simulator calibrated with 1-year operation data from a real EV sharing system. Results show that our approach significantly outperforms the state-of-the-art, offering up to 14% gain in order satisfied rate and 12% increase in net revenue.
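The action-cascading idea, letting agents decide sequentially so each sees the actions already taken, can be sketched with a toy greedy policy. The names and the policy are illustrative, not the paper's learned networks:

```python
# Minimal sketch of action cascading: agents decide one after another,
# each observing the actions already taken, which localizes the
# non-stationarity that simultaneous decisions would otherwise cause.
def cascade(n_agents, demand, policy):
    actions = []
    for agent in range(n_agents):
        # Each agent sees the shared state plus all earlier actions.
        actions.append(policy(agent, demand, tuple(actions)))
    return actions

def greedy_policy(agent, demand, earlier):
    # Toy policy: send one EV to the station with the highest
    # demand still unmet by earlier agents' actions.
    remaining = list(demand)
    for a in earlier:
        remaining[a] -= 1
    return max(range(len(remaining)), key=lambda s: remaining[s])

print(cascade(3, [2, 3, 1], greedy_policy))  # [1, 0, 1]
```

Because agent 1 sees that agent 0 already served station 1, it diverts to station 0 instead of all agents piling onto the same high-demand station, which is exactly the coordination problem cascading addresses.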