Arrow Research search

Author name cluster

Luoyi Fu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

ICML Conference 2025 Conference Paper

Controlling Underestimation Bias in Constrained Reinforcement Learning for Safe Exploration

  • Shiqing Gao
  • Jiaxin Ding 0001
  • Luoyi Fu
  • Xinbing Wang

Constrained Reinforcement Learning (CRL) aims to maximize cumulative rewards while satisfying constraints. However, existing CRL algorithms often encounter significant constraint violations during training, limiting their applicability in safety-critical scenarios. In this paper, we identify the underestimation of the cost value function as a key factor contributing to these violations. To address this issue, we propose the Memory-driven Intrinsic Cost Estimation (MICE) method, which introduces intrinsic costs to mitigate underestimation and control bias to promote safer exploration. Inspired by flashbulb memory, where humans vividly recall dangerous experiences to avoid risks, MICE constructs a memory module that stores previously explored unsafe states to identify high-cost regions. The intrinsic cost is formulated as a pseudo-count of the current state's visits to these risk regions. Furthermore, we propose an extrinsic-intrinsic cost value function that incorporates intrinsic costs and adopts a bias correction strategy. Using this function, we formulate an optimization objective within the trust region, along with corresponding optimization methods. Theoretically, we provide convergence guarantees for the proposed cost value function and establish the worst-case constraint violation for the MICE update. Extensive experiments demonstrate that MICE significantly reduces constraint violations while preserving policy performance comparable to baselines.
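The memory-and-pseudo-count idea can be sketched in a few lines. Everything below (the grid discretization, the bin size, the saturating cost formula, and all names) is an illustrative assumption, not the authors' implementation:

```python
from collections import Counter

class UnsafeMemory:
    """Stores discretized unsafe states; states near previously observed
    unsafe regions receive a higher intrinsic cost (hypothetical sketch)."""

    def __init__(self, bin_size=0.5):
        self.bin_size = bin_size
        self.counts = Counter()

    def _key(self, state):
        # Discretize a continuous state into a grid cell (illustrative choice).
        return tuple(round(s / self.bin_size) for s in state)

    def add_unsafe(self, state):
        self.counts[self._key(state)] += 1

    def intrinsic_cost(self, state):
        # Pseudo-count of recorded unsafe visits to this region; the cost
        # grows with the count and saturates at 1 (assumed functional form).
        n = self.counts[self._key(state)]
        return n / (1.0 + n)

mem = UnsafeMemory()
mem.add_unsafe((1.0, 2.0))
mem.add_unsafe((1.1, 2.1))            # lands in the same grid cell
print(mem.intrinsic_cost((0.9, 1.9))) # high: near stored unsafe states
print(mem.intrinsic_cost((5.0, 5.0))) # 0.0: unexplored region
```

The saturating form keeps the intrinsic cost bounded, so it can be added to the extrinsic cost without dominating it.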

TIST Journal 2025 Journal Article

Cost-aware Best Arm Identification in Stochastic Bandits

  • Zhida Qin
  • Wenhao Xue
  • Lu Zheng
  • Xiaoying Gan
  • Hongqiu Wu
  • Haiming Jin
  • Luoyi Fu

The best arm identification problem in the multi-armed bandit model has been widely applied in practice, for example in spectrum sensing, online advertising, and cloud computing. Although many works have been devoted to this area, most do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any \(\delta\in(0,1)\), with probability at least \(1-\delta\), the player aims to find the arm with the largest ratio of expected reward to expected cost using as few samplings as possible. Specifically, we consider two settings: (1) the precise setting, i.e., identifying the exact optimal arm; and (2) the Probably Approximately Correct (PAC) setting, which identifies an \(\epsilon\)-optimal arm. For the precise setting, we design elimination-type algorithms and provide a fundamental lower bound that asymptotically matches the upper bound, while for the PAC setting, we propose a UCB-type algorithm named \(\epsilon\)-RCB. We show that for all algorithms, the sample complexities, i.e., the total pulling times over all arms, grow logarithmically as \(\frac{1}{\delta}\) increases. Moreover, in contrast to existing works, running our algorithms does not require knowledge of the arm-related parameters, which is more practical. Finally, we validate our theoretical results through numerical experiments.
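The elimination-type scheme can be sketched roughly as follows. The confidence radius, batch size, and constants are illustrative Hoeffding-style choices, not the paper's exact bounds:

```python
import math
import random

def ratio_best_arm(pull, n_arms, delta=0.05, batch=200, rounds=20):
    """Elimination-style search for the arm maximizing E[reward]/E[cost].
    `pull(i)` returns one (reward, cost) sample; the confidence radius
    below is an illustrative stand-in for the paper's bound."""
    active = set(range(n_arms))
    sums = [[0.0, 0.0] for _ in range(n_arms)]  # cumulative reward, cost
    for r in range(1, rounds + 1):
        for i in active:
            for _ in range(batch):
                rew, cost = pull(i)
                sums[i][0] += rew
                sums[i][1] += cost
        est = {i: sums[i][0] / max(sums[i][1], 1e-9) for i in active}
        rad = math.sqrt(math.log(2 * n_arms * r * r / delta) / (2 * batch * r))
        top = max(est.values())
        # Keep an arm only if its upper bound reaches the leader's lower bound.
        active = {i for i in active if est[i] + 2 * rad >= top - 2 * rad}
        if len(active) == 1:
            break
    return max(active, key=lambda i: est[i])

random.seed(0)
means = [(0.5, 1.0), (0.8, 1.0), (0.6, 2.0)]  # arm 1 has the best ratio
def pull(i):
    mr, mc = means[i]
    return random.random() * 2 * mr, random.random() * 2 * mc

best = ratio_best_arm(pull, 3)
print(best)  # arm 1 has the best reward/cost ratio
```

Note that the algorithm tracks sums of rewards and costs separately and estimates the ratio of means, not the mean of per-pull ratios.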

ICML Conference 2025 Conference Paper

Extreme Value Policy Optimization for Safe Reinforcement Learning

  • Shiqing Gao
  • Yihang Zhou
  • Shuai Shao
  • Haoyu Luo
  • Yiheng Bing
  • Jiaxin Ding 0001
  • Luoyi Fu
  • Xinbing Wang

Ensuring safety is a critical challenge in applying Reinforcement Learning (RL) to real-world scenarios. Constrained Reinforcement Learning (CRL) addresses this by maximizing returns under predefined constraints, typically formulated as the expected cumulative cost. However, expectation-based constraints overlook rare but high-impact extreme value events in the tail distribution, such as black swan incidents, which can lead to severe constraint violations. To address this issue, we propose the Extreme Value policy Optimization (EVO) algorithm, leveraging Extreme Value Theory (EVT) to model and exploit extreme reward and cost samples, reducing constraint violations. EVO introduces an extreme quantile optimization objective to explicitly capture extreme samples in the cost tail distribution. Additionally, we propose an extreme prioritization mechanism during replay, amplifying the learning signal from rare but high-impact extreme samples. Theoretically, we establish upper bounds on expected constraint violations during policy updates, guaranteeing strict constraint satisfaction at a zero-violation quantile level. Further, we demonstrate that EVO achieves a lower probability of constraint violations than expectation-based methods and exhibits lower variance than quantile regression methods. Extensive experiments show that EVO significantly reduces constraint violations during training while maintaining competitive policy performance compared to baselines.
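The extreme-value modeling of the cost tail can be sketched with a standard peaks-over-threshold estimator. The threshold choice, the method-of-moments GPD fit, and all names here are illustrative assumptions, not EVO's actual procedure:

```python
import math
import random
import statistics

def pot_extreme_quantile(samples, q=0.99, threshold_quantile=0.9):
    """Peaks-over-threshold estimate of the q-quantile of a cost
    distribution: fit a generalized Pareto distribution (GPD) to
    excesses over a high threshold, then extrapolate into the tail."""
    xs = sorted(samples)
    u = xs[int(threshold_quantile * len(xs))]   # high threshold
    excess = [x - u for x in xs if x > u]
    m = statistics.mean(excess)
    v = statistics.variance(excess)
    xi = 0.5 * (1 - m * m / v)                  # GPD shape (method of moments)
    sigma = 0.5 * m * (1 + m * m / v)           # GPD scale
    zeta = len(excess) / len(xs)                # exceedance rate
    if abs(xi) < 1e-9:                          # exponential-tail limit
        return u + sigma * math.log(zeta / (1 - q))
    return u + (sigma / xi) * ((zeta / (1 - q)) ** xi - 1)

random.seed(0)
costs = [random.expovariate(1.0) for _ in range(20000)]
est = pot_extreme_quantile(costs, q=0.99)
true_q = -math.log(0.01)  # exact 99% quantile of Exp(1), about 4.61
print(round(est, 2), round(true_q, 2))
```

Because the GPD extrapolates beyond the observed threshold, such an estimator can speak about quantile levels where plain empirical quantiles are noisy, which is the motivation for EVT-based constraints.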

IJCAI Conference 2024 Conference Paper

Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints

  • Shiqing Gao
  • Jiaxin Ding
  • Luoyi Fu
  • Xinbing Wang
  • Chenghu Zhou

In Constrained Reinforcement Learning (CRL), agents explore the environment to learn the optimal policy while satisfying constraints. The penalty function method has recently been studied as an effective approach for handling constraints: it imposes constraint penalties on the objective to transform the constrained problem into an unconstrained one. However, it is challenging to choose appropriate penalties that balance policy performance and constraint satisfaction efficiently. In this paper, we propose a theoretically guaranteed penalty function method, Exterior Penalty Policy Optimization (EPO), with adaptive penalties generated by a Penalty Metric Network (PMN). The PMN responds appropriately to varying degrees of constraint violation, enabling efficient constraint satisfaction and safe exploration. We theoretically prove that EPO consistently improves constraint satisfaction with a convergence guarantee. We propose a new surrogate function and provide worst-case bounds on constraint violation and approximation error. In practice, we propose an effective smooth penalty function that can be easily implemented with a first-order optimizer. Extensive experiments show that EPO outperforms the baselines in terms of policy performance and constraint satisfaction with a stable training process, particularly on complex tasks.
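The exterior penalty transformation itself is classical and easy to illustrate on a toy problem. The quadratic penalty, fixed penalty weight, and plain gradient descent below are illustrative stand-ins for EPO's PMN-generated adaptive penalties:

```python
def exterior_penalty_minimize(f_grad, g, g_grad, x0, rho=100.0,
                              lr=1e-3, steps=20000):
    """Minimize f subject to g(x) <= 0 via the smooth quadratic
    exterior penalty f(x) + rho * max(0, g(x))**2, which is zero on
    the feasible region and grows outside it."""
    x = list(x0)
    for _ in range(steps):
        pen = 2.0 * rho * max(0.0, g(x))  # derivative of rho * max(0, g)^2
        grads = f_grad(x)
        ggrads = g_grad(x)
        x = [xi - lr * (gi + pen * ggi)
             for xi, gi, ggi in zip(x, grads, ggrads)]
    return x

# Toy problem: minimize (x-2)^2 + (y-2)^2 subject to x + y <= 2.
f_grad = lambda p: [2 * (p[0] - 2), 2 * (p[1] - 2)]
g      = lambda p: p[0] + p[1] - 2
g_grad = lambda p: [1.0, 1.0]
sol = exterior_penalty_minimize(f_grad, g, g_grad, [0.0, 0.0])
print([round(v, 2) for v in sol])  # near the constrained optimum (1, 1)
```

With a finite penalty weight the solution sits slightly outside the feasible set; increasing `rho` shrinks that violation, which is exactly the performance-versus-satisfaction trade-off the abstract describes.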

ICML Conference 2024 Conference Paper

OxyGenerator: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

  • Bin Lu 0005
  • Ze Zhao
  • Luyu Han
  • Xiaoying Gan
  • Yuntao Zhou
  • Lei Zhou 0016
  • Luoyi Fu
  • Xinbing Wang

Accurately reconstructing global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems. Existing expert-dominated numerical simulations fail to keep up with the dynamic variation caused by global warming and human activities. Moreover, because data collection is costly, historical observations are severely sparse, posing a significant challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep-learning-based model to reconstruct global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77% and demonstrating promising potential to understand the “breathless ocean” in a data-driven manner.

IJCAI Conference 2023 Conference Paper

Self-supervised Graph Disentangled Networks for Review-based Recommendation

  • Yuyang Ren
  • Haonan Zhang
  • Qi Li
  • Luoyi Fu
  • Xinbing Wang
  • Chenghu Zhou

User review data serves as auxiliary information to alleviate the data sparsity problem and improve the quality of learned user/item or interaction representations in review-based recommender systems. However, existing methods usually model user-item interactions in a holistic manner and neglect the entanglement of the latent intents behind them (e.g., price, quality, or appearance), resulting in suboptimal representations and reduced interpretability. In this paper, we propose Self-supervised Graph Disentangled Networks for review-based recommendation (SGDN) to separately model the user-item interactions based on the latent factors through the textual review data. To this end, we first model the distributions of interactions over latent factors from both the semantic information in review data and the structural information in user-item graph data, forming several factor graphs. Then a factorized message passing mechanism is designed to learn disentangled user/item and interaction representations on the factor graphs. Finally, we set up an intent-aware contrastive learning task to alleviate the sparsity issue and encourage disentanglement by dynamically identifying positive and negative samples based on the learned intent distributions. Empirical results over five benchmark datasets validate the superiority of SGDN over state-of-the-art methods and the interpretability of the learned intent factors.

AAAI Conference 2023 Conference Paper

Temporal Knowledge Graph Reasoning with Historical Contrastive Learning

  • Yi Xu
  • Junjie Ou
  • Hui Xu
  • Luoyi Fu

Temporal knowledge graphs, which provide an effective way to store and model dynamic relations, show promising prospects in event forecasting. However, most temporal knowledge graph reasoning methods are highly dependent on the recurrence or periodicity of events, which makes it challenging to infer future events involving entities that lack historical interactions. In fact, the current moment is often the combined effect of a small part of historical information and unobserved underlying factors. To this end, we propose a new event forecasting model called Contrastive Event Network (CENET), based on a novel training framework of historical contrastive learning. CENET learns both the historical and non-historical dependency to distinguish the entities that best match a given query. Simultaneously, it trains query representations via contrastive learning to investigate whether the current moment depends more on historical or non-historical events. These representations further help train a binary classifier whose output is a boolean mask indicating related entities in the search space. During inference, CENET employs a mask-based strategy to generate the final results. We evaluate our proposed model on five benchmark graphs. The results demonstrate that CENET significantly outperforms all existing methods on most metrics, achieving at least an 8.3% relative improvement in Hits@1 over previous state-of-the-art baselines on event-based datasets.
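The mask-based inference step is simple to illustrate: the classifier's boolean mask rules entities out of the candidate set before the final argmax. The scores, mask, and function name below are hypothetical:

```python
def mask_based_inference(scores, mask):
    """Restrict candidate entity scores with a boolean mask: entities
    the binary classifier rules out are excluded before the argmax
    (illustrative of the mask-based strategy, not CENET's exact code)."""
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    return max(range(len(masked)), key=lambda i: masked[i])

scores = [0.9, 0.7, 0.4, 0.8]
mask = [False, True, True, True]     # classifier keeps entities 1-3
print(mask_based_inference(scores, mask))  # → 3
```

Entity 0 has the highest raw score but is masked out, so the prediction falls to the best-scoring unmasked entity.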

TIST Journal 2022 Journal Article

Make More Connections: Urban Traffic Flow Forecasting with Spatiotemporal Adaptive Gated Graph Convolution Network

  • Bin Lu
  • Xiaoying Gan
  • Haiming Jin
  • Luoyi Fu
  • Xinbing Wang
  • Haisong Zhang

Urban traffic flow forecasting is a critical issue in intelligent transportation systems. Due to the complexity and uncertainty of urban road conditions, capturing the dynamic spatiotemporal correlations and making accurate predictions is very challenging. In most existing works, the urban road network is modeled as a fixed graph based on local proximity. However, such modeling is not sufficient to describe the dynamics of the road network and capture the global contextual information. In this paper, we construct the road network as a dynamic weighted graph through an attention mechanism. Furthermore, we propose to seek both spatial neighbors and semantic neighbors to make more connections between road nodes. We propose a novel Spatiotemporal Adaptive Gated Graph Convolution Network (STAG-GCN) to predict traffic conditions for several time steps ahead. STAG-GCN mainly consists of two major components: (1) a multivariate self-attention Temporal Convolution Network (TCN) is utilized to capture local and long-range temporal dependencies across recent, daily-periodic, and weekly-periodic observations; (2) a mix-hop AG-GCN extracts selective spatial and semantic dependencies within multi-layer stacking through an adaptive graph gating mechanism and a mix-hop propagation mechanism. The outputs of the different components are fused with learned weights to generate the final prediction results. Extensive experiments on two real-world large-scale urban traffic datasets verify the effectiveness of our approach, and its multi-step forecasting performance outperforms the state-of-the-art baselines.
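Building a dynamic weighted graph via attention, rather than fixing edges by local proximity, can be sketched as scaled dot-product attention over node embeddings. The embeddings and function below are hypothetical, not STAG-GCN's architecture:

```python
import math

def adaptive_adjacency(node_emb):
    """Build a dense weighted adjacency from node embeddings with
    scaled dot-product attention: each row is a softmax over the
    node's similarity to every other node (illustrative sketch)."""
    n, dim = len(node_emb), len(node_emb[0])
    scores = [[sum(a * b for a, b in zip(node_emb[i], node_emb[j]))
               / math.sqrt(dim) for j in range(n)] for i in range(n)]
    adj = []
    for row in scores:
        mx = max(row)                       # subtract max for stability
        exps = [math.exp(s - mx) for s in row]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj

emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # node 2 is dissimilar to 0 and 1
A = adaptive_adjacency(emb)
print(round(A[0][1], 2), round(A[0][2], 2))
```

Because the weights come from embedding similarity, a node can connect strongly to a semantically similar node anywhere in the network, which is the "semantic neighbors" idea in the abstract.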

AAAI Conference 2019 Short Paper

APRP: An Anonymous Propagation Method in Bitcoin Network

  • Yuhang Yao
  • Xiao Zeng
  • Tianyue Cao
  • Luoyi Fu
  • Xinbing Wang

Because anonymity protection against eavesdropping attacks in the Bitcoin network has received little attention, this paper proposes a solution for Bitcoin anonymization based on network structure. We first present a general adversarial network model for formalizing deanonymization attacks, then present a novel propagation method, APRP (Adaptive PageRank Propagation), which adopts PageRank as a propagation delay factor and constantly adjusts the PR-values of nodes to adapt to network dynamics. Experiments on both simulated and real Bitcoin networks confirm the superiority of APRP, with 20-50% performance improvements under various deanonymization attacks.
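The PageRank computation that would feed such a delay factor is standard power iteration. The graph, the dangling-node handling, and the mapping from PR-value to delay below are illustrative assumptions, not APRP's specification:

```python
def pagerank(adj, d=0.85, iters=100):
    """Power-iteration PageRank over an adjacency list `adj`, where
    adj[u] lists the out-neighbors of node u. Dangling nodes spread
    their mass uniformly so total probability is conserved."""
    n = len(adj)
    pr = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for u, outs in enumerate(adj):
            if outs:
                share = pr[u] / len(outs)
                for v in outs:
                    new[v] += d * share
            else:
                for v in range(n):
                    new[v] += d * pr[u] / n
        pr = new
    return pr

adj = [[1, 2], [2], [0]]       # small directed graph
pr = pagerank(adj)
delays = [0.1 + p for p in pr]  # higher-PR nodes delay longer (assumed mapping)
print([round(p, 3) for p in pr])
```

Node 2 receives links from both other nodes and ends up with the highest PR-value, so under the assumed mapping it would forward transactions with the longest delay.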

AAAI Conference 2018 Short Paper

Dynamic Detection of Communities and Their Evolutions in Temporal Social Networks

  • Yaowei Huang
  • Jinghuan Shang
  • Bill Lin
  • Luoyi Fu
  • Xinbing Wang

In this paper, we propose a novel community detection model that explores dynamic community evolution in temporal social networks by modeling the temporal affiliation strength between users and communities. Instead of transforming dynamic networks into static networks, our model uses a normal distribution to estimate changes in affiliation strength more concisely and comprehensively. Extensive quantitative and qualitative evaluations on large social network datasets show that our model improves prediction accuracy and reveals distinctive insights about the evolution of temporal social networks.

AAAI Conference 2016 Conference Paper

DRIMUX: Dynamic Rumor Influence Minimization with User Experience in Social Networks

  • Biao Wang
  • Ge Chen
  • Luoyi Fu
  • Li Song
  • Xinbing Wang
  • Xue Liu

Rumor blocking is a serious problem in large-scale social networks. Malicious rumors can cause chaos in society and hence need to be blocked as soon as possible after being detected. In this paper, we propose a model of dynamic rumor influence minimization with user experience (DRIMUX). Our goal is to minimize the influence of the rumor (i.e., the number of users that have accepted and forwarded the rumor) by blocking a certain subset of nodes. A dynamic Ising propagation model considering both the global popularity and the individual attraction of the rumor is presented based on realistic scenarios. In addition, unlike existing influence minimization problems, we take into account the constraint of user experience utility. Specifically, each node is assigned a tolerance time threshold; if a user's blocking time exceeds that threshold, the utility of the network decreases. Under this constraint, we formulate the problem as a network inference problem using survival theory and propose solutions based on the maximum likelihood principle. Experiments on large-scale real-world networks validate the effectiveness of our method.