Arrow Research search

Author name cluster

Haiming Jin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers (7)

TIST Journal 2025 Journal Article

Cost-aware Best Arm Identification in Stochastic Bandits

  • Zhida Qin
  • Wenhao Xue
  • Lu Zheng
  • Xiaoying Gan
  • Hongqiu Wu
  • Haiming Jin
  • Luoyi Fu

The best arm identification problem in the multi-armed bandit model has been widely applied in practical settings such as spectrum sensing, online advertising, and cloud computing. Although many works have been devoted to this area, most do not consider the cost of pulling actions, i.e., a player has to pay some cost when she pulls an arm. Motivated by this, we study a ratio-based best arm identification problem, where each arm is associated with a random reward as well as a random cost. For any \(\delta\in(0,1)\), with probability at least \(1-\delta\), the player aims to find the arm with the largest ratio of expected reward to expected cost using as few samplings as possible. Specifically, we consider two settings: (1) the precise setting, i.e., identifying the exact optimal arm; and (2) the Probably Approximately Correct (PAC) setting, which identifies an \(\epsilon\)-optimal arm. For the precise setting, we design elimination-type algorithms and provide a fundamental lower bound that asymptotically matches the upper bound, while for the PAC setting, a UCB-type algorithm named \(\epsilon\)-RCB is proposed. We show that for all algorithms, the sample complexities, i.e., the pulling times for all arms, grow logarithmically as \(\frac{1}{\delta}\) increases. Moreover, compared to existing works, the running of our algorithms is independent of arm-related parameters, which is more practical. Finally, we validate our theoretical results through numerical experiments.
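The elimination idea described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the arm model, the confidence radius, and all names below are invented for illustration. Each surviving arm is pulled every round, the empirical reward-to-cost ratio is tracked, and arms whose optimistic ratio falls below the best pessimistic ratio are dropped.

```python
import random

def identify_best_ratio_arm(reward_means, cost_means, rounds=2000, seed=0):
    """Toy elimination-style sketch for ratio-based best arm identification:
    pull every surviving arm each round and drop arms whose upper confidence
    bound on the reward/cost ratio falls below the best lower bound."""
    rng = random.Random(seed)
    k = len(reward_means)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    alive = set(range(k))
    for t in range(1, rounds + 1):
        for a in list(alive):
            reward_sum[a] += rng.random() < reward_means[a]          # Bernoulli reward
            cost_sum[a] += (0.5 + rng.random()) * cost_means[a]      # cost with mean cost_means[a]
            pulls[a] += 1

        def ratio(a):
            return (reward_sum[a] / pulls[a]) / max(cost_sum[a] / pulls[a], 1e-9)

        rad = (2.0 / t) ** 0.5   # ad hoc confidence radius, shrinking over rounds
        best_lcb = max(ratio(a) - rad for a in alive)
        alive = {a for a in alive if ratio(a) + rad >= best_lcb}
        if len(alive) == 1:
            break
    return max(alive, key=ratio)
```

With a well-separated instance, the arm with the largest expected-reward to expected-cost ratio survives the elimination rounds.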

NeurIPS Conference 2024 Conference Paper

CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

  • Jianrong Ding
  • Zhanyu Liu
  • Guanjie Zheng
  • Haiming Jin
  • Linghe Kong

Dataset condensation is an emerging technique that generates a small synthetic dataset for training deep neural networks (DNNs), lowering storage and training costs. The objective of dataset condensation is to ensure that a model trained with the synthetic dataset performs comparably to a model trained with the full dataset. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in the output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between the predictions of the two models; the synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology than classification. To bridge this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and, based on our analysis, propose a new one-line plugin designated Dataset Condensation for Time Series Forecasting (CondTSF). Plugging CondTSF into previous dataset condensation methods reduces the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.
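The evaluation gap the abstract describes can be made concrete with a minimal sketch (function names and numbers here are invented, not from the paper): classification only checks whether two models' argmax labels match, while forecasting measures the distance between their full prediction sequences.

```python
def label_agreement(logits_a, logits_b):
    """Classification view: two models 'agree' if their argmax labels match,
    regardless of how different the logits are."""
    argmax = lambda v: max(range(len(v)), key=v.__getitem__)
    return argmax(logits_a) == argmax(logits_b)

def forecast_distance(pred_a, pred_b):
    """TS-forecasting view: distillation quality is the distance between the
    two models' full predictions (mean squared error in this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)

# Two outputs with the same argmax pass the classification test,
# yet the same numeric gap would count as a large forecasting error.
```

Under this sketch, `label_agreement([2.0, 0.1], [9.0, 0.1])` is satisfied even though the same pair of vectors has a substantial `forecast_distance`, which is the stricter criterion the paper targets.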

AAMAS Conference 2024 Conference Paper

Risk-Aware Constrained Reinforcement Learning with Non-Stationary Policies

  • Zhaoxing Yang
  • Haiming Jin
  • Yao Tang
  • Guiyun Fan

Constrained reinforcement learning (RL) algorithms have attracted extensive attention for tackling sequential decision-making problems that contain constraints defined under various risk measures. However, most works search policies only within the stationary policy class and fail to capture a simple intuition: adjust the action-selecting distribution at each state according to the cost accumulated so far. In this work, we design a novel quantile-level-driven policy class to fully realize this intuition, within which each policy additionally takes the quantile level of the accumulated cost as input. This quantile level is obtained via a novel Invertible Backward Distributional Critic (IBDC) framework, which utilizes invertible function approximators to estimate the accumulated cost distribution and outputs the required quantile level through their inverse forms. Further, the estimated accumulated cost distribution also helps decompose the challenging trajectory-level constraints into state-level constraints, and the Risk-Aware Constrained RL (RAC) algorithm is then designed to solve the decomposed problem with Lagrangian multipliers. Experimental results in various environments validate the effectiveness of RAC against state-of-the-art baselines.
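The "quantile level of the accumulated cost" input can be illustrated with an empirical stand-in (the paper uses invertible function approximators; this sketch just ranks against sorted samples, and all names are invented):

```python
import bisect

def quantile_level(cost_samples, accumulated_cost):
    """Empirical quantile level: the fraction of sampled accumulated costs
    at or below the current one. A risk-aware policy can condition its
    action-selecting distribution on this extra input."""
    xs = sorted(cost_samples)
    return bisect.bisect_right(xs, accumulated_cost) / len(xs)
```

A trajectory whose cost so far sits near quantile level 1.0 has nearly exhausted its budget relative to past experience, so the policy can act more conservatively there than at level 0.1.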

AAAI Conference 2023 Conference Paper

DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning

  • Zhaoxing Yang
  • Haiming Jin
  • Rong Ding
  • Haoyi You
  • Guiyun Fan
  • Xinbing Wang
  • Chenghu Zhou

In recent years, multi-agent reinforcement learning (MARL) has achieved impressive performance in various applications. However, physical limitations, budget restrictions, and many other factors usually impose constraints on a multi-agent system (MAS), which cannot be handled by traditional MARL frameworks. Specifically, this paper focuses on constrained MASes where agents work cooperatively to maximize the expected team-average return under various constraints on expected team-average costs, and develops a constrained cooperative MARL framework, named DeCOM, for such MASes. In particular, DeCOM decomposes the policy of each agent into two modules, which empowers information sharing among agents to achieve better cooperation. In addition, with such modularization, DeCOM's training algorithm separates the original constrained optimization into an unconstrained optimization on reward and a constraint-satisfaction problem on costs. DeCOM then iteratively solves these problems in a computationally efficient manner, which makes DeCOM highly scalable. We also provide theoretical guarantees on the convergence of DeCOM's policy update algorithm. Finally, we conduct extensive experiments to show the effectiveness of DeCOM with various types of costs in both moderate-scale and large-scale (500-agent) environments that originate from real-world applications.
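The split between an unconstrained reward optimization and a constraint-satisfaction step can be sketched on a toy scalar problem. This is a schematic invention in the spirit of that separation, not DeCOM's actual update rule: one gradient step on reward, then a projection that restores feasibility.

```python
def alternating_update(theta, reward_grad, project, lr=0.1, steps=100):
    """Toy alternation: (1) an unconstrained gradient ascent step on reward,
    then (2) a projection step that enforces the cost constraints."""
    for _ in range(steps):
        theta += lr * reward_grad(theta)   # reward step, ignoring constraints
        theta = project(theta)             # constraint-satisfaction step
    return theta
```

For example, maximizing the reward \(-(\theta-3)^2\) (gradient \(-2(\theta-3)\)) under the feasible region \(|\theta|\le 2\) drives the iterate to the constraint boundary at 2, the best feasible point.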

IJCAI Conference 2023 Conference Paper

Prediction with Incomplete Data under Agnostic Mask Distribution Shift

  • Yichen Zhu
  • Jian Yuan
  • Bo Jiang
  • Tao Lin
  • Haiming Jin
  • Xinbing Wang
  • Chenghu Zhou

Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data, consisting of observed features and a mask that indicates the missing pattern. Existing methods assume that the training and testing distributions are the same, which may be violated in real-world scenarios. In this paper, we consider prediction with incomplete data in the presence of distribution shift. We focus on the case where the underlying joint distribution of complete features and label is invariant, but the missing pattern, i.e., the mask distribution, may shift agnostically between training and testing. To achieve generalization, we leverage the observation that for each mask, there is an invariant optimal predictor. To avoid the exponential explosion of learning them separately, we approximate the optimal predictors jointly using a double parameterization technique. This has the undesirable side effect of allowing the learned predictors to rely on intra-mask correlations and correlations between features and the mask. We perform decorrelation to minimize this effect. Combining the techniques above, we propose a novel prediction method called StableMiss. Extensive experiments on both synthetic and real-world datasets show that StableMiss is robust and outperforms state-of-the-art methods under agnostic mask distribution shift.
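The incomplete-data setup (observed features plus a mask) can be illustrated with a minimal sketch. The function name and encoding below are invented for illustration, not from the paper: the predictor receives the masked features together with the mask itself, so it can in principle behave differently for each missing pattern.

```python
def masked_input(features, mask):
    """Build a predictor input from incomplete data: zero out missing
    features and append the binary mask, so the model can condition its
    prediction on the missing pattern."""
    observed = [x if m else 0.0 for x, m in zip(features, mask)]
    return observed + [float(m) for m in mask]
```

Two samples with identical observed values but different masks then map to different inputs, which is what lets a single jointly parameterized model approximate a per-mask optimal predictor.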

AAAI Conference 2023 Conference Paper

User-Oriented Robust Reinforcement Learning

  • Haoyi You
  • Beichen Yu
  • Haiming Jin
  • Zhaoxing Yang
  • Jiahui Sun

Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. In practice, however, a user of an RL policy may have different preferences over its performance across environments, and the aforementioned max-min robustness is oftentimes too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two UOR-RL training algorithms for the scenarios with and without an a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or no knowledge of the environment distribution. Furthermore, we carry out extensive experimental evaluations in 6 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to state-of-the-art baselines under the average-case and worst-case performance metrics and, more importantly, establishes new state-of-the-art performance under the UOR metric.
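How a weighted metric can generalize max-min robustness is easy to sketch (names and numbers are illustrative, not the paper's definition): aggregate per-environment returns with user-chosen weights; uniform weights recover the average case, and putting all weight on the worst environment recovers the max-min criterion.

```python
def uor_score(returns, weights):
    """Toy user-oriented robustness score: a user-specified weighted sum of
    per-environment returns. The weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must form a distribution"
    return sum(r * w for r, w in zip(returns, weights))

returns = [10.0, 4.0, 8.0]                           # per-environment returns
average_case = uor_score(returns, [1/3, 1/3, 1/3])   # uniform weights: average case
worst_case = uor_score(returns, [0.0, 1.0, 0.0])     # all weight on worst env: max-min
```

Intermediate weight profiles interpolate between these two extremes, which is what lets a user trade conservatism for typical-case performance.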

TIST Journal 2022 Journal Article

Make More Connections: Urban Traffic Flow Forecasting with Spatiotemporal Adaptive Gated Graph Convolution Network

  • Bin Lu
  • Xiaoying Gan
  • Haiming Jin
  • Luoyi Fu
  • Xinbing Wang
  • Haisong Zhang

Urban traffic flow forecasting is a critical issue in intelligent transportation systems. Due to the complexity and uncertainty of urban road conditions, capturing the dynamic spatiotemporal correlations and making accurate predictions is very challenging. In most existing works, the urban road network is modeled as a fixed graph based on local proximity. However, such modeling is not sufficient to describe the dynamics of the road network and capture the global contextual information. In this paper, we construct the road network as a dynamic weighted graph through an attention mechanism. Furthermore, we propose to seek both spatial neighbors and semantic neighbors to make more connections between road nodes. We propose a novel Spatiotemporal Adaptive Gated Graph Convolution Network (STAG-GCN) to predict traffic conditions several time steps ahead. STAG-GCN mainly consists of two major components: (1) a multivariate self-attention Temporal Convolution Network (TCN) captures local and long-range temporal dependencies across recent, daily-periodic, and weekly-periodic observations; (2) a mix-hop AG-GCN extracts selective spatial and semantic dependencies within multi-layer stacking through an adaptive graph gating mechanism and a mix-hop propagation mechanism. The outputs of the different components are fused with learned weights to generate the final prediction. Extensive experiments on two large-scale real-world urban traffic datasets have verified the effectiveness of our approach, and the multi-step forecasting performance of our proposed model outperforms state-of-the-art baselines.
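The "dynamic weighted graph through an attention mechanism" idea can be sketched with a common adaptive-adjacency construction: edge weights come from a row-wise softmax over pairwise similarities of node embeddings. This is a generic illustration, not STAG-GCN's actual gating mechanism, and all names are invented.

```python
import math

def adaptive_adjacency(embeddings):
    """Row-wise softmax over pairwise dot products of node embeddings:
    row i gives node i's attention weights over all nodes, yielding a
    dense, learnable (weighted) adjacency rather than a fixed graph."""
    n = len(embeddings)
    adj = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                  for j in range(n)]
        mx = max(scores)                            # stabilize the softmax
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj
```

Because the weights depend on learned embeddings rather than physical proximity, two distant roads with similar traffic profiles (semantic neighbors) can still receive a strong connection.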