Arrow Research search

Author name cluster

Pan Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
1 author row

Possible papers

48

AAAI Conference 2026 Conference Paper

Matching Policy Design for Gig Platforms with “Priority” Features

  • Evan Yifan Xu
  • Pan Xu

In recent years, gig platforms like Uber and DoorDash have implemented strategies to boost gig drivers' earnings during peak hours. Uber's 'back-to-back' feature allows drivers to accept new trips while still en route, and Uber Eats' 'Batch Order Route' initiative allows drivers to pick up multiple deliveries from different locations, which may result in multiple stops before one order is delivered. Despite revenue gains, these features lead to user complaints about extended waiting times. In response, platforms introduce features like Uber Eats' 'Priority Delivery' and Uber's 'Priority', where customers pay an extra subscription fee for guaranteed reduced waiting times. This paper focuses on designing matching policies to enhance system revenue while limiting customer waiting times. We present a hybrid model combining online matching and queueing theory for quantitative analysis of users' waiting times. Additionally, we introduce an LP-based sampling framework and a unified queueing-theory-based method for evaluating online performance. Comprehensive experiments on real datasets validate our theoretical findings, highlighting the efficiency of our matching framework in promoting profit and meeting committed waiting times.
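To make the LP-based sampling idea concrete, here is a minimal, self-contained sketch (not the paper's actual policy or LP): a toy offline LP relaxation is solved once, and its fractional solution is then used as sampling probabilities when online requests arrive. The instance, variable names, and the `serve` helper are all hypothetical.

```python
# A minimal sketch of LP-based sampling for online matching (hypothetical toy
# instance; not the paper's actual policy or LP formulation).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Offline agents (e.g., drivers) with capacities, online types with arrival rates.
capacity = np.array([2.0, 1.0])          # offline agents 0, 1
rate = np.array([1.5, 1.5])              # expected arrivals of online types 0, 1
weight = np.array([[3.0, 1.0],           # weight[i, j]: value of matching agent i to type j
                   [2.0, 2.0]])

n_off, n_on = weight.shape
# Variables x[i, j] flattened; maximize sum w*x  <=>  minimize -sum w*x.
c = -weight.flatten()
A_ub, b_ub = [], []
for i in range(n_off):                    # capacity constraint per offline agent
    row = np.zeros(n_off * n_on); row[i * n_on:(i + 1) * n_on] = 1.0
    A_ub.append(row); b_ub.append(capacity[i])
for j in range(n_on):                     # arrival-rate constraint per online type
    row = np.zeros(n_off * n_on); row[j::n_on] = 1.0
    A_ub.append(row); b_ub.append(rate[j])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(0, None))
x = res.x.reshape(n_off, n_on)

def serve(online_type, remaining):
    """Sample an offline agent with probability x[i, j] / rate[j]; may leave unmatched."""
    probs = x[:, online_type] / rate[online_type]
    probs = np.append(probs, max(0.0, 1.0 - probs.sum()))   # last slot = no match
    choice = rng.choice(n_off + 1, p=probs / probs.sum())
    if choice < n_off and remaining[choice] > 0:
        remaining[choice] -= 1
        return choice
    return None

remaining = capacity.copy()
arrivals = rng.choice(n_on, size=5, p=rate / rate.sum())
print([serve(j, remaining) for j in arrivals])
```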

TMLR Journal 2026 Journal Article

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

  • Ruhan Wang
  • Yu Yang
  • Zhishuai Liu
  • Dongruo Zhou
  • Pan Xu

We study offline off-dynamics reinforcement learning (RL) to utilize data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly focusing on Decision Transformer (DT) type frameworks, which can predict actions conditioned on desired return guidance and complete trajectory history. Previous works address the dynamics shift problem by augmenting the reward in the trajectory from the source domain to match the optimal trajectory in the target domain. However, this strategy cannot be directly applied in RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented (REAG) method for DT type frameworks, where we augment the return in the source domain by aligning its distribution with that in the target domain. We provide a theoretical analysis demonstrating that the RCSL policy learned from REAG achieves the same level of suboptimality as would be obtained without a dynamics shift. We introduce two practical implementations, $\text{REAG}^{*}_{\text{Dara}}$ and $\text{REAG}^{*}_{\text{MV}}$. Thorough experiments on D4RL datasets and various DT-type baselines demonstrate that our methods consistently enhance the performance of DT type frameworks in off-dynamics RL.

JAIR Journal 2025 Journal Article

A New Regret-analysis Framework for Budgeted Multi-Armed Bandits

  • Evan Yifan Xu
  • Pan Xu

We consider two versions of the (stochastic) budgeted Multi-Armed Bandit problem. The first one was introduced by Tran-Thanh et al. (AAAI, 2012): Pulling each arm incurs a fixed deterministic cost and yields a random reward i.i.d. sampled from an unknown distribution (prior free). We have a global budget B and aim to devise a strategy to maximize the expected total reward. The second one was introduced by Ding et al. (AAAI, 2013): It has the same setting as before except that the costs of each arm are i.i.d. samples from an unknown distribution (and independent of its rewards). We propose a new budget-based regret-analysis framework and design two simple algorithms to illustrate the power of our framework. Our regret bounds for both problems not only match the optimal bound of O(ln B) but also significantly reduce the dependence on other input parameters (assumed constants), compared with the two studies of Tran-Thanh et al. (AAAI, 2012) and Ding et al. (AAAI, 2013), both of which utilized a time-based framework. Extensive experimental results show the effectiveness and computational efficiency of our proposed algorithms and confirm our theoretical predictions.
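For intuition about what a budgeted-bandit policy looks like, the sketch below implements a simple reward-per-cost UCB baseline with a hard budget check, under the fixed-deterministic-cost setting described above. This is an illustrative baseline only, not the algorithm or the regret-analysis framework proposed in the paper; the arm means, costs, and budget are made up.

```python
# Minimal sketch of a reward-per-cost UCB baseline for the budgeted bandit with
# fixed deterministic costs (illustrative only; not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(1)
costs = np.array([1.0, 2.0, 1.5])                 # known deterministic pull costs
true_means = np.array([0.3, 0.8, 0.5])            # unknown Bernoulli reward means

def pull(arm):
    return float(rng.random() < true_means[arm])

B = 200.0                                          # total budget
counts = np.zeros(3); sums = np.zeros(3); spent = 0.0; total_reward = 0.0

# Pull each arm once to initialize.
for a in range(3):
    r = pull(a); counts[a] += 1; sums[a] += r; spent += costs[a]; total_reward += r

while spent + costs.min() <= B:
    t = counts.sum()
    means = sums / counts
    bonus = np.sqrt(2.0 * np.log(t) / counts)
    index = (means + bonus) / costs                # optimistic reward-to-cost ratio
    index[spent + costs > B] = -np.inf             # only consider affordable arms
    a = int(np.argmax(index))
    r = pull(a); counts[a] += 1; sums[a] += r; spent += costs[a]; total_reward += r

print(f"budget spent: {spent:.1f}, total reward: {total_reward:.0f}")
```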

NeurIPS Conference 2025 Conference Paper

Linear Mixture Distributionally Robust Markov Decision Processes

  • Zhishuai Liu
  • Pan Xu

Many real-world decision-making problems face the off-dynamics challenge: the agent learns a policy in a source domain and deploys it in a target domain with different state transitions. The distributionally robust Markov decision process (DRMDP) addresses this challenge by finding a robust policy that performs well under the worst-case environment within a pre-specified uncertainty set of transition dynamics. Its effectiveness heavily hinges on the proper design of these uncertainty sets, based on prior knowledge of the dynamics. In this work, we propose a novel linear mixture DRMDP framework, where the nominal dynamics is assumed to be a linear mixture model. In contrast with existing uncertainty sets directly defined as a ball centered around the nominal kernel, linear mixture DRMDPs define the uncertainty sets based on a ball around the mixture weighting parameter. We show that this new framework provides a more refined representation of uncertainties compared to conventional models based on $(s, a)$-rectangularity and $d$-rectangularity, when prior knowledge about the mixture model is present. We propose a meta algorithm for robust policy learning in linear mixture DRMDPs with general $f$-divergence defined uncertainty sets, and analyze its sample complexity under three divergence instantiations: total variation, Kullback-Leibler, and $\chi^2$ divergences. These results establish the statistical learnability of linear mixture DRMDPs, laying the theoretical foundation for future research on this new setting.

JAIR Journal 2025 Journal Article

Optimizing Relevance and Diversity in Online Matching Markets: A Time-Adaptive Attenuation Approach

  • Evan Yifan Xu
  • Pan Xu

Real-world online matching markets (OMMs) often involve multiple objectives, such as maximizing relevance and diversity in online recommendation and crowdsourcing systems. In this paper, we propose a generic bi-objective maximization model for OMMs with the following features: (1) there are two types of agents—offline and online—with online agents arriving dynamically and stochastically; (2) upon each online agent’s arrival, an immediate and irrevocable decision must be made regarding which subset of relevant offline agents to assign; and (3) each offline and online agent has a specific matching capacity, i.e., an upper bound on the number of allowable matchings. Our model supports two general linear objective functions defined over all possible assignments to online agents. We formulate a bi-objective linear program (LP) and design an LP-based parameterized algorithm. Departing from prevalent non-adaptive attenuation methods, we introduce a time-adaptive attenuation framework that achieves an almost tight competitive ratio for each objective. To complement our theoretical analysis, we implement the proposed algorithm and evaluate it against several heuristics using two real-world datasets. Extensive experimental results demonstrate the flexibility and effectiveness of our approach, validating our theoretical predictions.

TMLR Journal 2025 Journal Article

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

  • Yu Yang
  • Pan Xu

Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collecting data from specific environments can be both costly and unsafe in many scenarios, leading to suboptimal performance and limited few-shot prompt abilities due to the data-hungry nature of Transformer-based models. Additionally, the limited datasets used in pre-training make it challenging for Prompt-DT type methods to distinguish between various RL tasks through prompts alone. To address these challenges, we introduce the Language model-initialized Prompt Decision Transformer (LPDT) framework, which leverages pre-trained language models to provide rich prior knowledge for RL tasks and fine-tunes the sequence model using Low-rank Adaptation (LoRA) for meta-RL problems. We further incorporate prompt regularization to effectively differentiate between tasks based on prompt feature representations. Comprehensive empirical studies demonstrate that initializing with a pre-trained language model provides useful prior knowledge and achieves performance comparable to Prompt-DT using only $10\%$ of the data in some MuJoCo control tasks. We also provide a thorough ablation study to validate the effectiveness of each component, including sequence modeling, language models, prompt regularization, and prompt strategies.

IJCAI Conference 2024 Conference Paper

Design a Win-Win Strategy That Is Fair to Both Service Providers and Tasks When Rejection Is Not an Option

  • Yohai Trabelsi
  • Pan Xu
  • Sarit Kraus

Assigning tasks to service providers is a frequent procedure across various applications. Often the tasks arrive dynamically while the service providers remain static. Preventing task rejection caused by service provider overload is of utmost significance. To ensure a positive experience in relevant applications for both service providers and tasks, fairness must be considered. To address the issue, we model the problem as an online matching within a bipartite graph and tackle two minimax problems: one focuses on minimizing the highest waiting time of a task, while the other aims to minimize the highest workload of a service provider. We show that the second problem can be expressed as a linear program and thus solved efficiently while maintaining a reasonable approximation to the objective of the first problem. We developed novel methods that utilize the two minimax problems. We conducted extensive simulation experiments using real data and demonstrated that our novel heuristics, based on the linear program, performed remarkably well.
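The second minimax objective (minimizing the highest workload of a service provider) can be written as a small linear program. The following toy sketch does this offline for a fixed set of tasks; it is only meant to show the LP structure and is not the authors' online method. The instance and compatibility matrix are hypothetical.

```python
# Toy LP: split a fixed set of tasks across providers to minimize the maximum
# workload (a simplified, offline stand-in for the minimax objective above).
import numpy as np
from scipy.optimize import linprog

duration = np.array([4.0, 2.0, 3.0, 1.0])     # task processing times
compatible = np.array([[1, 1, 0, 1],          # compatible[i, j] = 1 if provider i can do task j
                       [1, 0, 1, 1]])
n_prov, n_task = compatible.shape

# Variables: x[i, j] (fractional assignment) followed by z (max workload).
n_x = n_prov * n_task
c = np.zeros(n_x + 1); c[-1] = 1.0            # minimize z

A_ub, b_ub = [], []
for i in range(n_prov):                        # workload_i - z <= 0
    row = np.zeros(n_x + 1)
    row[i * n_task:(i + 1) * n_task] = duration
    row[-1] = -1.0
    A_ub.append(row); b_ub.append(0.0)

A_eq, b_eq = [], []
for j in range(n_task):                        # each task fully assigned
    row = np.zeros(n_x + 1); row[j::n_task][:n_prov] = 1.0
    A_eq.append(row); b_eq.append(1.0)

# Forbid incompatible provider-task pairs via variable bounds.
bounds = []
for i in range(n_prov):
    for j in range(n_task):
        bounds.append((0, None) if compatible[i, j] else (0, 0))
bounds.append((0, None))                       # z >= 0

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
print("minimum possible max workload:", round(res.fun, 3))
print("assignment:", res.x[:n_x].reshape(n_prov, n_task).round(3))
```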

JAIR Journal 2024 Journal Article

Exploring the Tradeoff Between System Profit and Income Equality Among Ride-hailing Drivers

  • Evan Yifan Xu
  • Pan Xu

This paper examines the income inequality among rideshare drivers resulting from discriminatory cancellations by riders, considering the impact of demographic factors such as gender, age, and race. We investigate the tradeoff between income inequality, referred to as the fairness objective, and system efficiency, known as the profit objective. To address this issue, we propose an online bipartite-matching model that captures the sequential arrival of riders according to a known distribution. The model incorporates the notion of acceptance rates between driver-rider types, which are defined based on demographic characteristics. Specifically, we analyze the probabilities of riders accepting or canceling their assigned drivers, reflecting the level of acceptance between different rider and driver types. We construct a bi-objective linear program as a valid benchmark and propose two LP-based parameterized online algorithms. Rigorous analysis of online competitive ratios is conducted to illustrate the flexibility and efficiency of our algorithms in achieving a balance between fairness and profit. Furthermore, we present experimental results based on real-world and synthetic datasets, validating the theoretical predictions put forth in our study.

AAAI Conference 2024 Conference Paper

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

  • Tianyuan Jin
  • Hao-Lun Hsu
  • William Chang
  • Pan Xu

We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address these issues, we propose an efficient variant of MATS, the epsilon-exploring Multi-Agent Thompson Sampling (eps-MATS) algorithm, which performs MATS exploration with probability epsilon and adopts a greedy policy otherwise. We prove that eps-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies our frequentist regret upper bound is optimal up to constant and logarithmic factors, when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of eps-MATS compared with existing algorithms in the same setting.
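The epsilon-exploring idea is easy to see in a stripped-down, single-agent form: with probability epsilon take a Thompson-sampling step, otherwise act greedily on the posterior means. The sketch below does exactly that for Bernoulli arms; the multi-agent hypergraph structure that eps-MATS actually handles is omitted, and the instance is made up.

```python
# Single-agent sketch of epsilon-exploring Thompson sampling on Bernoulli arms:
# with probability eps draw posterior samples (exploration), otherwise act
# greedily on posterior means. The multi-agent/hypergraph structure is omitted.
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.2, 0.5, 0.7])
eps = 0.1
alpha = np.ones(3); beta = np.ones(3)      # Beta(1, 1) priors per arm

total = 0.0
for t in range(2000):
    if rng.random() < eps:
        scores = rng.beta(alpha, beta)     # Thompson sampling step
    else:
        scores = alpha / (alpha + beta)    # greedy step on posterior means
    a = int(np.argmax(scores))
    r = float(rng.random() < true_means[a])
    alpha[a] += r; beta[a] += 1.0 - r
    total += r

print("average reward:", round(total / 2000, 3))
```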

NeurIPS Conference 2024 Conference Paper

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

  • Zhishuai Liu
  • Pan Xu

Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces. However, the consideration of dynamics uncertainty introduces essential nonlinearity and computational burden, posing unique challenges for analyzing and practically employing function approximation. Focusing on a basic setting where the nominal model and perturbed models are linearly parameterized, we propose minimax optimal and computationally efficient algorithms realizing function approximation and initiate the study on instance-dependent suboptimality analysis in the context of robust offline RL. Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL. Our algorithms and theoretical results crucially depend on a novel function approximation mechanism incorporating variance information, a new procedure of suboptimality and estimation uncertainty decomposition, a quantification of the robust value function shrinkage, and a meticulously designed family of hard instances, which might be of independent interest.

RLJ Journal 2024 Journal Article

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

  • Haque Ishfaq
  • Yixin Tan
  • Yu Yang
  • Qingfeng Lan
  • Jianfeng Lu
  • A. Rupam Mahmood
  • Doina Precup
  • Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be intractable. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

RLC Conference 2024 Conference Paper

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

  • Haque Ishfaq
  • Yixin Tan
  • Yu Yang
  • Qingfeng Lan
  • Jianfeng Lu
  • A. Rupam Mahmood
  • Doina Precup
  • Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be intractable. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

NeurIPS Conference 2024 Conference Paper

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

  • Yihong Guo
  • Yixuan Wang
  • Yuanyuan Shi
  • Pan Xu
  • Anqi Liu

Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching distributions between the source and the target optimal trajectories. However, pure modified rewards only ensure the behavior of the learned policy in the source domain resembles trajectories produced by the target optimal policies, which does not guarantee optimal performance when the learned policy is actually deployed to the target domain. In this work, we propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain so that the new policy can generate the same trajectories in the target domain. Our approach, Domain Adaptation and Reward Augmented Imitation Learning (DARAIL), utilizes the reward modification for domain adaptation and follows the general framework of generative adversarial imitation learning from observation (GAIfO) by applying a reward augmented estimator for the policy optimization step. Theoretically, we present an error bound for our method under a mild assumption regarding the dynamics shift to justify the motivation of our method. Empirically, our method outperforms the pure modified reward method without imitation learning and also outperforms other baselines in benchmark off-dynamics environments.

NeurIPS Conference 2024 Conference Paper

Optimal Batched Best Arm Identification

  • Tianyuan Jin
  • Yu Yang
  • Jing Tang
  • Xiaokui Xiao
  • Pan Xu

We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as little as possible. In particular, we aim to find the best arm with probability $1-\delta$ for some small constant $\delta>0$ while minimizing both the sample complexity (total number of arm pulls) and the batch complexity (total number of batches). We propose the three-batch best arm identification (Tri-BBAI) algorithm, which is the first batched algorithm that achieves the optimal sample complexity in the asymptotic setting (i.e., $\delta\rightarrow 0$) and runs in $3$ batches in expectation. Based on Tri-BBAI, we further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first algorithm that achieves the near-optimal sample and batch complexity in the non-asymptotic setting (i.e., $1/\delta$ is finite), while enjoying the same batch and sample complexity as Tri-BBAI when $\delta$ tends to zero. Moreover, in the non-asymptotic setting, the complexity of previous batch algorithms is usually conditioned on the event that the best arm is returned (with a probability of at least $1-\delta$), which is potentially unbounded in cases where a sub-optimal arm is returned. In contrast, the complexity of Opt-BBAI does not rely on such an event. This is achieved through a novel procedure that we design for checking whether the best arm is eliminated, which is of independent interest.
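As a rough illustration of what batching means operationally, the sketch below runs a naive batched successive-elimination scheme: each batch pulls every surviving arm a fixed number of times, then eliminates arms whose upper confidence bound falls below the best lower confidence bound. This is far simpler than Tri-BBAI or Opt-BBAI and is not claimed to match their guarantees; the arm means, batch sizes, and confidence radius are illustrative.

```python
# Naive batched elimination for best-arm identification: pull every surviving
# arm a fixed number of times per batch, then drop arms whose upper confidence
# bound falls below the best lower bound. Illustrative only; not Tri-BBAI.
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.3, 0.45, 0.5, 0.62])
delta = 0.05
survivors = list(range(4))
sums = np.zeros(4); counts = np.zeros(4)

for batch, pulls in enumerate([200, 800, 3200]):        # geometric batch sizes
    for a in survivors:
        sums[a] += rng.binomial(pulls, true_means[a]); counts[a] += pulls
    means = sums[survivors] / counts[survivors]
    rad = np.sqrt(np.log(2 * len(survivors) / delta) / (2 * counts[survivors]))
    best_lcb = np.max(means - rad)
    survivors = [a for a, m, r in zip(survivors, means, rad) if m + r >= best_lcb]
    print(f"batch {batch}: survivors = {survivors}")
    if len(survivors) == 1:
        break

print("identified best arm:", survivors[0] if len(survivors) == 1 else survivors)
```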

TMLR Journal 2024 Journal Article

Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning

  • Yihe Deng
  • Ruochi Zhang
  • Pan Xu
  • Jian Ma
  • Quanquan Gu

Hypergraphs are powerful tools for modeling complex interactions across various domains, including biomedicine. However, learning meaningful node representations from hypergraphs remains a challenge. Existing supervised methods often lack generalizability, thereby limiting their real-world applications. We propose a new method, Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning (PhyGCN), which leverages hypergraph structure for self-supervision to enhance node representations. PhyGCN introduces a unique training strategy that integrates variable hyperedge sizes with self-supervised learning, enabling improved generalization to unseen data. Applications on multi-way chromatin interactions and polypharmacy side-effects demonstrate the effectiveness of PhyGCN. As a generic framework for high-order interaction datasets with abundant unlabeled data, PhyGCN holds strong potential for enhancing hypergraph node representations across various domains.

NeurIPS Conference 2024 Conference Paper

Promoting Fairness Among Dynamic Agents in Online-Matching Markets under Known Stationary Arrival Distributions

  • Will Ma
  • Pan Xu

Online (bipartite) matching under known stationary arrivals is a fundamental model that has been studied extensively under the objective of maximizing the total number of customers served. We instead study the objective of *maximizing the minimum matching rate across all online types*, which is referred to as long-run (individual) fairness. For Online Matching under long-run Fairness (OM-LF) with a single offline agent, we show that the first-come-first-serve (FCFS) policy is $1$-competitive, i.e., it matches any optimal clairvoyant. For the general case of OM-LF: We present a sampling algorithm (SAMP) and show that (1) SAMP has a competitiveness of at least $1-1/e$ and (2) it is asymptotically optimal, with competitiveness approaching one, in different regimes: when all offline agents have a sufficiently large matching capacity, when all online types have a sufficiently large arrival rate, or when there is a high imbalance between the total offline matching capacity and the number of online arrivals. To complement the competitive results, we show the following hardness results for OM-LF: (1) Any non-rejecting policy (matching every arriving online agent if possible) is no more than $1/2$-competitive; (2) Any (randomized) policy is no more than $(\sqrt{3}-1)$-competitive; (3) SAMP can be no more than $(1-1/e)$-competitive, suggesting the tightness of the competitive analysis for SAMP. We stress that all hardness results mentioned here are independent of any benchmarks. We also consider a few extensions of OM-LF by proposing variants of fairness metrics, including long-run group-level fairness and short-run fairness, and we devise related algorithms with provable competitive performance.

NeurIPS Conference 2024 Conference Paper

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

  • Hao-Lun Hsu
  • Weixin Wang
  • Miroslav Pajic
  • Pan Xu

We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin Monte Carlo exploration (LMC) strategy respectively, which are flexible in design and easy to implement in practice. For a special class of parallel MDPs where the transition is (approximately) linear, we theoretically prove that both CoopTS-PHE and CoopTS-LMC achieve a $\widetilde{\mathcal{O}}(d^{3/2}H^2\sqrt{MK})$ regret bound with communication complexity $\widetilde{\mathcal{O}}(dHM^2)$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, and $K$ is the number of episodes. This is the first theoretical result for randomized exploration in cooperative MARL. We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., $N$-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models. Additionally, we establish a connection between our unified framework and the practical application of federated learning.

TMLR Journal 2024 Journal Article

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

  • YI Shen
  • Pan Xu
  • Michael Zavlanos

Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stroke trial.

AAMAS Conference 2023 Conference Paper

Allocation Problem in Remote Teleoperation: Online Matching with Offline Reusable Resources and Delayed Assignments

  • Osnat Ackerman Viden
  • Yohai Trabelsi
  • Pan Xu
  • Karthik Abinav Sankararaman
  • Oleg Maksimov
  • Sarit Kraus

Many applications where tasks should be assigned to agents can be modeled as matching in bipartite graphs. In this paper, we consider applications where tasks arrive dynamically and rejection of a task may have significant adverse effects on the requester; therefore, performing the task with some delay is preferred over complete rejection. The performance time of a task depends on the task, the agent, and the assignment, and only its distribution is known in advance. The actual time is known only after the task is performed, when the agent becomes available for a new assignment. We consider two arrival types for such applications. In the first type, the arrival distribution is known in advance, while in the second type no assumption is made about the arrival times and order. For the first type, we present an LP-based online algorithm with a competitive ratio of 0.5. For the second type, we show that no online algorithm achieves a constant competitive ratio. We run extensive experiments to evaluate our algorithm on a real-world dataset, demonstrating the advantages of the LP approach.

JBHI Journal 2023 Journal Article

Dynamics Combined With Hill Model for Functional Electrical Stimulation Ankle Angle Prediction

  • Xianghong Zhang
  • Ziqin Jiang
  • Xu Li
  • Pan Xu
  • Željka Lučev Vasić
  • Ivana Čuljak
  • Mario Cifrek
  • Min Du

Musculoskeletal models play an essential role in ankle rehabilitation research. The majority of the existing models have established the relationship between EMG and joint torque. However, EMG signal acquisition imposes stricter clinical requirements, as it is sensitive to external circumstances, motion artifacts, and electrode placement. To address the nonlinear and time-varying nature of joint movement, a Functional Electrical Stimulation (FES) model was proposed in this study to simulate the whole process of ankle dorsiflexion. The model combines muscle contraction dynamics based on the Hill model with ankle inverse dynamics to connect FES parameters, torques, and ankle angles. In addition, the extended Kalman filter (EKF) algorithm was applied to identify the unknown parameters of the model. A model validation experiment was performed by acquiring the actual data of healthy volunteers. Results showed that the normalized root mean square error (NRMSE) and root mean square error (RMSE) of this model were 11.93%±0.53% and 1.39°±0.26°, respectively, which means it can effectively predict the output variation of the ankle joint angle while changing electrical stimulation parameters. Therefore, the proposed model is essential for developing closed-loop feedback control of electrical stimulation and has the potential to help patients conduct rehabilitation training.
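The extended Kalman filter recursion referred to above follows a standard predict/update pattern. The sketch below runs it on a toy one-dimensional nonlinear system, not the paper's Hill-model dynamics; the transition and measurement functions, noise levels, and initial values are all made up.

```python
# Generic extended Kalman filter on a toy 1-D nonlinear system, illustrating the
# predict/update recursion used for parameter identification (not the paper's
# actual Hill-model dynamics).
import numpy as np

rng = np.random.default_rng(4)

def f(x):  return x + 0.1 * np.sin(x)      # toy nonlinear state transition
def df(x): return 1.0 + 0.1 * np.cos(x)    # its Jacobian
def h(x):  return x ** 2                    # toy nonlinear measurement
def dh(x): return 2.0 * x                   # its Jacobian

Q, R = 1e-3, 1e-2                           # process / measurement noise variances
x_true, x_est, P = 1.0, 0.5, 1.0

for _ in range(50):
    # Simulate the true system and a noisy measurement.
    x_true = f(x_true) + rng.normal(0, np.sqrt(Q))
    z = h(x_true) + rng.normal(0, np.sqrt(R))
    # EKF predict step.
    x_pred = f(x_est)
    P_pred = df(x_est) ** 2 * P + Q
    # EKF update step.
    H = dh(x_pred)
    K = P_pred * H / (H ** 2 * P_pred + R)
    x_est = x_pred + K * (z - h(x_pred))
    P = (1.0 - K * H) * P_pred

print(f"true state: {x_true:.3f}, EKF estimate: {x_est:.3f}")
```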

AAAI Conference 2023 Conference Paper

Equity Promotion in Public Transportation

  • Anik Pramanik
  • Pan Xu
  • Yifan Xu

There are many news articles reporting the obstacles confronting poverty-stricken households in accessing public transit. These barriers create a great deal of inconvenience for these impoverished families and, more importantly, contribute to social inequality. A typical approach to addressing the issue is to build more transport infrastructure to offer more opportunities to access public transit, especially for deprived communities. Examples include adding more bus lines connecting needy residents to railway systems and extending existing bus lines to areas with low socioeconomic status. Recently, a new strategy has been proposed, which is to harness ubiquitous ride-hailing services to connect disadvantaged households with the nearest public transportation. Compared with the former infrastructure-based solution, the ride-hailing-based strategy enjoys a few exclusive benefits such as higher effectiveness and more flexibility. In this paper, we propose an optimization model to study how to integrate the two approaches for equity-promotion purposes. Specifically, we aim to design a strategy for allocating a given limited budget to different candidate programs such that the overall social equity is maximized, which is defined as the minimum covering ratio among all pre-specified protected groups of households (based on race, income, etc.). We design a linear-programming (LP) based rounding algorithm, which provably achieves an optimal approximation ratio of 1-1/e. Additionally, we test our algorithm against a few baselines on real data assembled from multiple public datasets collected in the city of Chicago. Experimental results confirm our theoretical predictions and demonstrate the effectiveness of our LP-based strategy in promoting social equity, especially when the budget is insufficient.

EWRL Workshop 2023 Workshop Paper

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

  • Haque Ishfaq
  • Qingfeng Lan
  • Pan Xu
  • A. Rupam Mahmood
  • Doina Precup
  • Anima Anandkumar
  • Kamyar Azizzadenesheli

We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of $\tilde{O}(d^{3/2}H^{5/2}\sqrt{T})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $T$ is the total number of steps. We apply this approach to deep RL, by using Adam optimizer to perform gradient updates. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
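A toy version of the Langevin Monte Carlo step is shown below: noisy gradient descent on a squared-loss-plus-Gaussian-prior objective yields approximate samples from the posterior over the weights of a linear Q-function. The data, step size, and burn-in are illustrative choices and this is not the paper's full algorithm or its deep-RL variant.

```python
# Toy Langevin Monte Carlo: noisy gradient steps on a squared-loss-plus-prior
# objective produce approximate samples from the posterior over linear
# Q-function weights (illustrative; not the paper's full procedure).
import numpy as np

rng = np.random.default_rng(5)
d, n = 4, 200
Phi = rng.normal(size=(n, d))                       # state-action features
w_true = np.array([1.0, -0.5, 0.3, 2.0])
y = Phi @ w_true + 0.1 * rng.normal(size=n)         # regression targets (e.g., TD targets)

lam, eta = 1.0, 1e-4                                # prior precision, step size

def grad_U(w):
    """Gradient of the negative log posterior (least squares + Gaussian prior)."""
    return Phi.T @ (Phi @ w - y) + lam * w

w = np.zeros(d)
samples = []
for k in range(5000):
    # Langevin step: gradient descent plus Gaussian noise scaled by sqrt(2*eta).
    w = w - eta * grad_U(w) + np.sqrt(2.0 * eta) * rng.normal(size=d)
    if k > 1000:                                     # discard burn-in
        samples.append(w.copy())

samples = np.array(samples)
print("posterior mean estimate:", samples.mean(axis=0).round(2))
print("true weights:           ", w_true)
```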

NeurIPS Conference 2022 Conference Paper

Active Ranking without Strong Stochastic Transitivity

  • Hao Lou
  • Tao Jin
  • Yue Wu
  • Pan Xu
  • Quanquan Gu
  • Farzad Farnoud

Ranking from noisy comparisons is of great practical interest in machine learning. In this paper, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do *not* assume the Strong Stochastic Transitivity property. We propose a $\delta$-correct algorithm, Probe-Rank, that actively learns the ranking of the items from noisy pairwise comparisons. We prove a sample complexity upper bound for Probe-Rank, which only depends on the preference probabilities between items that are adjacent in the true ranking. This improves upon existing sample complexity results that depend on the preference probabilities for all pairs of items. Probe-Rank thus outperforms existing methods over a large collection of instances that do not satisfy Strong Stochastic Transitivity. Thorough numerical experiments in various settings are conducted, demonstrating that Probe-Rank is significantly more sample-efficient than the state-of-the-art active ranking method.

AAAI Conference 2022 Conference Paper

Equity Promotion in Online Resource Allocation

  • Pan Xu
  • Yifan Xu

We consider online resource allocation under a typical nonprofit setting, where limited or even scarce resources are administered by a not-for-profit organization like a government. We focus on internal equity by assuming that arriving requesters are homogeneous in terms of external factors like demands but heterogeneous in internal attributes like demographics. Specifically, we associate each arriving requester with one or several groups based on their demographics (i.e., race, gender, and age), and we aim to design an equitable distributing strategy such that every group of requesters can receive a fair share of resources proportional to a preset target ratio. We present two LP-based sampling algorithms and investigate them both theoretically (in terms of competitive-ratio analysis) and experimentally based on real COVID-19 vaccination data maintained by the Minnesota Department of Health. Both theoretical and numerical results show that our LP-based sampling strategies can effectively promote equity, especially when the arrival population is disproportionately represented, as observed in the early stage of the COVID-19 vaccine rollout.

NeurIPS Conference 2022 Conference Paper

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

  • Tianyuan Jin
  • Pan Xu
  • Xiaokui Xiao
  • Anima Anandkumar

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponential, etc. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm. We provide a tight regret analysis for ExpTS, which simultaneously yields both the finite-time regret bound as well as the asymptotic regret bound. In particular, for a $K$-armed bandit with exponential family rewards, ExpTS over a horizon $T$ is sub-UCB (a strong criterion for the finite-time regret that is problem-dependent), minimax optimal up to a factor $\sqrt{\log K}$, and asymptotically optimal, for exponential family rewards. Moreover, we propose ExpTS$^+$, by adding a greedy exploitation step in addition to the sampling distribution used in ExpTS, to avoid the over-estimation of sub-optimal arms. ExpTS$^+$ is an anytime bandit algorithm and achieves the minimax optimality and asymptotic optimality simultaneously for exponential family reward distributions. Our proof techniques are general and conceptually simple and can be easily applied to analyze standard Thompson sampling with specific reward distributions.

AAMAS Conference 2022 Conference Paper

Group-level Fairness Maximization in Online Bipartite Matching

  • Will Ma
  • Pan Xu
  • Yifan Xu

We consider the allocation of limited resources to heterogeneous customers who arrive in an online fashion. We would like to allocate the resources “fairly”, so that no group of customers is marginalized in terms of their overall service rate. We study whether this is possible to do so in an online fashion, and if so, what a good online allocation policy is. We model this problem using online bipartite matching under stationary arrivals, a fundamental model in the literature typically studied under the objective of maximizing the total number of customers served. We instead study the objective of maximizing the minimum service rate across all groups, and propose two notions of fairness: long-run and short-run. For these fairness objectives, we analyze how competitive online algorithms can be, in comparison to offline algorithms which know the sequence of demands in advance. For long-run fairness, we propose two online heuristics (Sampling and Pooling) which establish asymptotic optimality in different regimes (no specialized supplies, no rare demand types, or imbalanced supply/demand). By contrast, outside all of these regimes, we show that the competitive ratio of online algorithms is between 0.632 and 0.732. For short-run fairness, we show for complete bipartite graphs that the competitive ratio of online algorithms is between 0.863 and 0.942; we also derive a probabilistic rejection algorithm which is asymptotically optimal in the total demand. Depending on the overall scarcity of resources, either our Sampling or Pooling heuristics could be desirable. The most difficult situation for online allocation occurs when the total supply is just enough to serve the total demand, in which case an organization could try to make allocations offline instead. We simulate our algorithms on a public ride-hailing dataset, which both demonstrates the efficacy of our heuristics and validates our managerial insights.

AAMAS Conference 2022 Conference Paper

The Generalized Magician Problem under Unknown Distributions and Related Applications

  • Aravind Srinivasan
  • Pan Xu

The Magician Problem (MP) and its generalization, the Generalized Magician Problem (GMP), were introduced by Alaei et al. (APPROX-RANDOM 2013) and Alaei (SICOMP 2014) and have been used as powerful ingredients in online-algorithm design for many hard problems such as the k-choice prophet inequality, mechanism design in Bayesian combinatorial auctions, and the generalized assignment problem. The adversarial model here is essentially that of an oblivious adversary. In this paper, we introduce generalizations of GMP (MP) under two different arrival settings (by making the adversary stronger): unknown independent identical distributions (UIID) and unknown adversarial distributions (UAD). Different adversary models capture a range of arrival patterns. For GMP under UIID, we show that a natural greedy algorithm Greedy is optimal. For the case of MP under UIID, we show that Greedy has an optimal performance of $1 - \frac{B^B}{B!\,e^B} \ge 1 - \frac{1}{\sqrt{2\pi B}}$, where $B$ is the budget, and show an application to online B-matching with stochastic rewards. For GMP under UAD, we present a simple algorithm, which is near-optimal among all non-adaptive algorithms. We consider the simple case of MP under UAD with B = 1, and give an exact characterization of the respective optimal adaptive and optimal non-adaptive algorithms for any finite time horizon. We offer an example of MP under UAD on which there is a provable gap between the classical MP under adversarial order and MP under UAD even with a time horizon T = 4.

NeurIPS Conference 2020 Conference Paper

A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods

  • Yue Frank Wu
  • Weitong Zhang
  • Pan Xu
  • Quanquan Gu

Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the policy gradient. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods are largely open. In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\|\nabla J(\bm{\theta})\|_2^2 \le \epsilon$) of the non-concave performance function $J(\bm{\theta})$, with $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
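The two time-scale schedule simply means the critic is updated with a larger step size than the actor. The sketch below runs a tiny tabular actor-critic on a made-up two-state MDP to illustrate that schedule; the environment, step sizes, and episode length are arbitrary and the code is not the analyzed algorithm itself.

```python
# Tiny tabular two time-scale actor-critic on a 2-state, 2-action toy MDP: the
# critic (value estimates) uses a larger step size than the actor (policy
# parameters), illustrating the two time-scale schedule (toy example only).
import numpy as np

rng = np.random.default_rng(6)
n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a] = next state, R[s, a] = reward for a simple deterministic MDP.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 1.0], [0.0, 2.0]])

theta = np.zeros((n_states, n_actions))   # actor: softmax policy parameters
V = np.zeros(n_states)                    # critic: state values
alpha_critic, alpha_actor = 0.05, 0.005   # critic learns on the faster time scale

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

s = 0
for t in range(20000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next, r = P[s, a], R[s, a]
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * td_error                       # fast critic update
    grad_log = -probs; grad_log[a] += 1.0                 # d log pi(a|s) / d theta[s]
    theta[s] += alpha_actor * td_error * grad_log         # slow actor update
    s = s_next

print("learned policy:", np.array([policy(s) for s in range(n_states)]).round(2))
```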

IJCAI Conference 2020 Conference Paper

A Unified Model for the Two-stage Offline-then-Online Resource Allocation

  • Yifan Xu
  • Pan Xu
  • Jianping Pan
  • Jun Tao

With the popularity of the Internet, traditional offline resource allocation has evolved into a new form, called online resource allocation. It features the online arrivals of agents in the system and the real-time decision-making requirement upon the arrival of each online agent. Both offline and online resource allocation have wide applications in various real-world matching markets ranging from ridesharing to crowdsourcing. There are some emerging applications such as rebalancing in bike sharing and trip-vehicle dispatching in ridesharing, which involve a two-stage resource allocation process. The process consists of an offline phase and a subsequent online phase, and both phases compete for the same set of resources. In this paper, we propose a unified model which incorporates both offline and online resource allocation into a single framework. Our model assumes non-uniform and known arrival distributions for online agents in the second online phase, which can be learned from historical data. We propose a parameterized linear programming (LP)-based algorithm, which is shown to be within a constant factor of 1/4 of the optimal. Experimental results on a real dataset show that our LP-based approaches outperform the LP-agnostic heuristics in terms of robustness and effectiveness.

AAAI Conference 2020 Conference Paper

Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms during High-Demand Hours

  • Vedant Nanda
  • Pan Xu
  • Karthik Abhinav Sankararaman
  • John Dickerson
  • Aravind Srinivasan

Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g., from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, nonadaptive algorithm, NAdap, that allows the platform designer to control the profit and fairness of the system via parameters α and β respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use NAdap to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using NAdap, the competitive ratios for profit and fairness measures would be no worse than α/e and β/e respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that NAdap under some choice of (α, β) can beat two natural heuristics, Greedy and Uniform, on both fairness and profit. Code is available at: https://github.com/nvedant07/rideshare-fairness-peak/.

AAAI Conference 2020 Conference Paper

Rank Aggregation via Heterogeneous Thurstone Preference Models

  • Tao Jin
  • Pan Xu
  • Quanquan Gu
  • Farzad Farnoud

We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone’s original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. Under this framework, we also propose a rank aggregation algorithm based on alternating gradient descent to estimate the underlying item scores and accuracy levels of different users simultaneously from noisy pairwise comparisons. We theoretically prove that the proposed algorithm converges linearly up to a statistical error which matches that of the state-of-the-art method for the single-user BTL model. We evaluate the proposed HTM model and algorithm on both synthetic and real data, demonstrating that it outperforms existing methods.
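To illustrate the alternating-update idea, the sketch below fits a simplified heterogeneous BTL-style model, P(user u prefers item i over j) = sigmoid(gamma_u (s_i - s_j)), by alternating gradient steps on item scores and per-user accuracy levels. The likelihood, parameterization, and synthetic data are a stand-in rather than the exact HTM formulation.

```python
# Sketch of alternating gradient ascent for a heterogeneous BTL-style model,
# P(user u prefers i over j) = sigmoid(gamma_u * (s_i - s_j)), jointly fitting
# item scores s and per-user accuracy levels gamma (a simplified stand-in).
import numpy as np

rng = np.random.default_rng(7)
n_items, n_users, n_obs = 5, 3, 4000
s_true = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
gamma_true = np.array([0.5, 1.0, 3.0])              # user accuracy levels

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
users = rng.integers(n_users, size=n_obs)
pairs = np.array([rng.choice(n_items, 2, replace=False) for _ in range(n_obs)])
p_win = sigmoid(gamma_true[users] * (s_true[pairs[:, 0]] - s_true[pairs[:, 1]]))
wins = (rng.random(n_obs) < p_win).astype(float)

s = np.zeros(n_items)
gamma = np.ones(n_users)
lr = 0.5
for _ in range(2000):
    diff = s[pairs[:, 0]] - s[pairs[:, 1]]
    resid = wins - sigmoid(gamma[users] * diff)      # derivative of the log-likelihood
    # Step on item scores with accuracies held fixed.
    grad_s = np.zeros(n_items)
    np.add.at(grad_s, pairs[:, 0], resid * gamma[users])
    np.add.at(grad_s, pairs[:, 1], -resid * gamma[users])
    s += lr * grad_s / n_obs
    s -= s.mean()                                    # remove translation invariance
    # Step on user accuracies with item scores held fixed.
    grad_g = np.zeros(n_users)
    np.add.at(grad_g, users, resid * diff)
    gamma += lr * grad_g / n_obs

print("item ranking (est):", np.argsort(-s), "(true):", np.argsort(-s_true))
print("user accuracy order (est):", np.argsort(-gamma), "(true):", np.argsort(-gamma_true))
```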

JMLR Journal 2020 Journal Article

Stochastic Nested Variance Reduction for Nonconvex Optimization

  • Dongruo Zhou
  • Pan Xu
  • Quanquan Gu

We study nonconvex optimization problems, where the objective function is either an average of $n$ nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent ($\text{SNVRG}$). Compared with conventional stochastic variance reduced gradient ($\text{SVRG}$) algorithm that uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, $\text{SNVRG}$ converges to an $\epsilon$-approximate first-order stationary point within $\tilde O(n\land\epsilon^{-2}+\epsilon^{-3}\land n^{1/2}\epsilon^{-2})$ number of stochastic gradient evaluations. This improves the best known gradient complexity of $\text{SVRG}$ $O(n+n^{2/3}\epsilon^{-2})$ and that of $\text{SCSG}$ $O(n\land \epsilon^{-2}+\epsilon^{-10/3}\land n^{2/3}\epsilon^{-2})$. For gradient dominated functions, $\text{SNVRG}$ also achieves better gradient complexity than the state-of-the-art algorithms. Based on $\text{SNVRG}$, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed $\text{SNVRG}+\text{Neon2}^{\text{finite}}$ algorithm achieves $\tilde{O}(n^{1/2}\epsilon^{-2}+n\epsilon_H^{-3}+n^{3/4}\epsilon_H^{-7/2})$ gradient complexity to converge to an $(\epsilon, \epsilon_H)$-second-order stationary point, which outperforms $\text{SVRG}+\text{Neon2}^{\text{finite}}$ (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed $\text{SNVRG}+\text{Neon2}^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3}+\epsilon_H^{-5}+\epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both $\text{SVRG}+\text{Neon2}^{\text{online}}$ (Allen-Zhu and Li, 2018) and $\text{Natasha2}$ (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.
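For readers unfamiliar with reference-point variance reduction, the sketch below shows the simpler two-reference-point SVRG-style update that the abstract contrasts SNVRG with, on a toy nonconvex least-squares-type problem. It is only meant to make the "semi-stochastic gradient" idea concrete; SNVRG's nested multi-reference construction is more involved, and the problem instance and step sizes here are arbitrary.

```python
# Compact SVRG-style semi-stochastic gradient descent on a toy nonconvex problem.
# This is the two-reference-point scheme the abstract contrasts SNVRG with; the
# nested multi-reference construction of SNVRG itself is more involved.
import numpy as np

rng = np.random.default_rng(8)
n, d = 500, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(w, i):
    """Stochastic gradient of f_i(w) = 1 - exp(-(a_i.w - b_i)^2 / 2)."""
    r = A[i] @ w - b[i]
    return np.exp(-r * r / 2.0) * r * A[i]

def full_grad(w):
    r = A @ w - b
    return (np.exp(-r * r / 2.0) * r) @ A / n

w = rng.normal(size=d)
eta, epochs, m = 0.05, 20, n
for _ in range(epochs):
    w_ref = w.copy()
    g_ref = full_grad(w_ref)                 # reference (full) gradient
    for _ in range(m):
        i = rng.integers(n)
        # Semi-stochastic gradient: its variance shrinks as w approaches w_ref.
        v = grad_i(w, i) - grad_i(w_ref, i) + g_ref
        w -= eta * v

print("final gradient norm:", round(float(np.linalg.norm(full_grad(w))), 4))
```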

IJCAI Conference 2020 Conference Paper

Trade the System Efficiency for the Income Equality of Drivers in Rideshare

  • Yifan Xu
  • Pan Xu

Several scientific studies have reported the existence of an income gap among rideshare drivers based on demographic factors such as gender, age, and race. In this paper, we study the income inequality among rideshare drivers due to discriminative cancellations from riders, and the tradeoff between the income inequality (called the fairness objective) and the system efficiency (called the profit objective). We propose an online bipartite-matching model where riders are assumed to arrive sequentially following a distribution known in advance. The highlight of our model is the concept of an acceptance rate between any pair of driver-rider types, where types are defined based on demographic factors. Specifically, we assume each rider can accept or cancel the driver assigned to her, each occurring with a certain probability that reflects the acceptance degree from the rider type towards the driver type. We construct a bi-objective linear program as a valid benchmark and propose two LP-based parameterized online algorithms. Rigorous online competitive ratio analysis is offered to demonstrate the flexibility and efficiency of our online algorithms in balancing the two conflicting goals, promotion of fairness and profit. Experimental results on a real-world dataset are provided as well, which confirm our theoretical predictions.

AAAI Conference 2019 Conference Paper

A Unified Approach to Online Matching with Conflict-Aware Constraints

  • Pan Xu
  • Yexuan Shi
  • Hao Cheng
  • John Dickerson
  • Karthik Abinav Sankararaman
  • Aravind Srinivasan
  • Yongxin Tong
  • Leonidas Tsepenekas

Online bipartite matching and allocation models are widely used to analyze and design markets such as Internet advertising, online labor, and crowdsourcing. Traditionally, vertices on one side of the market are fixed and known a priori, while vertices on the other side arrive online and are matched by a central agent to the offline side. The issue of possible conflicts among offline agents emerges in various real scenarios when we need to match each online agent with a set of offline agents. For example, in event-based social networks (e.g., Meetup), offline events conflict for some users since they will be unable to attend mutually-distant events at proximate times; in advertising markets, two competing firms may prefer not to be shown to one user simultaneously; and in online recommendation systems (e.g., Amazon Books), books of the same type “conflict” with each other in some sense due to the diversity requirement for each online buyer. The conflict nature inherent among certain offline agents raises significant challenges in both modeling and online algorithm design. In this paper, we propose a unifying model, generalizing the conflict models proposed in (She et al., TKDE 2016) and (Chen et al., TKDE 2016). Our model can capture not only a broad class of conflict constraints on the offline side (which is even allowed to be sensitive to each online agent), but also allows a general arrival pattern for the online side (which is allowed to change over the online phase). We propose an efficient linear programming (LP) based online algorithm and prove theoretically that it has nearly-optimal online performance. Additionally, we propose two LP-based heuristics and test them against two natural baselines on both real and synthetic datasets. Our LP-based heuristics experimentally dominate the baseline algorithms, aligning with our theoretical predictions and supporting our unified approach.

AAAI Conference 2019 Conference Paper

Balancing Relevance and Diversity in Online Bipartite Matching via Submodularity

  • John P. Dickerson
  • Karthik Abinav Sankararaman
  • Aravind Srinivasan
  • Pan Xu

In bipartite matching problems, vertices on one side of a bipartite graph are paired with those on the other. In its online variant, one side of the graph is available offline, while the vertices on the other side arrive online. When a vertex arrives, an irrevocable and immediate decision should be made by the algorithm; either match it to an available vertex or drop it. Examples of such problems include matching workers to firms, advertisers to keywords, organs to patients, and so on. Much of the literature focuses on maximizing the total relevance—modeled via total weight—of the matching. However, in many real-world problems, it is also important to consider contributions of diversity: hiring a diverse pool of candidates, displaying a relevant but diverse set of ads, and so on. In this paper, we propose the Online Submodular Bipartite Matching (OSBM) problem, where the goal is to maximize a submodular function f over the set of matched edges. This objective is general enough to capture the notion of both diversity (e.g., a weighted coverage function) and relevance (e.g., the traditional linear function)—as well as many other natural objective functions occurring in practice (e.g., limited total budget in advertising settings). We propose novel algorithms that have provable guarantees and are essentially optimal when restricted to various special cases. We also run experiments on real-world and synthetic datasets to validate our algorithms.
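A bare-bones way to see the submodular-objective setting is a greedy rule: when an online vertex arrives, match it to the free offline vertex whose edge gives the largest marginal gain of a weighted-coverage function. The sketch below does this on a made-up instance; it illustrates the objective, not the paper's algorithms or guarantees.

```python
# Greedy sketch for online bipartite matching with a submodular (weighted
# coverage) objective: each arriving vertex is matched to the free offline
# vertex with the largest marginal gain (illustrative; not the paper's algorithm).
import numpy as np

# Each edge (offline u, online v) covers a set of "topics" with weights.
topic_weight = {"sports": 3.0, "news": 2.0, "music": 1.0}
edge_topics = {
    (0, "v1"): {"sports"}, (1, "v1"): {"sports", "news"},
    (0, "v2"): {"music"},  (2, "v2"): {"news"},
    (1, "v3"): {"music"},  (2, "v3"): {"sports", "music"},
}

def coverage(matched_edges):
    """Weighted coverage of the topics touched by the matched edges."""
    covered = set().union(*[edge_topics[e] for e in matched_edges]) if matched_edges else set()
    return sum(topic_weight[t] for t in covered)

free_offline = {0, 1, 2}
matched = []
for v in ["v1", "v2", "v3"]:                       # online arrival order
    candidates = [(u, v) for u in free_offline if (u, v) in edge_topics]
    if not candidates:
        continue
    gains = [coverage(matched + [e]) - coverage(matched) for e in candidates]
    best = candidates[int(np.argmax(gains))]
    if max(gains) > 0:                             # only match if it helps the objective
        matched.append(best)
        free_offline.discard(best[0])

print("matching:", matched, " objective value:", coverage(matched))
```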

AAMAS Conference 2019 Conference Paper

Online Resource Allocation with Matching Constraints

  • John P. Dickerson
  • Karthik Abinav Sankararaman
  • Kanthi Kiran Sarpatwar
  • Aravind Srinivasan
  • Kun-Lung Wu
  • Pan Xu

Matching markets with historical data are abundant in many applications, e.g., matching candidates to jobs in hiring, workers to tasks in crowdsourcing markets, and jobs to servers in cloud services. In all these applications, a match consumes one or more shared and limited resources and the goal is to best utilize these to maximize a global objective. Additionally, one often has historical data and hence some statistics (usually first-order moments) of the arriving agents (e.g., candidates, workers, and jobs) can be learnt. To model these scenarios, we propose a unifying framework, called Multi-Budgeted Online Assignment with Known Adversarial Distributions. In this model, we have a set of offline servers with different deadlines and a set of online job types. At each time, a job of type $j$ arrives. Assigning this job to a server $i$ yields a profit $w_{i,j}$ while consuming $a_{i,j} \in [0, 1]^K$ quantities of distinct resources. The goal is to design an (online) assignment policy that maximizes the total expected profit without violating the (hard) budget constraint. We propose and theoretically analyze two linear programming (LP) based algorithms which are almost optimal among all LP-based approaches. We also propose several heuristics adapted from our algorithms and compare them to other LP-agnostic algorithms using both synthetic as well as real-time cloud scheduling and public safety datasets. Experimental results show that our proposed algorithms are effective and significantly outperform the baselines. Moreover, we empirically show the trade-off between fairness and efficiency of our algorithms, which perform well even on fairness metrics without explicitly optimizing for them.

AAAI Conference 2019 Conference Paper

Preference-Aware Task Assignment in On-Demand Taxi Dispatching: An Online Stable Matching Approach

  • Boming Zhao
  • Pan Xu
  • Yexuan Shi
  • Yongxin Tong
  • Zimu Zhou
  • Yuxiang Zeng

A central issue in on-demand taxi dispatching platforms is task assignment, which designs matching policies among dynamically arriving drivers (workers) and passengers (tasks). Previous matching policies maximize the profit of the platform without considering the preferences of workers and tasks (e.g., workers may prefer high-rewarding tasks while tasks may prefer nearby workers). Ignoring these preferences impairs user experience and will decrease the profit of the platform in the long run. To address this problem, we propose preference-aware task assignment using online stable matching. Specifically, we define a new model, Online Stable Matching under Known Identical Independent Distributions (OSM-KIID). It not only maximizes the expected total profits (OBJ-1), but also tries to satisfy the preferences among workers and tasks by minimizing the expected total number of blocking pairs (OBJ-2). The model also features a practical arrival assumption validated on a real-world dataset. Furthermore, we present a linear-program-based online algorithm, LP-ALG, which achieves an online ratio of at least 1−1/e on OBJ-1 and has at most 0.6·|E| blocking pairs in expectation, where |E| is the total number of edges in the compatibility graph. We also show that a natural Greedy can have an arbitrarily bad performance on OBJ-1 while maintaining around 0.5·|E| blocking pairs. Evaluations on both synthetic and real datasets confirm our theoretical analysis and demonstrate that LP-ALG strictly dominates all the baselines on both objectives when tasks notably outnumber workers.
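
For readers unfamiliar with OBJ-2, a blocking pair is a compatible worker-task pair, not matched to each other, in which both sides would rather be with each other than with their current partners. The helper below (a toy illustration with hypothetical preference data, not code from the paper) counts blocking pairs for a given matching.

```python
# Illustrative only: counting "blocking pairs" for a worker-task matching.
# A pair (w, t) not matched to each other is blocking if w prefers t to w's
# current task (or is unmatched) AND t prefers w to t's current worker (or is
# unmatched). Preference ranks and the data below are hypothetical.

def count_blocking_pairs(edges, match_w, match_t, rank_w, rank_t):
    """edges: compatible (worker, task) pairs; match_w / match_t: current partners
    (None if unmatched); rank_w[w][t], rank_t[t][w]: lower rank = more preferred."""
    blocking = 0
    for w, t in edges:
        if match_w.get(w) == t:
            continue
        w_prefers = match_w.get(w) is None or rank_w[w][t] < rank_w[w][match_w[w]]
        t_prefers = match_t.get(t) is None or rank_t[t][w] < rank_t[t][match_t[t]]
        if w_prefers and t_prefers:
            blocking += 1
    return blocking

# Toy instance: worker A is matched to task 2, but both A and task 1 would
# rather be with each other, so (A, 1) is a blocking pair.
edges = [("A", 1), ("A", 2), ("B", 1)]
match_w, match_t = {"A": 2, "B": 1}, {1: "B", 2: "A"}
rank_w = {"A": {1: 0, 2: 1}, "B": {1: 0}}
rank_t = {1: {"A": 0, "B": 1}, 2: {"A": 0}}
print(count_blocking_pairs(edges, match_w, match_t, rank_w, rank_t))  # -> 1
```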

NeurIPS Conference 2019 Conference Paper

Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction

  • Difan Zou
  • Pan Xu
  • Quanquan Gu

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stochastic gradient estimator that recursively accumulates the gradient information to reduce the variance of the stochastic gradient. We provide a convergence analysis of SRVR-HMC for sampling from a class of non-log-concave distributions and show that SRVR-HMC converges faster than all existing HMC-type algorithms based on underdamped Langevin dynamics. Thorough experiments on synthetic and real-world datasets validate our theory and demonstrate the superiority of SRVR-HMC.
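
As a rough illustration of the recursive variance-reduction idea, the sketch below plugs a SARAH-style recursive gradient estimator into a simple Euler discretization of underdamped Langevin dynamics. It is not the paper's exact integrator; the target function, step sizes, and constants are made-up assumptions.

```python
# A simplified sketch (hedged): a SARAH-style recursive gradient estimator
# inside an Euler discretization of underdamped Langevin dynamics. Illustrative
# of the variance-reduction idea only; not the paper's exact algorithm.
# Toy target: f(x) = (1/n) sum_i 0.5 * ||x - a_i||^2.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
A = rng.normal(size=(n, d))              # component "centers" a_i (toy data)

def grad_i(idx, x):                      # averaged gradient of selected components
    return (x - A[idx]).mean(axis=0)

eta, gamma, u = 0.05, 1.0, 1.0           # step size, friction, inverse mass (made up)
epoch_len, batch = 20, 10
x, vel = np.zeros(d), np.zeros(d)

for t in range(500):
    if t % epoch_len == 0:
        g = grad_i(np.arange(n), x)      # full gradient at checkpoints
    else:
        idx = rng.integers(0, n, size=batch)
        # recursive (SARAH-style) update: correct the previous estimate g
        g = g + grad_i(idx, x) - grad_i(idx, x_prev)
    x_prev = x.copy()
    noise = rng.normal(size=d)
    vel = vel - eta * (gamma * vel + u * g) + np.sqrt(2 * gamma * u * eta) * noise
    x = x + eta * vel

print("final sample:", x)
```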

JMLR Journal 2019 Journal Article

Stochastic Variance-Reduced Cubic Regularization Methods

  • Dongruo Zhou
  • Pan Xu
  • Quanquan Gu

We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of SVRC is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method. For a nonconvex function with $n$ component functions, we show that our algorithm is guaranteed to converge to an $(\epsilon,\sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$\footnote{Here $\tilde{O}$ hides poly-logarithmic factors.} second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms including subsampled cubic regularization. To further reduce the sample complexity of Hessian matrix computation in cubic regularization based methods, we also propose a sample efficient stochastic variance-reduced cubic regularization (Lite-SVRC) algorithm for finding the local minimum more efficiently. Lite-SVRC converges to an $(\epsilon,\sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n+n^{2/3}/\epsilon^{3/2})$ Hessian sample complexity, which is faster than all existing cubic regularization based methods. Numerical experiments with different nonconvex optimization problems conducted on real datasets validate our theoretical results for both SVRC and Lite-SVRC.
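
The sketch below illustrates the general shape of such a method under strong simplifications: SVRG-style semi-stochastic gradient and Hessian estimators feeding a cubic-regularized step, with the cubic subproblem solved crudely by a few gradient steps. The toy loss, constants, and subproblem solver are assumptions, not the paper's algorithm.

```python
# A compressed sketch (assumptions throughout, not the paper's algorithm):
# SVRG-style semi-stochastic gradient/Hessian estimators feeding a
# cubic-regularized step on a toy logistic-style finite-sum loss.
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5
X, y = rng.normal(size=(n, d)), rng.choice([-1.0, 1.0], size=n)

def grad(idx, w):                        # averaged logistic-loss gradient on idx
    z = y[idx] * (X[idx] @ w)
    return (-(y[idx] / (1 + np.exp(z)))[:, None] * X[idx]).mean(axis=0)

def hess(idx, w):                        # averaged logistic-loss Hessian on idx
    z = y[idx] * (X[idx] @ w)
    s = np.exp(z) / (1 + np.exp(z)) ** 2
    return (X[idx].T * s) @ X[idx] / len(idx)

M, batch, epoch_len = 10.0, 50, 5        # cubic penalty and batch sizes (made up)
w = np.zeros(d)
for t in range(25):
    if t % epoch_len == 0:
        w_ref = w.copy()
        g_full, H_full = grad(np.arange(n), w_ref), hess(np.arange(n), w_ref)
    idx = rng.integers(0, n, size=batch)
    # semi-stochastic estimators: minibatch difference plus full reference term
    g = grad(idx, w) - grad(idx, w_ref) + g_full
    H = hess(idx, w) - hess(idx, w_ref) + H_full
    # cubic-regularized model m(h) = g.h + 0.5 h'Hh + (M/6)||h||^3,
    # minimized approximately by a few gradient steps on h
    h = np.zeros(d)
    for _ in range(50):
        h -= 0.05 * (g + H @ h + 0.5 * M * np.linalg.norm(h) * h)
    w = w + h
print("final iterate:", w)
```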

AAAI Conference 2018 Conference Paper

Allocation Problems in Ride-Sharing Platforms: Online Matching With Offline Reusable Resources

  • John Dickerson
  • Karthik Sankararaman
  • Aravind Srinivasan
  • Pan Xu

Bipartite matching markets pair agents on one side of a market with agents, items, or contracts on the opposing side. Prior work addresses online bipartite matching markets, where agents arrive over time and are dynamically matched to a known set of disposable resources. In this paper, we propose a new model, Online Matching with (offline) Reusable Resources under Known Adversarial Distributions (OM-RR-KAD), in which resources on the offline side are reusable instead of disposable; that is, once matched, resources become available again at some point in the future. We show that our model is tractable by presenting an LP-based adaptive algorithm that achieves an online competitive ratio of $\frac{1}{2} - \epsilon$ for any given $\epsilon > 0$. We also show that no non-adaptive algorithm can achieve a ratio of $\frac{1}{2} + o(1)$ based on the same benchmark LP. Through a data-driven analysis on a massive openly-available dataset, we show our model is robust enough to capture the application of taxi dispatching services and ridesharing systems. We also present heuristics that perform well in practice.
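
The defining feature of the model, reusability, is easy to picture with a small simulation: a matched resource is occupied for some time and then rejoins the pool. The toy below uses a placeholder uniform policy rather than the paper's LP-based adaptive algorithm; all data are hypothetical.

```python
# A toy simulation of reusable offline resources (illustrative only, not the
# paper's LP-based adaptive policy): once a driver is matched, it becomes
# unavailable for a random occupation time and then returns to the pool.
import random
random.seed(3)

drivers = ["d1", "d2", "d3"]
busy_until = {d: 0 for d in drivers}      # time step at which each driver frees up
reward = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
total = 0.0

for t in range(50):                       # one request arrives per time step
    available = [d for d in drivers if busy_until[d] <= t]
    if not available:
        continue                          # request dropped
    d = random.choice(available)          # placeholder policy: uniform over available
    total += reward[d]
    busy_until[d] = t + random.randint(1, 5)   # driver becomes reusable later

print(f"total reward: {total:.1f}")
```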

AAMAS Conference 2018 Conference Paper

Assigning Tasks to Workers based on Historical Data: Online Task Assignment with Two-sided Arrivals

  • John P. Dickerson
  • Karthik Abinav Sankararaman
  • Aravind Srinivasan
  • Pan Xu

Efficient allocation of tasks to workers is a central problem in crowdsourcing. In this paper, we consider a special setting inspired by spatial crowdsourcing platforms where both workers and tasks arrive dynamically. Additionally, we assume all tasks are heterogeneous and each worker-task assignment brings a distinct reward. The natural challenge lies in how to incorporate the uncertainty in the arrivals from both workers and tasks into our online allocation policy such that the total expected rewards are maximized. To attack this challenge, we assume the arrival patterns of worker “types” and task “types” are not erratic and can be predicted from historical data. To be more specific, we consider a finite time horizon T and assume in each time-step, a single worker and task are sampled (i.e., “arrive”) from two respective distributions independently, and this sampling process repeats identically and independently for the entire T online time-steps. Our model, called Online Task Assignment with Two-Sided Arrival (OTA-TSA), is a significant generalization of the classical online task assignment where the set of tasks is assumed to be available offline. For the general version of OTA-TSA, we present an optimal non-adaptive algorithm which achieves an online competitive ratio of 0.295. For the special case of OTA-TSA where the reward is a function of just the worker type, we present an improved (adaptive) algorithm that achieves a competitive ratio of at least 0.343. On the hardness side, along with showing that the ratio obtained by our non-adaptive algorithm is the best possible among all non-adaptive algorithms, we further show that no (adaptive) algorithm can achieve a ratio better than 0.581 (unconditionally), even for the special case of OTA-TSA with homogeneous tasks (i.e., all rewards are the same). At the heart of our analysis lies a new technical tool (which is a refined notion of the birth-death process), called the two-stage birth-death process, which may be of independent interest. Finally, we perform numerical experiments on two real-world datasets obtained from crowdsourcing platforms to complement our theoretical results.

NeurIPS Conference 2018 Conference Paper

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

  • Pan Xu
  • Jinghui Chen
  • Difan Zou
  • Quanquan Gu

We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the \textit{almost minimizer}\footnote{Following \citet{raginsky2017non}, an almost minimizer is defined to be a point which is within the ball of the global minimizer with radius $O(d\log(\beta+1)/\beta)$, where $d$ is the problem dimension and $\beta$ is the inverse temperature parameter. } within $\tilde O\big(nd/(\lambda\epsilon) \big)$\footnote{$\tilde O(\cdot)$ notation hides polynomials of logarithmic terms and constants. } and $\tilde O\big(d^7/(\lambda^5\epsilon^5) \big)$ stochastic gradient evaluations respectively, where $d$ is the problem dimension, and $\lambda$ is the spectral gap of the Markov chain generated by GLD. Both results improve upon the best known gradient complexity\footnote{Gradient complexity is defined as the total number of stochastic gradient evaluations of an algorithm, which is the number of stochastic gradients calculated per iteration times the total number of iterations. } results \citep{raginsky2017non}. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (VR-SGLD) to the almost minimizer within $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees.
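
For concreteness, the two basic updates the analysis covers take the following standard forms on a toy finite-sum objective (step size, temperature, and data below are made-up assumptions): GLD uses the full gradient, SGLD a minibatch gradient, and both add isotropic Gaussian noise scaled by the inverse temperature.

```python
# Minimal sketch of the GLD and SGLD updates on a toy finite-sum objective
# (hedged: made-up step size, temperature, and data).
import numpy as np

rng = np.random.default_rng(4)
n, d = 1000, 3
A = rng.normal(size=(n, d))
f_grad = lambda x, idx: (x - A[idx]).mean(axis=0)     # gradient of a toy finite sum

eta, beta = 1e-2, 50.0                                # step size, inverse temperature
x_gld, x_sgld = np.zeros(d), np.zeros(d)
for _ in range(2000):
    noise_scale = np.sqrt(2 * eta / beta)
    # GLD: x <- x - eta * grad f(x) + sqrt(2*eta/beta) * N(0, I)
    x_gld = x_gld - eta * f_grad(x_gld, np.arange(n)) + noise_scale * rng.normal(size=d)
    # SGLD: the same update with a minibatch stochastic gradient
    idx = rng.integers(0, n, size=32)
    x_sgld = x_sgld - eta * f_grad(x_sgld, idx) + noise_scale * rng.normal(size=d)

print("GLD iterate:", x_gld, "\nSGLD iterate:", x_sgld)
```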

NeurIPS Conference 2018 Conference Paper

Stochastic Nested Variance Reduction for Nonconvex Optimization

  • Dongruo Zhou
  • Pan Xu
  • Quanquan Gu

We study finite-sum nonconvex optimization problems, where the objective function is an average of $n$ nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic variance reduced gradient (SVRG) algorithm that uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses $K+1$ nested reference points to build a semi-stochastic gradient to further reduce its variance in each iteration. For smooth nonconvex functions, the proposed algorithm converges to an $\epsilon$-approximate first-order stationary point (i.e., $\|\nabla F(\mathbf{x})\|_2\leq \epsilon$) within $\tilde O(n\land \epsilon^{-2}+\epsilon^{-3}\land n^{1/2}\epsilon^{-2})$\footnote{$\tilde O(\cdot)$ hides the logarithmic factors, and $a\land b$ means $\min(a, b)$.} stochastic gradient evaluations. This improves the best known gradient complexity of SVRG $O(n+n^{2/3}\epsilon^{-2})$ and that of SCSG $O(n\land \epsilon^{-2}+\epsilon^{-10/3}\land n^{2/3}\epsilon^{-2})$. For gradient dominated functions, our algorithm also achieves better gradient complexity than the state-of-the-art algorithms. Thorough experimental results on different nonconvex optimization problems back up our theory.

NeurIPS Conference 2018 Conference Paper

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

  • Yaodong Yu
  • Pan Xu
  • Quanquan Gu

We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specifically, the proposed algorithm only needs $\tilde{O}(\epsilon^{-10/3})$ stochastic gradient evaluations to converge to an approximate local minimum $\mathbf{x}$, which satisfies $\|\nabla f(\mathbf{x})\|_2\leq\epsilon$ and $\lambda_{\min}(\nabla^2 f(\mathbf{x}))\geq -\sqrt{\epsilon}$ in unconstrained stochastic optimization, where $\tilde{O}(\cdot)$ hides poly-logarithmic terms and constants. This improves upon the $\tilde{O}(\epsilon^{-7/2})$ gradient complexity achieved by the state-of-the-art stochastic local minima finding algorithms by a factor of $\tilde{O}(\epsilon^{-1/6})$. Experiments on two nonconvex optimization problems demonstrate the effectiveness of our algorithm and corroborate our theory.

AAMAS Conference 2017 Conference Paper

Attenuate Locally, Win Globally: An Attenuation-based Framework for Online Stochastic Matching with Timeouts

  • Brian Brubach
  • Karthik Abinav Sankararaman
  • Aravind Srinivasan
  • Pan Xu

Online matching problems have garnered significant attention in recent years due to numerous applications. Many of them capture the uncertainty in the real world by including stochasticity in both the arrival process and the matching process. The Online Stochastic Matching with Timeouts problem introduced by Bansal, Gupta, Li, Mestre, Nagarajan, and Rudra (Algorithmica, 2012) models matching markets (e.g., eBay, Amazon). Buyers arrive from an independent and identically distributed (i.i.d.) known distribution on buyer profiles and can be shown a list of items one at a time. Each buyer has some probability of purchasing each item and a limit (timeout) on the number of items they can be shown. Bansal et al. (Algorithmica, 2012) gave a 0.12-competitive algorithm, which was improved by Adamczyk, Grandoni, and Mukherjee (ESA, 2015) to 0.24. We present an online attenuation framework that uses an algorithm for offline stochastic matching as a black box. Our main contributions are as follows. On the upper bound side, we show that this framework combined with a black box adapted from Bansal et al. (Algorithmica, 2012) yields an online algorithm which nearly doubles the ratio to 0.46. On the lower bound side, we show that no algorithm can achieve a ratio better than 0.632 using the common LP for this problem. This framework has a high potential for further improvements since new algorithms for offline stochastic matching can lead directly to improvements for the online problem. Our online framework also has the potential for a variety of extensions. For example, we introduce a natural generalization: Online Stochastic Matching with Two-sided Timeouts, in which both online and offline vertices have timeouts. Our framework provides the first algorithm for this problem, achieving a ratio of 0.31. We accomplish this by proposing a new black box algorithm for offline stochastic matching on star graphs, which may be of independent interest. This new black box improves the approximation ratio for the offline stochastic matching problem on star graphs from 0.5 (Adamczyk et al., ESA 2015) to 0.56.
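
The attenuation idea itself is simple to illustrate: estimate by simulation how often a naive policy matches each offline item, then accept a proposed match to an item only with probability target/estimate, so that no item is matched more often than intended. The toy below (hypothetical data, a single proposal per item, not the paper's algorithm) shows one such attenuation pass; the actual framework applies attenuation step by step over the horizon.

```python
# Toy illustration of the attenuation idea (an assumption-heavy sketch, not the
# paper's algorithm): estimate by Monte Carlo how often a naive policy matches
# each offline item, then accept a proposed match to item u only with
# probability target/estimate so every item is matched at the target rate.
import random
random.seed(5)

accept_prob = {"u1": 0.9, "u2": 0.7}       # purchase probability when shown (toy)
target = 0.6                               # desired per-item match probability

def run_once(attenuation):
    matched = []
    for u, p in accept_prob.items():       # one buyer per item in this toy
        if random.random() < attenuation.get(u, 1.0) and random.random() < p:
            matched.append(u)
    return matched

def match_rates(attenuation, trials=50000):
    counts = {u: 0 for u in accept_prob}
    for _ in range(trials):
        for u in run_once(attenuation):
            counts[u] += 1
    return {u: round(counts[u] / trials, 3) for u in counts}

naive = match_rates({})                                   # ~{'u1': 0.9, 'u2': 0.7}
attenuation = {u: min(1.0, target / r) for u, r in naive.items()}
print(match_rates(attenuation))                           # ~{'u1': 0.6, 'u2': 0.6}
```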

NeurIPS Conference 2017 Conference Paper

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimization

  • Pan Xu
  • Jian Ma
  • Quanquan Gu

We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propose a sparsity constrained maximum likelihood estimator based on matrix factorization and an efficient alternating gradient descent algorithm with hard thresholding to solve it. Our algorithm is orders of magnitude faster than the convex relaxation based methods for LVGGM. In addition, we prove that our algorithm is guaranteed to linearly converge to the unknown sparse and low-rank components up to the optimal statistical precision. Experiments on both synthetic and genomic data demonstrate the superiority of our algorithm over the state-of-the-art algorithms and corroborate our theory.
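
A schematic version of the alternating update can be written in a few lines: gradient steps on the sparse and low-rank parts of a least-squares surrogate, each followed by a hard projection (entrywise thresholding for the sparse part, SVD truncation for the low-rank part). This differs from the paper's likelihood-based, explicitly factorized method and uses made-up sizes and iteration counts; it is a sketch of the sparse-plus-low-rank idea only.

```python
# A schematic sketch (hedged): alternating projected-gradient steps with hard
# thresholding for a sparse-plus-low-rank fit M ~ S + L. Uses a least-squares
# surrogate and an SVD rank projection instead of the paper's Gaussian
# likelihood and explicit factorization; all sizes below are assumptions.
import numpy as np

rng = np.random.default_rng(6)
d, r, k = 20, 2, 30                        # dimension, rank, sparsity level

# Synthetic target M = sparse + low-rank (toy stand-in for an empirical estimate).
S_true = np.zeros((d, d))
S_true.flat[rng.choice(d * d, size=k, replace=False)] = rng.normal(size=k)
S_true = (S_true + S_true.T) / 2
Z_true = rng.normal(size=(d, r))
M = S_true + Z_true @ Z_true.T

def hard_threshold(A, k):
    """Keep only the k largest-magnitude entries of A."""
    out = np.zeros_like(A)
    keep = np.argsort(np.abs(A), axis=None)[-k:]
    out.flat[keep] = A.flat[keep]
    return out

def rank_project(A, r):
    """Best rank-r approximation of A via truncated SVD."""
    U, sig, Vt = np.linalg.svd(A)
    return (U[:, :r] * sig[:r]) @ Vt[:r]

S, L = np.zeros((d, d)), np.zeros((d, d))
for _ in range(50):
    R = S + L - M                          # gradient of 0.5 * ||S + L - M||_F^2
    S = hard_threshold(S - R, 2 * k)       # unit gradient step, then sparsify
    L = rank_project(L - (S + L - M), r)   # unit gradient step, then rank-r project

print("residual:", np.linalg.norm(S + L - M))
```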

NeurIPS Conference 2016 Conference Paper

Semiparametric Differential Graph Models

  • Pan Xu
  • Quanquan Gu

In many cases of network analysis, it is more informative to study how a network varies under different conditions than to examine a single static network. We propose a novel graphical model, namely the Latent Differential Graph Model, where the networks under two different conditions are represented by two semiparametric elliptical distributions, and the variation between these two networks (i.e., the differential graph) is characterized by the difference between their latent precision matrices. We propose an estimator for the differential graph based on quasi-likelihood maximization with nonconvex regularization. We show that our estimator attains a faster statistical rate in parameter estimation than the state-of-the-art methods, and enjoys the oracle property under mild conditions. Thorough experiments on both synthetic and real-world data support our theory.