Arrow Research search

Author name cluster

Ming Yin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

29

AAAI Conference 2026 Conference Paper

Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration

  • Hasan Amin
  • Ming Yin
  • Rajiv Khanna

In human-AI decision making, designing AI that complements human expertise has been a natural strategy to enhance human-AI collaboration, yet it often comes at the cost of decreased AI performance in areas of human strengths. This can inadvertently erode human trust and cause them to ignore AI advice precisely when it is most needed. Conversely, an aligned AI fosters trust yet risks reinforcing suboptimal human behavior and lowering human-AI team performance. In this paper, we start by identifying this fundamental tension between performance-boosting (i.e., complementarity) and trust-building (i.e., alignment) as an inherent limitation of the traditional approach for training a single AI model to assist human decision making. To overcome this, we introduce a novel, human-centered adaptive AI ensemble that strategically toggles between two specialist AI models—the aligned model and the complementary model—based on contextual cues, using an elegantly simple yet provably near-optimal Rational Routing Shortcut mechanism. Comprehensive theoretical analyses elucidate why the adaptive AI ensemble is effective and when it yields maximum benefits. Moreover, experiments on both simulated and real-world data show that when humans are assisted by the adaptive AI ensemble in decision making, they can achieve significantly higher performance than when they are assisted by single AI models that are trained to either optimize for its independent performance or even the human-AI team performance.

AAAI Conference 2026 Conference Paper

Assessing Automated Fact-Checking for Medical LLM Responses with Knowledge Graphs

  • Shasha Zhou
  • Mingyu Huang
  • Jack Cole
  • Charles Britton
  • Ming Yin
  • Jan Wolber
  • Ke Li

The recent proliferation of large language models (LLMs) holds the potential to revolutionize healthcare, with strong capabilities in diverse medical tasks. Yet, deploying LLMs in high-stakes healthcare settings requires rigorous verification and validation to understand any potential harm. This paper investigates the reliability and viability of using medical knowledge graphs (KGs) for the automated factuality evaluation of LLM-generated responses. To ground this investigation, we introduce FAITH, a framework designed to systematically probe the strengths and limitations of this KG-based approach. FAITH operates without reference answers by decomposing responses into atomic claims, linking them to a medical KG, and scoring them based on evidence paths. Experiments on diverse medical tasks with human subjective evaluations demonstrate that KG-grounded evaluation achieves considerably higher correlations with clinician judgments and can effectively distinguish LLMs with varying capabilities. It is also robust to textual variances. The inherent explainability of its scoring can further help users understand and mitigate the limitations of current LLMs. We conclude that while limitations exist, leveraging KGs is a prominent direction for automated factuality assessment in healthcare.

AAAI Conference 2026 Conference Paper

NumCoKE: Ordinal-Aware Numerical Reasoning over Knowledge Graphs with Mixture-of-Experts and Contrastive Learning

  • Ming Yin
  • Zongsheng Cao
  • Qiqing Xia
  • Chenyang Tu
  • Neng Gao

Knowledge graphs (KGs) serve as a vital backbone for a wide range of AI applications, including natural language understanding and recommendation. A promising yet underexplored direction is numerical reasoning over KGs, which involves inferring new facts by leveraging not only symbolic triples but also numerical attribute values (e.g., length, weight). However, existing methods fall short in two key aspects: (1) Incomplete semantic integration: Most models struggle to jointly encode entities, relations, and numerical attributes in a unified representation space, limiting their ability to extract relation-aware semantics from numeric information. (2) Ordinal indistinguishability: Due to subtle differences between close values and sampling imbalance, models often fail to capture fine-grained ordinal relationships (e.g., longer, heavier), especially in the presence of hard negatives. To address these challenges, we propose NumCoKE—a numerical reasoning framework for KGs based on Mixture-of-Experts and Ordinal Contrastive Embedding. To overcome (C1), we introduce a Mixture-of-Experts Knowledge-Aware (MoEKA) encoder that jointly aligns symbolic and numeric components into a shared semantic space, while dynamically routing attribute features to relation-specific experts. To handle (C2), we propose Ordinal Knowledge Contrastive Learning (OKCL), which constructs ordinal-aware positive and negative samples using prior knowledge, enabling the model to better discriminate subtle semantic shifts. Extensive experiments on three public KG benchmarks demonstrate that NumCoKE consistently outperforms competitive baselines across diverse attribute distributions, validating its superiority in both semantic integration and ordinal reasoning.

AAAI Conference 2025 Conference Paper

K-hop Hypergraph Neural Network: A Comprehensive Aggregation Approach

  • Linhuang Xie
  • Shihao Gao
  • Jie Liu
  • Ming Yin
  • Taisong Jin

The powerful capability of HyperGraph Neural Networks (HGNNs) in modeling intricate, high-order relationships among multiple data samples stems primarily from their ability to aggregate both the direct neighborhood features of individual nodes and those associated with hyperedges. However, the limited scope of feature propagation in existing HGNNs significantly reduces the utilization of hypergraph information, exacerbating over-squashing and over-smoothing issues. To this end, we propose a novel K-hop HyperGraph Neural Network (KHGNN) to facilitate the interactions of distant nodes and hyperedges. Specifically, the bisection nested convolution based on HyperGINE is employed to extract features from nodes, hyperedges, and structures along all shortest paths between nodes or hyperedges, providing representations of long-distance relationships. With these comprehensive path features, nodes and hyperedges are guided to aggregate distant information while learning their complex relationships. The extensive experiments, particularly on long-range graph datasets, demonstrate that the proposed method achieves SOTA performance compared to existing HGNNs and graph neural networks.

NeurIPS Conference 2024 Conference Paper

A Theoretical Perspective for Speculative Decoding Algorithm

  • Ming Yin
  • Minshuo Chen
  • Kaixuan Huang
  • Mengdi Wang

Transformer-based autoregressive sampling has been the major bottleneck for slowing down large language model inferences. One effective way to accelerate inference is Speculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate. Given its empirical effectiveness, the theoretical understanding of Speculative Decoding is falling behind. This paper tackles this gap by conceptualizing the decoding problem via markov chain abstraction and studying the key properties, output quality and inference acceleration, from a theoretical perspective. Our analysis covers the theoretical limits of speculative decoding, batch algorithms, and output quality-inference acceleration tradeoffs. Our results reveal the fundamental connections between different components of LLMs via total variation distances and show how they jointly affect the efficiency of decoding algorithms.

JBHI Journal 2024 Journal Article

An Accurate Non-Contact Photoplethysmography via Active Cancellation of Reflective Interference

  • Yonggang Tong
  • Zhipei Huang
  • Feng Qiu
  • Tao Wang
  • Yiquan Wang
  • Fei Qin
  • Ming Yin

Imaging Photoplethysmography (IPPG) is an emerging and efficient optical method for non-contact measurement of pulse waves using an image sensor. While the contactless way brings convenience, the inevitable distance between the sensor and the subject results in massive specular reflection interference on the skin surface, which leads to a low Signal to Interference plus Noise Ratio (SINR) of IPPG. To ease this challenge, this work proposes a novel modulation illumination approach to measure the accurate arterial pulse wave via surface reflection interference isolation from IPPG. Based on the proposed skin reflection model, a specific modulation illumination is designed to separate the surface reflections and obtain the subcutaneous diffuse reflections containing the pulse wave information. Compared with the results under ambient illumination and constant supplemental illumination, the SINR of the proposed method is improved by 4. 56 and 3. 74 dB, respectively.

AAAI Conference 2024 Conference Paper

Decoding AI’s Nudge: A Unified Framework to Predict Human Behavior in AI-Assisted Decision Making

  • Zhuoyan Li
  • Zhuoran Lu
  • Ming Yin

With the rapid development of AI-based decision aids, different forms of AI assistance have been increasingly integrated into the human decision making processes. To best support humans in decision making, it is essential to quantitatively understand how diverse forms of AI assistance influence humans' decision making behavior. To this end, much of the current research focuses on the end-to-end prediction of human behavior using ``black-box'' models, often lacking interpretations of the nuanced ways in which AI assistance impacts the human decision making process. Meanwhile, methods that prioritize the interpretability of human behavior predictions are often tailored for one specific form of AI assistance, making adaptations to other forms of assistance difficult. In this paper, we propose a computational framework that can provide an interpretable characterization of the influence of different forms of AI assistance on decision makers in AI-assisted decision making. By conceptualizing AI assistance as the ``nudge'' in human decision making processes, our approach centers around modelling how different forms of AI assistance modify humans' strategy in weighing different information in making their decisions. Evaluations on behavior data collected from real human decision makers show that the proposed framework outperforms various baselines in accurately predicting human behavior in AI-assisted decision making. Based on the proposed framework, we further provide insights into how individuals with different cognitive styles are nudged by AI assistance differently.

IJCAI Conference 2024 Conference Paper

Designing Behavior-Aware AI to Improve the Human-AI Team Performance in AI-Assisted Decision Making

  • Syed Hasan Amin Mahmood
  • Zhuoran Lu
  • Ming Yin

With the rapid development of decision aids that are driven by AI models, the practice of AI-assisted decision making has become increasingly prevalent. To improve the human-AI team performance in decision making, earlier studies mostly focus on enhancing humans' capability in better utilizing a given AI-driven decision aid. In this paper, we tackle this challenge through a complementary approach—we aim to train "behavior-aware AI" by adjusting the AI model underlying the decision aid to account for humans' behavior in adopting AI advice. In particular, as humans are observed to accept AI advice more when their confidence in their own judgement is low, we propose to train AI models with a human-confidence-based instance weighting strategy, instead of solving the standard empirical risk minimization problem. Under an assumed, threshold-based model characterizing when humans will adopt the AI advice, we first derive the optimal instance weighting strategy for training AI models. We then validate the efficacy and robustness of our proposed method in improving the human-AI joint decision making performance through systematic experimentation on synthetic datasets. Finally, via randomized experiments with real human subjects along with their actual behavior in adopting the AI advice, we demonstrate that our method can significantly improve the decision making performance of the human-AI team in practice.

NeurIPS Conference 2024 Conference Paper

Fast Best-of-N Decoding via Speculative Rejection

  • Hanshi Sun
  • Momin Haider
  • Ruiqi Zhang
  • Huitao Yang
  • Jiahao Qiu
  • Ming Yin
  • Mengdi Wang
  • Peter Bartlett

The safe and effective deployment of Large Language Models (LLMs) involves a critical step called alignment, which ensures that the model's responses are in accordance with human preferences. Prevalent alignment techniques, such as DPO, PPO and their variants, align LLMs by changing the pre-trained model weights during a phase called post-training. While predominant, these post-training methods add substantial complexity before LLMs can be deployed. Inference-time alignment methods avoid the complex post-training step and instead bias the generation towards responses that are aligned with human preferences. The best-known inference-time alignment method, called Best-of-N, is as effective as the state-of-the-art post-training procedures. Unfortunately, Best-of-N requires vastly more resources at inference time than standard decoding strategies, which makes it computationally not viable. In this work, we introduce Speculative Rejection, a computationally-viable inference-time alignment algorithm. It generates high-scoring responses according to a given reward model, like Best-of-N does, while being between 16 to 32 times more computationally efficient.

ICML Conference 2024 Conference Paper

Learning the Target Network in Function Space

  • Kavosh Asadi
  • Yao Liu 0009
  • Shoham Sabach
  • Ming Yin
  • Rasool Fakoor

We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.

NeurIPS Conference 2024 Conference Paper

NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation

  • Momin Haider
  • Ming Yin
  • Menglei Zhang
  • Arpit Gupta
  • Jing Zhu
  • Yu-Xiang Wang

Mobile devices such as smartphones, laptops, and tablets can often connect to multiple access networks (e. g. , Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate seamless integration of these connections below the transport layer, enhancing the experience for apps that lack inherent multi-path support. This optimization hinges on dynamically determining the traffic distribution across networks for each device, a process referred to as multi-access traffic splitting. This paper introduces NetworkGym, a high-fidelity network environment simulator that facilitates generating multiple network traffic flows and multi-access traffic splitting. This simulator facilitates training and evaluating different RL-based solutions for the multi-access traffic splitting problem. Our initial explorations demonstrate that the majority of existing state-of-the-art offline RL algorithms (e. g. CQL) fail to outperform certain hand-crafted heuristic policies on average. This illustrates the urgent need to evaluate offline RL algorithms against a broader range of benchmarks, rather than relying solely on popular ones such as D4RL. We also propose an extension to the TD3+BC algorithm, named Pessimistic TD3 (PTD3), and demonstrate that it outperforms many state-of-the-art offline RL algorithms. PTD3's behavioral constraint mechanism, which relies on value-function pessimism, is theoretically motivated and relatively simple to implement. We open source our code and offline datasets at github. com/hmomin/networkgym.

NeurIPS Conference 2024 Conference Paper

Offline Multitask Representation Learning for Reinforcement Learning

  • Haque Ishfaq
  • Thanh Nguyen-Tang
  • Songtao Feng
  • Raman Arora
  • Mengdi Wang
  • Ming Yin
  • Doina Precup

We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline and online scenarios, where a new task is introduced to the agent that shares the same representation as the upstream offline tasks. Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.

NeurIPS Conference 2024 Conference Paper

Transfer Q-star : Principled Decoding for LLM Alignment

  • Souradip Chakraborty
  • Soumya Suvra Ghosal
  • Ming Yin
  • Dinesh Manocha
  • Mengdi Wang
  • Amrit Singh Bedi
  • Furong Huang

Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{\pi_{\text{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose $\texttt{Transfer Q}^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of $\texttt{Transfer Q}^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.

NeurIPS Conference 2024 Conference Paper

Utilizing Human Behavior Modeling to Manipulate Explanations in AI-Assisted Decision Making: The Good, the Bad, and the Scary

  • Zhuoyan Li
  • Ming Yin

Recent advances in AI models have increased the integration of AI-based decision aids into the human decision making process. To fully unlock the potential of AI-assisted decision making, researchers have computationally modeled how humans incorporate AI recommendations into their final decisions, and utilized these models to improve human-AI team performance. Meanwhile, due to the ``black-box'' nature of AI models, providing AI explanations to human decision makers to help them rely on AI recommendations more appropriately has become a common practice. In this paper, we explore whether we can quantitatively model how humans integrate both AI recommendations and explanations into their decision process, and whether this quantitative understanding of human behavior from the learned model can be utilized to manipulate AI explanations, thereby nudging individuals towards making targeted decisions. Our extensive human experiments across various tasks demonstrate that human behavior can be easily influenced by these manipulated explanations towards targeted outcomes, regardless of the intent being adversarial or benign. Furthermore, individuals often fail to detect any anomalies in these explanations, despite their decisions being affected by them.

AAAI Conference 2023 Conference Paper

Modeling Human Trust and Reliance in AI-Assisted Decision Making: A Markovian Approach

  • Zhuoyan Li
  • Zhuoran Lu
  • Ming Yin

The increased integration of artificial intelligence (AI) technologies in human workflows has resulted in a new paradigm of AI-assisted decision making, in which an AI model provides decision recommendations while humans make the final decisions. To best support humans in decision making, it is critical to obtain a quantitative understanding of how humans interact with and rely on AI. Previous studies often model humans' reliance on AI as an analytical process, i.e., reliance decisions are made based on cost-benefit analysis. However, theoretical models in psychology suggest that the reliance decisions can often be driven by emotions like humans' trust in AI models. In this paper, we propose a hidden Markov model to capture the affective process underlying the human-AI interaction in AI-assisted decision making, by characterizing how decision makers adjust their trust in AI over time and make reliance decisions based on their trust. Evaluations on real human behavior data collected from human-subject experiments show that the proposed model outperforms various baselines in accurately predicting humans' reliance behavior in AI-assisted decision making. Based on the proposed model, we further provide insights into how humans' trust and reliance dynamics in AI-assisted decision making is influenced by contextual factors like decision stakes and their interaction experiences.

AAAI Conference 2023 Conference Paper

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

  • Thanh Nguyen-Tang
  • Ming Yin
  • Sunil Gupta
  • Svetha Venkatesh
  • Raman Arora

Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively recently. Much of the prior work has yielded instance-independent rates that hold even for the worst-case realization of problem instances. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate for offline RL when there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when the number of episodes exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first results that give a fast rate bound on the sub-optimality and an absolute zero sub-optimality bound for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.

NeurIPS Conference 2023 Conference Paper

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

  • Nikki Lijing Kuang
  • Ming Yin
  • Mengdi Wang
  • Yu-Xiang Wang
  • Yian Ma

Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce \textit{Delayed-PSVI}, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 \mathbb{E}[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $\mathbb{E}[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for \textit{Delayed-LPSVI}, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.

IJCAI Conference 2023 Conference Paper

Strategic Adversarial Attacks in AI-assisted Decision Making to Reduce Human Trust and Reliance

  • Zhuoran Lu
  • Zhuoyan Li
  • Chun-Wei Chiang
  • Ming Yin

With the increased integration of AI technologies in human decision making processes, adversarial attacks on AI models become a greater concern than ever before as they may significantly hurt humans’ trust in AI models and decrease the effectiveness of human-AI collaboration. While many adversarial attack methods have been proposed to decrease the performance of an AI model, limited attention has been paid on understanding how these attacks will impact the human decision makers interacting with the model, and accordingly, how to strategically deploy adversarial attacks to maximize the reduction of human trust and reliance. In this paper, through a human-subject experiment, we first show that in AI-assisted decision making, the timing of the attacks largely influences how much humans decrease their trust in and reliance on AI—the decrease is particularly salient when attacks occur on decision making tasks that humans are highly confident themselves. Based on these insights, we next propose an algorithmic framework to infer the human decision maker’s hidden trust in the AI model and dynamically decide when the attacker should launch an attack to the model. Our evaluations show that following the proposed approach, attackers deploy more efficient attacks and achieve higher utility than adopting other baseline strategies.

IJCAI Conference 2023 Conference Paper

The Effects of AI Biases and Explanations on Human Decision Fairness: A Case Study of Bidding in Rental Housing Markets

  • Xinru Wang
  • Chen Liang
  • Ming Yin

The use of AI-based decision aids in diverse domains has inspired many empirical investigations into how AI models’ decision recommendations impact humans’ decision accuracy in AI-assisted decision making, while explorations on the impacts on humans’ decision fairness are largely lacking despite their clear importance. In this paper, using a real-world business decision making scenario—bidding in rental housing markets—as our testbed, we present an experimental study on understanding how the bias level of the AI-based decision aid as well as the provision of AI explanations affect the fairness level of humans’ decisions, both during and after their usage of the decision aid. Our results suggest that when people are assisted by an AI-based decision aid, both the higher level of racial biases the decision aid exhibits and surprisingly, the presence of AI explanations, result in more unfair human decisions across racial groups. Moreover, these impacts are partly made through triggering humans’ “disparate interactions” with AI. However, regardless of the AI bias level and the presence of AI explanations, when people return to make independent decisions after their usage of the AI-based decision aid, their decisions no longer exhibit significant unfairness across racial groups.

IJCAI Conference 2021 Conference Paper

Accounting for Confirmation Bias in Crowdsourced Label Aggregation

  • Meric Altug Gemalmaz
  • Ming Yin

Collecting large-scale human-annotated datasets via crowdsourcing to train and improve automated models is a prominent human-in-the-loop approach to integrate human and machine intelligence. However, together with their unique intelligence, humans also come with their biases and subjective beliefs, which may influence the quality of the annotated data and negatively impact the effectiveness of the human-in-the-loop systems. One of the most common types of cognitive biases that humans are subject to is the confirmation bias, which is people's tendency to favor information that confirms their existing beliefs and values. In this paper, we present an algorithmic approach to infer the correct answers of tasks by aggregating the annotations from multiple crowd workers, while taking workers' various levels of confirmation bias into consideration. Evaluations on real-world crowd annotations show that the proposed bias-aware label aggregation algorithm outperforms baseline methods in accurately inferring the ground-truth labels of different tasks when crowd workers indeed exhibit some degree of confirmation bias. Through simulations on synthetic data, we further identify the conditions when the proposed algorithm has the largest advantages over baseline methods.

IJCAI Conference 2021 Conference Paper

Exploring the Effects of Goal Setting When Training for Complex Crowdsourcing Tasks (Extended Abstract)

  • Amy Rechkemmer
  • Ming Yin

Training is one way of enabling novice workers to work on complex crowdsourcing tasks. Based on goal setting theory in psychology, we conduct a randomized experiment to study whether and how setting different goals---including performance goal, learning goal, and behavioral goal---when training workers for a complex crowdsourcing task affects workers' learning perception, learning gain, and post-training performance. We find that setting different goals during training significantly affects workers' learning perception, but does not have an effect on learning gain or post-training performance. Further, exploratory analysis helps shed light on when and why various goals may or may not work in the crowdsourcing context.

NeurIPS Conference 2021 Conference Paper

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

  • Ming Yin
  • Yu Bai
  • Yu-Xiang Wang

We consider the problem of offline reinforcement learning (RL) --- a well-motivated setting of RL that aims at policy optimization using only historical data. Despite its wide applicability, theoretical understandings of offline RL, such as its optimal sample complexity, remain largely open even in basic settings such as \emph{tabular} Markov Decision Processes (MDPs). In this paper, we propose \emph{Off-Policy Double Variance Reduction} (OPDVR), a new variance reduction-based algorithm for offline RL. Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon \emph{stationary transition} setting, where $H$ is the horizon length and $d_m$ is the minimal marginal state-action distribution induced by the behavior policy. This improves over the best-known upper bound by a factor of $H$. Moreover, we establish an information-theoretic lower bound of $\Omega(H^2/d_m\epsilon^2)$ which certifies that OPDVR is optimal up to logarithmic factors. Lastly, we show that OPDVR also achieves rate-optimal sample complexity under alternative settings such as the finite-horizon MDPs with non-stationary transitions and the infinite horizon MDPs with discounted rewards.

NeurIPS Conference 2021 Conference Paper

Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings

  • Ming Yin
  • Yu-Xiang Wang

This work studies the statistical limits of uniform convergence for offline policy evaluation (OPE) problems with model-based methods (for episodic MDP) and provides a unified framework towards optimal learning for several well-motivated offline tasks. Uniform OPE $\sup_\Pi|Q^\pi-\hat{Q}^\pi|<\epsilon$ is a stronger measure than the point-wise OPE and ensures offline learning when $\Pi$ contains all policies (the global class). In this paper, we establish an $\Omega(H^2 S/d_m\epsilon^2)$ lower bound (over model-based family) for the global uniform OPE and our main result establishes an upper bound of $\tilde{O}(H^2/d_m\epsilon^2)$ for the \emph{local} uniform convergence that applies to all \emph{near-empirically optimal} policies for the MDPs with \emph{stationary} transition. Here $d_m$ is the minimal marginal state-action probability. Critically, the highlight in achieving the optimal rate $\tilde{O}(H^2/d_m\epsilon^2)$ is our design of \emph{singleton absorbing MDP}, which is a new sharp analysis tool that works with the model-based approach. We generalize such a model-based framework to the new settings: offline task-agnostic and the offline reward-free with optimal complexity $\tilde{O}(H^2\log(K)/d_m\epsilon^2)$ ($K$ is the number of tasks) and $\tilde{O}(H^2S/d_m\epsilon^2)$ respectively. These results provide a unified solution for simultaneously solving different offline RL problems.

NeurIPS Conference 2021 Conference Paper

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

  • Ming Yin
  • Yu-Xiang Wang

We study the \emph{offline reinforcement learning} (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown \emph{Markov Decision Process} (MDP) using the data coming from a policy $\mu$. In particular, we consider the sample complexity problems of offline RL for the finite horizon MDPs. Prior works derive the information-theoretical lower bounds based on different data-coverage assumptions and their upper bounds are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the \emph{Adaptive Pessimistic Value Iteration} (APVI) algorithm and derive the suboptimality upper bound that nearly matches\[O\left(\sum_{h=1}^H\sum_{s_h, a_h}d^{\pi^\star}_h(s_h, a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h, a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h, a_h)}}\sqrt{\frac{1}{n}}\right). \]We also prove an information-theoretical lower bound to show this quantity is required under the weak assumption that $d^\mu_h(s_h, a_h)>0$ if $d^{\pi^\star}_h(s_h, a_h)>0$. Here $\pi^\star$ is a optimal policy, $\mu$ is the behavior policy and $d(s_h, a_h)$ is the marginal state-action probability. We call this adaptive bound the \emph{intrinsic offline reinforcement learning bound} since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the \emph{assumption-free} regime (where we make no assumption on $\mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.

AAAI Conference 2020 Conference Paper

Shared Generative Latent Representation Learning for Multi-View Clustering

  • Ming Yin
  • Weitian Huang
  • Junbin Gao

Clustering multi-view data has been a fundamental research topic in the computer vision community. It has been shown that a better accuracy can be achieved by integrating information of all the views than just using one view individually. However, the existing methods often struggle with the issues of dealing with the large-scale datasets and the poor performance in reconstructing samples. This paper proposes a novel multi-view clustering method by learning a shared generative latent representation that obeys a mixture of Gaussian distributions. The motivation is based on the fact that the multi-view data share a common latent embedding despite the diversity among the various views. Specifically, benefitting from the success of the deep generative learning, the proposed model can not only extract the nonlinear features from the views, but render a powerful ability in capturing the correlations among all the views. The extensive experimental results on several datasets with different scales demonstrate that the proposed method outperforms the state-of-the-art methods under a range of performance criteria.

IJCAI Conference 2015 Conference Paper

Bonus or Not? Learn to Reward in Crowdsourcing

  • Ming Yin
  • Yiling Chen

Recent work has shown that the quality of work produced in a crowdsourcing working session can be influenced by the presence of performancecontingent financial incentives, such as bonuses for exceptional performance, in the session. We take an algorithmic approach to decide when to offer bonuses in a working session to improve the overall utility that a requester derives from the session. Specifically, we propose and train an inputoutput hidden Markov model to learn the impact of bonuses on work quality and then use this model to dynamically decide whether to offer a bonus on each task in a working session to maximize a requester’s utility. Experiments on Amazon Mechanical Turk show that our approach leads to higher utility for the requester than fixed and random bonus schemes do. Simulations on synthesized data sets further demonstrate the robustness of our approach against different worker population and worker behavior in improving requester utility.

AAAI Conference 2013 Conference Paper

The Effects of Performance-Contingent Financial Incentives in Online Labor Markets

  • Ming Yin
  • Yiling Chen
  • Yu-An Sun

Online labor markets such as Amazon Mechanical Turk (MTurk) have emerged as platforms that facilitate the allocation of productive effort across global economies. Many of these markets compensate workers with monetary payments. We study the effects of performance-contingent financial rewards on work quality and worker effort in MTurk via two experiments. We find that the magnitude of performancecontingent financial rewards alone affects neither quality nor effort. However, when workers working on two tasks of the same type in a sequence, the change in the magnitude of the reward over the two tasks affects both. In particular, both work quality and worker effort increase (alternatively decrease) as the reward increases (alternatively decreases) for the second task. This suggests the existence of the anchoring effect on workers’ perception of incentives in MTurk and that this effect can be leveraged in workflow design to increase the effectiveness of financial incentives.