Arrow Research search

Author name cluster

Tyler Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

ICML Conference 2025 Conference Paper

Representative Ranking for Deliberation in the Public Sphere

  • Manon Revel
  • Smitha Milli
  • Tyler Lu
  • Jamelle Watson-Daniels
  • Maximilian Nickel

Online comment sections, such as those on news sites or social media, have the potential to foster informal public deliberation, However, this potential is often undermined by the frequency of toxic or low-quality exchanges that occur in these settings. To combat this, platforms increasingly leverage algorithmic ranking to facilitate higher-quality discussions, e. g. , by using civility classifiers or forms of prosocial ranking. Yet, these interventions may also inadvertently reduce the visibility of legitimate viewpoints, undermining another key aspect of deliberation: representation of diverse views. We seek to remedy this problem by introducing guarantees of representation into these methods. In particular, we adopt the notion of justified representation (JR) from the social choice literature and incorporate a JR constraint into the comment ranking setting. We find that enforcing JR leads to greater inclusion of diverse viewpoints while still being compatible with optimizing for user engagement or other measures of conversational quality.

ICML Conference 2020 Conference Paper

ConQUR: Mitigating Delusional Bias in Deep Q-Learning

  • DiJia Andy Su
  • Jayden Ooi
  • Tyler Lu
  • Dale Schuurmans
  • Craig Boutilier

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.

AAAI Conference 2020 Conference Paper

Gradient-Based Optimization for Bayesian Preference Elicitation

  • Ivan Vendrov
  • Tyler Lu
  • Qingqing Huang
  • Craig Boutilier

Effective techniques for eliciting user preferences have taken on added importance as recommender systems (RSs) become increasingly interactive and conversational. A common and conceptually appealing Bayesian criterion for selecting queries is expected value of information (EVOI). Unfortunately, it is computationally prohibitive to construct queries with maximum EVOI in RSs with large item spaces. We tackle this issue by introducing a continuous formulation of EVOI as a differentiable network that can be optimized using gradient methods available in modern machine learning computational frameworks (e. g. , TensorFlow, PyTorch). We exploit this to develop a novel Monte Carlo method for EVOI optimization, which is much more scalable for large item spaces than methods requiring explicit enumeration of items. While we emphasize the use of this approach for pairwise (or k-wise) comparisons of items, we also demonstrate how our method can be adapted to queries involving subsets of item attributes or “partial items, ” which are often more cognitively manageable for users. Experiments show that our gradientbased EVOI technique achieves state-of-the-art performance across several domains while scaling to large item spaces.

AIJ Journal 2020 Journal Article

Preference elicitation and robust winner determination for single- and multi-winner social choice

  • Tyler Lu
  • Craig Boutilier

The use of voting schemes based on rankings of alternatives to solve social choice problems can often impose significant burden on voters, both in terms of communication and cognitive requirements. In this paper, we develop techniques for preference elicitation in voting settings (i. e. , vote elicitation) that can alleviate this burden by minimizing the amount of preference information needed to find (approximately or exactly) optimal outcomes. We first describe robust optimization techniques for determining winning alternatives given partial preference information (i. e. , partial rankings) using the notion of minimax regret. We show that the corresponding computational problem is tractable for some important voting rules, and intractable for others. We then use the solution to the minimax-regret optimization as the basis for vote elicitation schemes that determine appropriate preference queries for voters to quickly reduce potential regret. We apply these techniques to multi-winner social choice problems as well, in which a slate of alternatives must be selected, developing both exact and greedy robust optimization procedures. Empirical results on several data sets validate the effectiveness of our techniques.

NeurIPS Conference 2018 Conference Paper

Data center cooling using model-predictive control

  • Nevena Lazic
  • Craig Boutilier
  • Tyler Lu
  • Eehern Wong
  • Binz Roy
  • MK Ryu
  • Greg Imwalle

Despite impressive recent advances in reinforcement learning (RL), its deployment in real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures. In this paper, we describe an application of RL “in the wild” to the task of regulating temperatures and airflow inside a large-scale data center (DC). Adopting a data-driven, model-based approach, we demonstrate that an RL agent with little prior knowledge is able to effectively and safely regulate conditions on a server floor after just a few hours of exploration, while improving operational efficiency relative to existing PID controllers.

NeurIPS Conference 2018 Conference Paper

Non-delusional Q-learning and value-iteration

  • Tyler Lu
  • Dale Schuurmans
  • Craig Boutilier

We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Since standard Q-updates make globally uncoordinated action choices with respect to the expressible policy class, inconsistent or even conflicting Q-value estimates can result, leading to pathological behaviour such as over/under-estimation, instability and even divergence. To solve this problem, we introduce a new notion of policy consistency and define a local backup process that ensures global consistency through the use of information sets---sets that record constraints on policies consistent with backed-up Q-values. We prove that both the model-based and model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms furthermore only require polynomially many information sets (from a potentially exponential support). Finally, we suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.

IJCAI Conference 2017 Conference Paper

Logistic Markov Decision Processes

  • Martin Mladenov
  • Craig Boutilier
  • Dale Schuurmans
  • Ofer Meshi
  • Gal Elidan
  • Tyler Lu

User modeling in advertising and recommendation has typically focused on myopic predictors of user responses. In this work, we consider the long-term decision problem associated with user interaction. We propose a concise specification of long-term interaction dynamics by combining factored dynamic Bayesian networks with logistic predictors of user responses, allowing state-of-the-art prediction models to be seamlessly extended. We show how to solve such models at scale by providing a constraint generation approach for approximate linear programming that overcomes the variable coupling and non-linearity induced by the logistic regression predictor. The efficacy of the approach is demonstrated on advertising domains with up to 2^54 states and 2^39 actions.

UAI Conference 2016 Conference Paper

Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes

  • Craig Boutilier
  • Tyler Lu

We consider the problem of budget (or other resource) allocation in sequential decision problems involving a large number of concurrently running sub-processes, whose only interaction is through their consumption of budget. Our first contribution is the introduction of budgeted MDPs (BMDPs), an MDP model in which policies/values are a function of available budget, (c. f. constrained MDPs which are solved given a fixed budget). BMDPs allow one to explicitly trade off allocated budget and expected value. We show that optimal value functions are concave, non-decreasing in budget, and piecewise-linear in the finite horizon case, and can be computed by dynamic programming (and support ready approximation). Our second contribution is a method that exploits BMDP solutions to allocate budget to a large number of independent BMDPs, coupled only by their common budget pool. The problem can be cast as a multiple-choice knapsack problem, which admits an efficient, optimal greedy algorithm. Empirical results in an online advertising domain confirm the efficacy of our methods.

AIJ Journal 2015 Journal Article

Optimal social choice functions: A utilitarian view

  • Craig Boutilier
  • Ioannis Caragiannis
  • Simi Haber
  • Tyler Lu
  • Ariel D. Procaccia
  • Or Sheffet

We adopt a utilitarian perspective on social choice, assuming that agents have (possibly latent) utility functions over some space of alternatives. For many reasons one might consider mechanisms, or social choice functions, that only have access to the ordinal rankings of alternatives by the individual agents rather than their utility functions. In this context, one possible objective for a social choice function is the maximization of (expected) social welfare relative to the information contained in these rankings. We study such optimal social choice functions under three different models, and underscore the important role played by scoring functions. In our worst-case model, no assumptions are made about the underlying distribution and we analyze the worst-case distortion—or degree to which the selected alternative does not maximize social welfare—of optimal (randomized) social choice functions. In our average-case model, we derive optimal functions under neutral (or impartial culture) probabilistic models. Finally, a very general learning-theoretic model allows for the computation of optimal social choice functions (i. e. , ones that maximize expected social welfare) under arbitrary, sampleable distributions. In the latter case, we provide both algorithms and sample complexity results for the class of scoring functions, and further validate the approach empirically.

AAAI Conference 2015 Conference Paper

Value-Directed Compression of Large-Scale Assignment Problems

  • Tyler Lu
  • Craig Boutilier

Data-driven analytics—in areas ranging from consumer marketing to public policy—often allow behavior prediction at the level of individuals rather than population segments, offering the opportunity to improve decisions that impact large populations. Modeling such (generalized) assignment problems as linear programs, we propose a general value-directed compression technique for solving such problems at scale. We dynamically segment the population into cells using a form of column generation, constructing groups of individuals who can provably be treated identically in the optimal solution. This compression allows problems, unsolvable using standard LP techniques, to be solved effectively. Indeed, once a compressed LP is constructed, problems can solved in milliseconds. We provide a theoretical analysis of the methods, outline the distributed implementation of the requisite data processing, and show how a single compressed LP can be used to solve multiple variants of the original LP nearoptimally in real-time (e. g. , to support scenario analysis). We also show how the method can be leveraged in integer programming models. Experimental results on marketing contact optimization and political legislature problems validate the performance of our technique.

JMLR Journal 2014 Journal Article

Effective Sampling and Learning for Mallows Models with Pairwise-Preference Data

  • Tyler Lu
  • Craig Boutilier

Learning preference distributions is a critical problem in many areas (e.g., recommender systems, IR, social choice). However, many existing learning and inference methods impose restrictive assumptions on the form of user preferences that can be admitted as evidence. We relax these restrictions by considering as data arbitrary pairwise comparisons of alternatives, which represent the fundamental building blocks of ordinal rankings. We develop the first algorithms for learning Mallows models (and mixtures thereof) from pairwise comparison data. At the heart of our technique is a new algorithm, the generalized repeated insertion model (GRIM), which allows sampling from arbitrary ranking distributions, and conditional Mallows models in particular. While we show that sampling from a Mallows model with pairwise evidence is computationally difficult in general, we develop approximate samplers that are exact for many important special cases--and have provable bounds with pairwise evidence--and derive algorithms for evaluating log-likelihood, learning Mallows mixtures, and non-parametric estimation. Experiments on real-world data sets demonstrate the effectiveness of our approach. (Some parts of this paper appeared in: T. Lu and C. Boutilier, Learning Mallows Models with Pairwise Preferences, Proceedings of the Twenty- Eighth International Conference on Machine Learning (ICML 2011), pp.145-152, Bellevue, WA (2011).) [abs] [ pdf ][ bib ] &copy JMLR 2014. ( edit, beta )

AIJ Journal 2014 Journal Article

On the value of using group discounts under price competition

  • Reshef Meir
  • Tyler Lu
  • Moshe Tennenholtz
  • Craig Boutilier

The increasing use of group discounts has provided opportunities for buying groups with diverse preferences to coordinate their behavior in order to exploit the best offers from multiple vendors. We analyze this problem from the viewpoint of the vendors, asking under what conditions a vendor should adopt a volume-based price schedule rather than posting a fixed price. We consider both the case of monopolist vendors and cases where a vendor competes with other vendors. When vendors have uncertainty about buyers' valuations specified by a known distribution, we show that a vendor is always better off posting a fixed price, provided that buyer types (valuations) are i. i. d. and that other vendors also use fixed prices. We also show that these assumptions cannot be relaxed: if buyer types are not i. i. d. , or other vendors post discount schedules, then posting a schedule may yield a higher profit for the vendor. We provide similar results under a distribution-free uncertainty model, where vendors minimize their maximum regret over all type realizations.

IJCAI Conference 2013 Conference Paper

Multi-Winner Social Choice with Incomplete Preferences

  • Tyler Lu
  • Craig Boutilier

Multi-winner social choice considers the problem of selecting a slate of K options to realize some social objective. It has found application in the construction of political legislatures and committees, product recommendation, and related problems, and has recently attracted attention from a computational perspective. We address the multi-winner problem when facing incomplete voter preferences, using the notion of minimax regret to determine a robust slate of options in the presence of preference uncertainty. We analyze the complexity of this problem and develop new exact and greedy robust optimization algorithms for its solution. Using these techniques, we also develop preference elicitation heuristics which, in practice, allow us to find near-optimal slates with considerable savings in the preference information required vis-à-vis complete votes.

AAAI Conference 2013 Conference Paper

On the Value of Using Group Discounts under Price Competition

  • Reshef Meir
  • Tyler Lu
  • Moshe Tennenholtz
  • Craig Boutilier

The increasing use of group discounts has provided opportunities for buying groups with diverse preferences to coordinate their behavior in order to exploit the best offers from multiple vendors. We analyze this problem from the viewpoint of the vendors, asking under what conditions a vendor should adopt a volume-based price schedule rather than posting a fixed price, either as a monopolist or when competing with other vendors. When vendors have uncertainty about buyers’ valuations specified by a known distribution, we show that a vendor is always better off posting a fixed price, provided that buyers’ types are i. i. d. and that other vendors also use fixed prices. We also show that these assumptions cannot be relaxed: if buyers are not i. i. d. , or other vendors post discount schedules, then posting a schedule may yield higher profit for the vendor. We provide similar results under a distribution-free uncertainty model, where vendors minimize their maximum regret over all type realizations.

UAI Conference 2012 Conference Paper

Bayesian Vote Manipulation: Optimal Strategies and Impact on Welfare

  • Tyler Lu
  • Pingzhong Tang
  • Ariel D. Procaccia
  • Craig Boutilier

Most analyses of manipulation of voting schemes have adopted two assumptions that greatly diminish their practical import. First, it is usually assumed that the manipulators have full knowledge of the votes of the nonmanipulating agents. Second, analysis tends to focus on the probability of manipulation rather than its impact on the social choice objective (e.g., social welfare). We relax both of these assumptions by analyzing optimal Bayesian manipulation strategies when the manipulators have only partial probabilistic information about nonmanipulator votes, and assessing the expected loss in social welfare (in the broad sense of the term). We present a general optimization framework for the derivation of optimal manipulation strategies given arbitrary voting rules and distributions over preferences. We theoretically and empirically analyze the optimal manipulability of some popular voting rules using distributions and real data sets that go well beyond the common, but unrealistic, impartial culture assumption. We also shed light on the stark difference between the loss in social welfare and the probability of manipulation by showing that even when manipulation is likely, impact to social welfare is slight (and often negligible).

IJCAI Conference 2011 Conference Paper

Budgeted Social Choice: From Consensus to Personalized Decision Making

  • Tyler Lu
  • Craig Boutilier

We develop a general framework for social choice problems in which a limited number of alternatives can be recommended to an agent population. In our budgeted social choice model, this limit is determined by a budget, capturing problems that arise naturally in a variety of contexts, and spanning the continuum from pure consensus decision making (i. e. , standard social choice) to fully personalized recommendation. Our approach applies a form of segmentation to social choice problems- requiring the selection of diverse options tailored to different agent types-and generalizes certain multi-winner election schemes. We show that standard rank aggregation methods perform poorly, and that optimization in our model is NP-complete; but we develop fast greedy algorithms with some theoretical guarantees. Experiments on real-world datasets demonstrate the effectiveness of our algorithms.

IJCAI Conference 2011 Conference Paper

Robust Approximation and Incremental Elicitation in Voting Protocols

  • Tyler Lu
  • Craig Boutilier

While voting schemes provide an effective means for aggregating preferences, methods for the effective elicitation of voter preferences have received little attention. We address this problem by first considering approximate winner determination when incomplete voter preferences are provided. Exploiting natural scoring metrics, we use max regret to measure the quality or robustness of proposed winners, and develop polynomial time algorithms for computing the alternative with minimax regret for several popular voting rules. We then show how minimax regret can be used to effectively drive incremental preference/vote elicitation and devise several heuristics for this process. Despite worst-case theoretical results showing that most voting protocols require nearly complete voter preferences to determine winners, we demonstrate the practical effectiveness of regret-based elicitation for determining both approximate and exact winners on several real-world data sets.