Arrow Research search

Author name cluster

Oliver Schulte

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

TMLR Journal 2025 Journal Article

When Should Reinforcement Learning Use Causal Reasoning?

  • Oliver Schulte
  • Pascal Poupart

Reinforcement learning (RL) and causal reasoning naturally complement each other. The goal of causal reasoning is to predict the effects of interventions in an environment, while the goal of reinforcement learning is to select interventions that maximize the rewards the agent receives from the environment. Reinforcement learning includes the two most powerful sources of information for estimating causal relationships: temporal ordering and the ability to act on an environment. This paper provides a theoretical study examining which reinforcement learning settings we can expect to benefit from causal reasoning, and how. According to our analysis, the key factor is {\em whether the behavioral policy---which generates the data---can be executed by the learning agent}, meaning that the observation signal available to the learning agent comprises all observations used by the behavioral policy. Common RL settings with behavioral policies that are executable by the learning agent include on-policy learning and online exploration, where the learning agent uses a behavioral policy to explore the environment. Common RL settings with behavioral policies that are not executable by the learning agent include offline learning with a partially observable state space and asymmetric imitation learning where the demonstrator has access to more observations than the imitator. Using the theory of causal graphs, we show formally that when the behavioral policy is executable by the learning agent, conditional probabilities are causal, and can therefore be used to estimate expected rewards as done in traditional RL. However, when the behavioral policy is not executable by the learning agent, conditional probabilities may be confounded and provide misleading estimates of expected rewards. For confounded settings, we describe previous and new methods for leveraging causal reasoning.

ECAI Conference 2024 Conference Paper

Deep Generative Models for Subgraph Prediction

  • Erfaneh Mahmoudzadeh
  • Parmis Naddaf
  • Kiarash Zahirnia
  • Oliver Schulte

Graph Neural Networks (GNNs) are important across different domains, such as social network analysis and recommendation systems, due to their ability to model complex relational data. This paper introduces subgraph queries as a new task for deep graph learning. Unlike traditional graph prediction tasks that focus on individual components like link prediction or node classification, subgraph queries jointly predict the components of a target subgraph based on evidence that is represented by an observed subgraph. For instance, a subgraph query can predict a set of target links and/or node labels. To answer subgraph queries, we utilize a probabilistic deep Graph Generative Model. Specifically, we inductively train a Variational Graph Auto-Encoder (VGAE) model, augmented to represent a joint distribution over links, node features and labels. Bayesian optimization is used to tune a weighting for the relative importance of links, node features and labels in a specific domain. We describe a deterministic and a sampling-based inference method for estimating subgraph probabilities from the VGAE generative graph distribution, without retraining, in zero-shot fashion. For evaluation, we apply the inference methods on a range of subgraph queries on six benchmark datasets. We find that inference from a model achieves superior predictive performance, surpassing independent prediction baselines with improvements in AUC scores ranging from 0. 06 to 0. 2 points, depending on the dataset.

NeurIPS Conference 2023 Conference Paper

Cause-Effect Inference in Location-Scale Noise Models: Maximum Likelihood vs. Independence Testing

  • Xiangyu Sun
  • Oliver Schulte

A fundamental problem of causal discovery is cause-effect inference, to learn the correct causal direction between two random variables. Significant progress has been made through modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. LSNM model selection based on maximizing likelihood achieves state-of-the-art accuracy, when the noise distributions are correctly specified. However, through an extensive empirical evaluation, we demonstrate that the accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.

ICML Conference 2023 Conference Paper

Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting

  • Shayan Shirahmad Gale Bagi
  • Zahra Gharaee
  • Oliver Schulte
  • Mark Crowley 0001

Conventional supervised learning methods typically assume i. i. d samples and are found to be sensitive to out-of-distribution (OOD) data. We propose Generative Causal Representation Learning (GCRL) which leverages causality to facilitate knowledge transfer under distribution shifts. While we evaluate the effectiveness of our proposed method in human trajectory prediction models, GCRL can be applied to other domains as well. First, we propose a novel causal model that explains the generative factors in motion forecasting datasets using features that are common across all environments and with features that are specific to each environment. Selection variables are used to determine which parts of the model can be directly transferred to a new environment without fine-tuning. Second, we propose an end-to-end variational learning paradigm to learn the causal mechanisms that generate observations from features. GCRL is supported by strong theoretical results that imply identifiability of the causal model under certain assumptions. Experimental results on synthetic and real-world motion forecasting datasets show the robustness and effectiveness of our proposed method for knowledge transfer under zero-shot and low-shot settings by substantially outperforming the prior motion forecasting models on out-of-distribution prediction.

NeurIPS Conference 2023 Conference Paper

Neural Graph Generation from Graph Statistics

  • Kiarash Zahirnia
  • Yaochen Hu
  • Mark Coates
  • Oliver Schulte

We describe a new setting for learning a deep graph generative model (GGM) from aggregate graph statistics, rather than from the graph adjacency matrix. Matching the statistics of observed training graphs is the main approach for learning traditional GGMs (e. g, BTER, Chung-Lu, and Erdos-Renyi models). Privacy researchers have proposed learning from graph statistics as a way to protect privacy. We develop an architecture for training a deep GGM to match statistics while preserving local differential privacy guarantees. Empirical evaluation on 8 datasets indicates that our deep GGM model generates more realistic graphs than the traditional GGMs when both are learned from graph statistics only. We also benchmark our deep GGM trained on statistics only, against state-of-the-art deep GGM models that are trained on the entire adjacency matrix. The results show that graph statistics are often sufficient to build a competitive deep GGM that generates realistic graphs while protecting local privacy.

ICLR Conference 2022 Conference Paper

Distributional Reinforcement Learning with Monotonic Splines

  • Yudong Luo
  • Guiliang Liu
  • Haonan Duan 0002
  • Oliver Schulte
  • Pascal Poupart

Distributional Reinforcement Learning (RL) differs from traditional RL by estimating the distribution over returns to capture the intrinsic uncertainty of MDPs. One key challenge in distributional RL lies in how to parameterize the quantile function when minimizing the Wasserstein metric of temporal differences. Existing algorithms use step functions or piecewise linear functions. In this paper, we propose to learn smooth continuous quantile functions represented by monotonic rational-quadratic splines, which also naturally solve the quantile crossing problem. Experiments in stochastic environments show that a dense estimation for quantile functions enhances distributional RL in terms of faster empirical convergence and higher rewards in most cases.

NeurIPS Conference 2022 Conference Paper

Micro and Macro Level Graph Modeling for Graph Variational Auto-Encoders

  • Kiarash Zahirnia
  • Oliver Schulte
  • Parmis Naddaf
  • Ke Li

Generative models for graph data are an important research topic in machine learning. Graph data comprise two levels that are typically analyzed separately: node-level properties such as the existence of a link between a pair of nodes, and global aggregate graph-level statistics, such as motif counts. This paper proposes a new multi-level framework that jointly models node-level properties and graph-level statistics, as mutually reinforcing sources of information. We introduce a new micro-macro training objective for graph generation that combines node-level and graph-level losses. We utilize the micro-macro objective to improve graph generation with a GraphVAE, a well-established model based on graph-level latent variables, that provides fast training and generation time for medium-sized graphs. Our experiments show that adding micro-macro modeling to the GraphVAE model improves graph quality scores up to 2 orders of magnitude on five benchmark datasets, while maintaining the GraphVAE generation speed advantage.

NeurIPS Conference 2022 Conference Paper

Uncertainty-Aware Reinforcement Learning for Risk-Sensitive Player Evaluation in Sports Game

  • Guiliang Liu
  • Yudong Luo
  • Oliver Schulte
  • Pascal Poupart

A major task of sports analytics is player evaluation. Previous methods commonly measured the impact of players' actions on desirable outcomes (e. g. , goals or winning) without considering the risk induced by stochastic game dynamics. In this paper, we design an uncertainty-aware Reinforcement Learning (RL) framework to learn a risk-sensitive player evaluation metric from stochastic game dynamics. To embed the risk of a player’s movements into the distribution of action-values, we model their 1) aleatoric uncertainty, which represents the intrinsic stochasticity in a sports game, and 2) epistemic uncertainty, which is due to a model's insufficient knowledge regarding Out-of-Distribution (OoD) samples. We demonstrate how a distributional Bellman operator and a feature-space density model can capture these uncertainties. Based on such uncertainty estimation, we propose a Risk-sensitive Game Impact Metric (RiGIM) that measures players' performance over a season by conditioning on a specific confidence level. Empirical evaluation, based on over 9M play-by-play ice hockey and soccer events, shows that RiGIM correlates highly with standard success measures and has a consistent risk sensitivity.

NeurIPS Conference 2021 Conference Paper

Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning

  • Guiliang Liu
  • Xiangyu Sun
  • Oliver Schulte
  • Pascal Poupart

Interpreting Deep Reinforcement Learning (DRL) models is important to enhance trust and comply with transparency regulations. Existing methods typically explain a DRL model by visualizing the importance of low-level input features with super-pixels, attentions, or saliency maps. Our approach provides an interpretation based on high-level latent object features derived from a disentangled representation. We propose a Represent And Mimic (RAMi) framework for training 1) an identifiable latent representation to capture the independent factors of variation for the objects and 2) a mimic tree that extracts the causal impact of the latent features on DRL action values. To jointly optimize both the fidelity and the simplicity of a mimic tree, we derive a novel Minimum Description Length (MDL) objective based on the Information Bottleneck (IB) principle. Based on this objective, we describe a Monte Carlo Regression Tree Search (MCRTS) algorithm that explores different splits to find the IB-optimal mimic tree. Experiments show that our mimic tree achieves strong approximation performance with significantly fewer nodes than baseline models. We demonstrate the interpretability of our mimic tree by showing latent traversals, decision rules, causal impacts, and human evaluation results.

IJCAI Conference 2020 Conference Paper

A Complete Characterization of Projectivity for Statistical Relational Models

  • Manfred Jaeger
  • Oliver Schulte

A generative probabilistic model for relational data consists of a family of probability distributions for relational structures over domains of different sizes. In most existing statistical relational learning (SRL) frameworks, these models are not projective in the sense that the marginal of the distribution for size-n structures on induced substructures of size k<n is equal to the given distribution for size-k structures. Projectivity is very beneficial in that it directly enables lifted inference and statistically consistent learning from sub-sampled relational structures. In earlier work some simple fragments of SRL languages have been identified that represent projective models. However, no complete characterization of, and representation framework for projective models has been given. In this paper we fill this gap: exploiting representation theorems for infinite exchangeable arrays we introduce a class of directed graphical latent variable models that precisely correspond to the class of projective relational models. As a by-product we also obtain a characterization for when a given distribution over size-k structures is the statistical frequency distribution of size-k substructures in much larger size-n structures. These results shed new light onto the old open problem of how to apply Halpern et al. 's ``random worlds approach'' for probabilistic inference to general relational signatures.

AAAI Conference 2020 Conference Paper

Deep Generative Probabilistic Graph Neural Networks for Scene Graph Generation

  • Mahmoud Khademi
  • Oliver Schulte

We propose a new algorithm, called Deep Generative Probabilistic Graph Neural Networks (DG-PGNN), to generate a scene graph for an image. The input to DG-PGNN is an image, together with a set of region-grounded captions and object bounding-box proposals for the image. To generate the scene graph, DG-PGNN constructs and updates a new model, called a Probabilistic Graph Network (PGN). A PGN can be thought of as a scene graph with uncertainty: it represents each node and each edge by a CNN feature vector and de- fines a probability mass function (PMF) for node-type (object category) of each node and edge-type (predicate class) of each edge. The DG-PGNN sequentially adds a new node to the current PGN by learning the optimal ordering in a Deep Q-learning framework, where states are partial PGNs, actions choose a new node, and rewards are defined based on the ground-truth. After adding a node, DG-PGNN uses message passing to update the feature vectors of the current PGN by leveraging contextual relationship information, object cooccurrences, and language priors from captions. The updated features are then used to fine-tune the PMFs. Our experiments show that the proposed algorithm significantly outperforms the state-of-the-art results on the Visual Genome dataset for scene graph generation. We also show that the scene graphs constructed by DG-PGNN improve performance on the visual question answering task, for questions that need reasoning about objects and their interactions in the scene context.

IJCAI Conference 2020 Conference Paper

Inverse Reinforcement Learning for Team Sports: Valuing Actions and Players

  • Yudong Luo
  • Oliver Schulte
  • Pascal Poupart

A major task of sports analytics is to rank players based on the impact of their actions. Recent methods have applied reinforcement learning (RL) to assess the value of actions from a learned action value or Q-function. A fundamental challenge for estimating action values is that explicit reward signals (goals) are very sparse in many team sports, such as ice hockey and soccer. This paper combines Q-function learning with inverse reinforcement learning (IRL) to provide a novel player ranking method. We treat professional play as expert demonstrations for learning an implicit reward function. Our method alternates single-agent IRL to learn a reward function for multiple agents; we provide a theoretical justification for this procedure. Knowledge transfer is used to combine learned rewards and observed rewards from goals. Empirical evaluation, based on 4. 5M play-by-play events in the National Hockey League (NHL), indicates that player ranking using the learned rewards achieves high correlations with standard success measures and temporal consistency throughout a season.

NeurIPS Conference 2020 Conference Paper

Learning Agent Representations for Ice Hockey

  • Guiliang Liu
  • Oliver Schulte
  • Pascal Poupart
  • Mike Rudd
  • Mehrsan Javan

Team sports is a new application domain for agent modeling with high real-world impact. A fundamental challenge for modeling professional players is their large number (over 1K), which includes many bench players with sparse participation in a game season. The diversity and sparsity of player observations make it difficult to extend previous agent representation models to the sports domain. This paper develops a new approach for agent representations, based on a Markov game model, that is tailored towards applications in professional ice hockey. We introduce a novel player representation via player generation framework where a variational encoder embeds player information with latent variables. The encoder learns a context-specific shared prior to induce a shrinkage effect for the posterior player representations, allowing it to share statistical information across players with different participations. To model the play dynamics in sequential sports data, we design a Variational Recurrent Ladder Agent Encoder (VaRLAE). It learns a contextualized player representation with a hierarchy of latent variables that effectively prevents latent posterior collapse. We validate our player representations in major sports analytics tasks. Our experimental results, based on a large dataset that contains over 4. 5M events, show state-of-the-art performance for our VarLAE on facilitating 1) identifying the acting player, 2) estimating expected goals, and 3) predicting the final score difference.

IJCAI Conference 2018 Conference Paper

Deep Reinforcement Learning in Ice Hockey for Context-Aware Player Evaluation

  • Guiliang Liu
  • Oliver Schulte

A variety of machine learning models have been proposed to assess the performance of players in professional sports. However, they have only a limited ability to model how player performance depends on the game context. This paper proposes a new approach to capturing game context: we apply Deep Reinforcement Learning (DRL) to learn an action-value Q function from 3M play-by-play events in the National Hockey League (NHL). The neural network representation integrates both continuous context signals and game history, using a possession-based LSTM. The learned Q-function is used to value players' actions under different game contexts. To assess a player's overall performance, we introduce a novel Game Impact Metric (GIM) that aggregates the values of the player's actions. Empirical Evaluation shows GIM is consistent throughout a play season, and correlates highly with standard success measures and future salary.

IJCAI Conference 2017 Conference Paper

Locally Consistent Bayesian Network Scores for Multi-Relational Data

  • Oliver Schulte
  • Sajjad Gholami

An important task for relational learning is Bayesian network (BN) structure learning. A fundamental component of structure learning is a model selection score that measures how well a model fits a dataset. We describe a new method that upgrades for multi-relational databases, a log-linear BN score designed for single-table i. i. d. data. Chickering and Meek showed that for i. i. d. data, standard BN scores are locally consistent, meaning that their maxima converge to an optimal model, that represents the data generating distribution {\em and} contains no redundant edges. Our main theorem establishes that if a model selection score is locally consistent for i. i. d. data, then our upgraded gain function is locally consistent for relational data as well. To our knowledge this is the first consistency result for relational structure learning. A novel aspect of our approach is employing a {\em gain function} that compares two models: a current vs. an alternative BN structure. In contrast, previous approaches employed a score that is a function of a single model only. Empirical evaluation on six benchmark relational databases shows that our gain function is also practically useful: On realistic size data sets, it selects informative BN structures with a better data fit than those selected by baseline single-model scores.

UAI Conference 2015 Conference Paper

A Markov Game Model for Valuing Player Actions in Ice Hockey

  • Kurt Routley
  • Oliver Schulte

A variety of advanced statistics are used to evaluate player actions in the National Hockey League, but they fail to account for the context in which an action occurs or to look ahead to the long-term effects of an action. We apply the Markov Game formalism to develop a novel approach to valuing player actions in ice hockey that incorporates context and lookahead. Dynamic programming is used to learn Q-functions that quantify the impact of actions on goal scoring resp. penalties. Learning is based on a massive dataset that contains over 2. 8M events in the National Hockey League. The impact of player actions is found to vary widely depending on the context, with possible positive and negative effects for the same action. We show that lookahead makes a substantial difference to the action impact scores. Players are ranked according to the aggregate impact of their actions. We compare this impact ranking with previous player metrics, such as plus-minus, total points, and salary.

TCS Journal 2010 Journal Article

Evolutionary equilibrium in Bayesian routing games: Specialization and niche formation

  • Petra Berenbrink
  • Oliver Schulte

In this paper we consider Nash equilibria for the selfish task allocation game proposed in Koutsoupias, Papadimitriou (1999) [26], where a set of n users with unsplittable tasks of different size try to access m parallel links with different speeds. In this game, a player can use a mixed strategy (where he uses different links with a positive probability); then he is indifferent between the different link choices. This means that the player may well deviate to a different strategy over time. We propose the concept of evolutionary stable strategies (ESS) as a criterion for stable Nash equilibria, i. e. equilibria where no player is likely to deviate from his strategy. An ESS is a steady state that can be reached by a user community via evolutionary processes in which more successful strategies spread over time. The concept has been used widely in biology and economics to analyze the dynamics of strategic interactions. We first define a symmetric version of a Bayesian parallel links game where every player is not assigned a task of a fixed size but instead is assigned a task drawn from a distribution, which is the same for all players. We establish that the ESS is uniquely determined for a given symmetric Bayesian parallel links game (when it exists). Thus evolutionary stability places strong constraints on the assignment of tasks to links. We characterize ESS for the Bayesian parallel links game, and investigate the structure of evolutionarily stable equilibria: In an ESS, links acquire niches, meaning that there is minimal overlap in the tasks served by different links. Furthermore, all links with the same speed are interchangeable for every task with weight w: Every player must place a task with weight w on links having the same speed with the same probability. Also, bigger tasks must be assigned to faster links and faster links must have a bigger load. Finally, we introduce a clustering condition–roughly, distinct links must serve distinct tasks–that is sufficient for evolutionary stability, and can be used to find an ESS in many models.

I&C Journal 2010 Journal Article

Mind change optimal learning of Bayes net structure from dependency and independency data

  • Oliver Schulte
  • Wei Luo
  • Russell Greiner

This paper analyzes the problem of learning the structure of a Bayes net in the theoretical framework of Gold’s learning paradigm. Bayes nets are one of the most prominent formalisms for knowledge representation and probabilistic and causal reasoning. We follow constraint-based approaches to learning Bayes net structure, where learning is based on observed conditional dependencies and independencies between variables of interest (e. g. , the data are of the form “X is dependent on Y given any assignment to variables S” or of the form “X is independent of Y given any assignment to variables S”). Applying learning criteria in this model leads to the following results. (1) The mind change complexity of identifying a Bayes net graph over variables V from either dependency data or from independency data are | v | 2, the maximum number of edges. (2) There is a unique fastest mind-change optimal Bayes net learner for either data type; convergence speed is evaluated using Gold’s dominance notion of “uniformly faster convergence”. For dependency data, the optimal learner conjectures a graph if it is the unique Bayes net pattern that satisfies the observed dependencies with a minimum number of edges, and outputs “no guess” otherwise. For independency data, the optimal learner conjectures a graph if it is the unique Bayes net pattern that satisfies the observed dependencies with a maximum number of edges, and outputs “no guess” otherwise. We investigate the complexity of computing the output of the fastest mind-change optimal learner for either data type, and show that each of these two problems is NP-hard (assuming P=RP). To our knowledge these are the first NP-hardness results concerning the existence of a uniquely optimal Bayes net structure.

AAAI Conference 2010 Conference Paper

Structure Learning for Markov Logic Networks with Many Descriptive Attributes

  • Hassan Khosravi
  • Oliver Schulte
  • Tong Man
  • Xiaoyuan Xu
  • Bahareh Bina

Many machine learning applications that involve relational databases incorporate first-order logic and probability. Markov Logic Networks (MLNs) are a prominent statistical relational model that consist of weighted first order clauses. Many of the current state-of-the-art algorithms for learning MLNs have focused on relatively small datasets with few descriptive attributes, where predicates are mostly binary and the main task is usually prediction of links between entities. This paper addresses what is in a sense a complementary problem: learning the structure of an MLN that models the distribution of discrete descriptive attributes on medium to large datasets, given the links between entities in a relational database. Descriptive attributes are usually nonbinary and can be very informative, but they increase the search space of possible candidate clauses. We present an efficient new algorithm for learning a directed relational model (parametrized Bayes net), which produces an MLN structure via a standard moralization procedure for converting directed models to undirected models. Learning MLN structure in this way is 200-1000 times faster and scores substantially higher in predictive accuracy than benchmark algorithms on three relational databases.

IJCAI Conference 2009 Conference Paper

  • Oliver Schulte

Particle physics experiments, like the Large Hadron Collider in Geneva, can generate thousands of data points listing detected particle reactions. An important learning task is to analyze the reaction data for evidence of conserved quantities and hidden particles. This task involves latent structure in two ways: first, hypothesizing hidden quantities whose conservation determines which reactions occur, and second, hypothesizing the presence of hidden particles. We model this problem in the classic linear algebra framework of automated scientific discovery due to Valdés-Pérez, Żytkow and Simon, where both reaction data and conservation laws are represented as matrices. We introduce a new criterion for selecting a matrix model for reaction data: find hidden particles and conserved quantities that rule out as many interactions among the nonhidden particles as possible. A polynomial-time algorithm for optimizing this criterion is based on the new theorem that hidden particles are required if and only if the Smith Normal Form of the reaction matrix R contains entries other than 0 or 1. To our knowledge this is the first application of Smith matrix decomposition to a problem in AI. Using data from particle accelerators, we compare our algorithm to the main model of particles in physics, known as the Standard Model: our algorithm discovers conservation laws that are equivalent to those in the Standard Model, and indicates the presence of a hidden particle (the electron antineutrino) in accordance with the Standard Model.

I&C Journal 2006 Journal Article

Mind change efficient learning

  • Wei Luo
  • Oliver Schulte

This paper studies efficient learning with respect to mind changes. Our starting point is the idea that a learner that is efficient with respect to mind changes minimizes mind changes not only globally in the entire learning problem, but also locally in subproblems after receiving some evidence. Formalizing this idea leads to the notion of strong mind change optimality. We characterize the structure of language classes that can be identified with at most α mind changes by some learner (not necessarily effective): a language class L is identifiable with α mind changes iff the accumulation order of L is at most α. Accumulation order is a classic concept from point-set topology. We show that accumulation order is related to other established notions of structural complexity, such as thickness and intrinsic complexity. To aid the construction of learning algorithms, we show that the characteristic property of strongly mind change optimal learners is that they output conjectures (languages) with maximal accumulation order. We illustrate the theory by describing strongly mind change optimal learners for various problems such as identifying linear subspaces, one-variable patterns, and fixed-length patterns.

TARK Conference 2003 Conference Paper

Iterated backward inference: an algorithm for proper rationalizability

  • Oliver Schulte

An important approach to game theory is to examine the consequences of beliefs that agents may have about each otlmr. This paper investigates respect for public preferences. Consider an agent A who believes that B strictly prefers an option a to an option b. Then A respects B's preference if A assigns probability 1 to the choice of a given that B chooses a or b. Respect for public preferences requires that if it is, common belief that B prefers a to b, then it is common belief that all other agents respect that preference. Along the lines of Blume, Brandenburger and Dekel [3] and Asheim [1], I treat respect for public preferences as a constraint on lexicographic probability systems. The main result is that given respect for public preferences and perfect recall, players choose in accordance with Iterated Backward Inference. Iterated Backward Inference is a procedure that generalizes standard backward induction reasoning for games of both perfect and imperfect information. From Asheim's characterization of proper rationalizability [1] it follows that properly rationalizable strategies are consistent with respect for public preferences; hence strategies eliminated by Iterated Backward Inference are not properly rationalizable.