Author name cluster

John N. Tsitsiklis

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

2 author rows

ICML Conference 2011 Conference Paper

Mean-Variance Optimization in Markov Decision Processes

Shie Mannor
John N. Tsitsiklis

Details

JMLR Journal 2009 Journal Article

Online Learning with Sample Path Constraints

Shie Mannor
John N. Tsitsiklis
Jia Yuan Yu

We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem. [abs] [ pdf ][ bib ] &copy JMLR 2009. ( edit, beta )

PDF Details

ICML Conference 2004 Conference Paper

Bias and variance in value function estimation

Shie Mannor
Duncan Simester
Peng Sun 0001
John N. Tsitsiklis

Details

JMLR Journal 2004 Journal Article

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem (Special Topic on Learning Theory)

Shie Mannor
John N. Tsitsiklis

We consider the multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. (2002) that given n arms, a total of O (( n /ε 2 )log(1/δ)) trials suffices in order to find an ε-optimal arm with probability at least 1-δ. We establish a matching lower bound on the expected number of trials under any sampling policy. We furthermore generalize the lower bound, and show an explicit dependence on the (unknown) statistics of the arms. We also provide a similar bound within a Bayesian setting. The case where the statistics of the arms are known but the identities of the arms are not, is also discussed. For this case, we provide a lower bound of Θ((1/ε 2 )( n +log(1/δ))) on the expected number of trials, as well as a sampling policy with a matching upper bound. If instead of the expected number of trials, we consider the maximum (over all sample paths) number of trials, we establish a matching upper and lower bound of the form Θ(( n /ε 2 )log(1/δ)). Finally, we derive lower bounds on the expected regret, in the spirit of Lai and Robbins. [abs] [ pdf ][ ps.gz ][ ps ]

PDF Details

JMLR Journal 2002 Journal Article

On the Convergence of Optimistic Policy Iteration

John N. Tsitsiklis

We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q -values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estimation. We also indicate some extensions that either fail or are unlikely to go through.

PDF Details

TCS Journal 2001 Journal Article

Deciding stability and mortality of piecewise affine dynamical systems

Vincent D. Blondel
Olivier Bournez
Pascal Koiran
Christos H. Papadimitriou
John N. Tsitsiklis

In this paper we study problems such as: given a discrete time dynamical system of the form x(t+1)=f(x(t)) where f: R n→R n is a piecewise affine function, decide whether all trajectories converge to 0. We show in our main theorem that this Attractivity Problem is undecidable as soon as n⩾2. The same is true of two related problems: Stability (is the dynamical system globally asymptotically stable?) and Mortality (do all trajectories go through 0?). We then show that Attractivity and Stability become decidable in dimension 1 for continuous functions.

Details DOI

FOCS Conference 1990 Conference Paper

Communication Complexity of Algebraic Computation (Extended Abstract)

Zhi-Quan Luo
John N. Tsitsiklis

The authors consider a situation in which two processors P/sub 1/ and P/sub 2/ are to evaluate one or more functions f/sub 1/, .. ., f/sub s/ of two vector variables x and y, under the assumption that processor P/sub 1/ (respectively, P/sub 2/) has access only to the value of x (respectively, y) and the functional form of f/sub 1/, .. ., f/sub s/. They consider a continuous model of communication whereby real-valued messages are transmitted, and they study the minimum number of messages required for the desired computation. Tight lower bounds are established for the following three problems: (1) each f/sub i/ is a rational function and only one-way communication is allowed. (2) The variables x and y are matrices and the processors wish to solve the linear system (x+y)z=b for the unknown z. (3) The processors wish to evaluate a particular root of the polynomial equation Sigma (x/sub i/+y/sub i/)z/sup i/=0, where the sum is from i=0 to n-1. >

Details

FOCS Conference 1990 Conference Paper

On the Predictability of Coupled Automata: An Allegory about Chaos

Sam Buss
Christos H. Papadimitriou
John N. Tsitsiklis

The authors show a sharp dichotomy between systems of identical automata with symmetric global control whose behavior is easy to predict and those whose behavior is hard to predict. The division pertains to whether the global control rule is invariant with respect to permutations of the states of the automaton. It is also shown that testing whether the global control rule has this invariance property is an undecidable problem. It is argued that there is a natural analog between complexity in the present model and chaos in dynamical systems. >

Details