Author name cluster

Utkarsh Upadhyay

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

AAAI Conference 2020 Conference Paper

Learning to Crawl

Utkarsh Upadhyay
Robert Busa-Fekete
Wojciech Kotlowski
David Pal
Balazs Szorenyi

Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. (2018) under the assumption that, for each webpage, both the elapsed time between two changes and the elapsed time between two requests follow a Poisson distribution with known parameters. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve the problem with such partial observability. Next, we propose a practical estimator and compute confidence intervals for it in terms of the elapsed time between the observations. Finally, we show that the explore-and-commit algorithm achieves an O(√T) regret with a carefully chosen exploration horizon. Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of parameters.
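To make the partial-observation setting concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes, for simplicity, that a page is refreshed at a fixed interval and that changes arrive as a Poisson process with rate `rate`, so that each refresh reports a change with probability 1 - exp(-rate * interval). Under those assumptions the textbook maximum-likelihood estimate can be recovered from the single-bit signals alone. The function names `estimate_change_rate` and `simulate_flags` are hypothetical.

```python
import math
import random


def estimate_change_rate(changed_flags, interval):
    """Estimate a Poisson change rate from single-bit observations.

    Assumption (not from the paper): the page is refreshed every
    `interval` time units, and each flag says only whether at least
    one change happened since the previous refresh.
    """
    n = len(changed_flags)
    if n == 0:
        raise ValueError("need at least one observation")
    fraction_changed = sum(1 for flag in changed_flags if flag) / n
    if fraction_changed >= 1.0:
        # Every refresh saw a change: the likelihood has no finite maximiser.
        return math.inf
    # P(changed) = 1 - exp(-rate * interval); invert for the rate.
    return -math.log(1.0 - fraction_changed) / interval


def simulate_flags(rate, interval, n, rng):
    """Draw n single-bit change signals for a page with the given rate."""
    p_changed = 1.0 - math.exp(-rate * interval)
    return [rng.random() < p_changed for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    flags = simulate_flags(rate=0.5, interval=1.0, n=20000, rng=rng)
    print(estimate_change_rate(flags, interval=1.0))
```

The all-changed case illustrates why conditions for solvability matter: if the refresh interval is too long relative to the change rate, almost every signal is a 1 and the rate cannot be pinned down.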


NeurIPS Conference 2018 Conference Paper

Deep Reinforcement Learning of Marked Temporal Point Processes

Utkarsh Upadhyay
Abir De
Manuel Gomez Rodriguez

In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this paper, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in viral marketing and personalized teaching and, using data gathered from Twitter and Duolingo, we show that it may be able to find interventions to help marketers and learners achieve their goals more effectively than alternatives.
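As a rough illustration of what "intensity and mark distribution" mean operationally, the sketch below samples events from a marked temporal point process by thinning. It is not the paper's method: the paper parameterises the policy with recurrent neural networks, whereas this sketch assumes a fixed, history-independent intensity function bounded above by a known constant, and a fixed categorical mark distribution. All names (`sample_marked_events`, `intensity_upper_bound`, and so on) are hypothetical.

```python
import random


def sample_marked_events(intensity, intensity_upper_bound, marks,
                         mark_weights, horizon, rng):
    """Sample (time, mark) pairs on [0, horizon] by thinning.

    Assumptions (illustrative only): `intensity(t)` never exceeds
    `intensity_upper_bound`, and marks are drawn independently of time
    from a fixed categorical distribution.
    """
    events = []
    t = 0.0
    while True:
        # Propose the next candidate time from a homogeneous Poisson
        # process running at the upper-bound rate.
        t += rng.expovariate(intensity_upper_bound)
        if t > horizon:
            break
        # Keep the candidate with probability intensity(t) / upper bound.
        if rng.random() * intensity_upper_bound <= intensity(t):
            mark = rng.choices(marks, weights=mark_weights)[0]
            events.append((t, mark))
    return events


if __name__ == "__main__":
    rng = random.Random(1)
    events = sample_marked_events(
        intensity=lambda t: 2.0 if t < 5.0 else 0.5,
        intensity_upper_bound=2.0,
        marks=["post", "reply"],
        mark_weights=[0.7, 0.3],
        horizon=10.0,
        rng=rng,
    )
    for time, mark in events:
        print(f"{time:.3f} {mark}")
```

In a learned policy the intensity and mark weights would depend on the event history rather than being fixed, which is the part the sketch deliberately leaves out.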


JMLR Journal 2018 Journal Article

Steering Social Activity: A Stochastic Optimal Control Point Of View

Ali Zarezade
Abir De
Utkarsh Upadhyay
Hamid R. Rabiee
Manuel Gomez-Rodriguez

User engagement in online social networking depends critically on the level of social activity in the corresponding platform, that is, the number of online actions, such as posts, shares or replies, taken by its users. Can we design data-driven algorithms to increase social activity? At a user level, such algorithms may increase activity by helping users decide when to take an action to be more likely to be noticed by their peers. At a network level, they may increase activity by incentivizing a few influential users to take more actions, which in turn will trigger additional actions by other users. In this paper, we model social activity using the framework of marked temporal point processes, derive an alternate representation of these processes using stochastic differential equations (SDEs) with jumps and, exploiting this alternate representation, develop two efficient online algorithms with provable guarantees to steer social activity both at a user and at a network level. In doing so, we establish a previously unexplored connection between optimal control of jump SDEs and doubly stochastic marked temporal point processes, which is of independent interest. Finally, we experiment both with synthetic and real data gathered from Twitter and show that our algorithms consistently steer social activity more effectively than the state of the art.


ICRA Conference 2012 Conference Paper

Real-time automated modeling and control of self-assembling systems

Grégory Mermoud
Massimo Mastrangeli
Utkarsh Upadhyay
Alcherio Martinoli

We present the M³ framework, a formal and generic computational framework for modeling and controlling stochastic distributed systems of purely reactive robots in an automated and real-time fashion. Based on the trajectories of the robots, the framework builds up an internal microscopic representation of the system, which then serves as a blueprint of models at higher abstraction levels. These models are then calibrated using a Maximum Likelihood Estimation (MLE) algorithm. We illustrate the structure and performance of the framework by performing the online optimization of a bang-bang controller for the stochastic self-assembly of water-floating, magnetically latching, passive modules. The experimental results demonstrate that the generated models can successfully optimize the assembly of desired structures.
