
NeurIPS 2025

Beyond Average Value Function in Precision Medicine: Maximum Probability-Driven Reinforcement Learning for Survival Analysis

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Constructing multistage optimal decisions for alternating recurrent event data is critically important in medical and healthcare research. Current reinforcement learning (RL) algorithms have only been applied to time-to-event data, with the objective of maximizing expected survival time. However, alternating recurrent event data has a different structure, which motivates us to model the probability and frequency of event occurrences rather than a single terminal outcome. In this paper, we introduce an RL framework specifically designed for alternating recurrent event data. Our goal is to maximize the probability that the duration between consecutive events exceeds a clinically meaningful threshold. To achieve this, we identify a lower bound on this probability, which transforms the problem into maximizing a cumulative sum of log probabilities and thus enables direct application of standard RL algorithms. We establish the theoretical properties of the resulting optimal policy and demonstrate through numerical experiments that our proposed algorithm yields a larger probability that the time between events exceeds a critical threshold than existing state-of-the-art algorithms.
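The key transformation in the abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: assume each stage t has a modeled probability p_t that the inter-event duration exceeds the threshold, and that the probability of all gaps exceeding the threshold is bounded below by the product of the p_t. Taking logs turns that product into a cumulative sum, i.e. a standard RL return with per-step reward log p_t.

```python
import numpy as np

def log_prob_return(stage_probs):
    """Cumulative log-probability reward for one trajectory.

    stage_probs: per-stage estimates p_t = P(duration_t > tau | state, action)
    (hypothetical quantities, assumed available from a fitted survival model).
    Returns sum_t log p_t, which equals log of the product prod_t p_t,
    the lower bound on P(all inter-event gaps exceed tau).
    """
    stage_probs = np.asarray(stage_probs, dtype=float)
    rewards = np.log(stage_probs)   # per-step reward r_t = log p_t
    return rewards.sum()            # cumulative return maximized by standard RL

# Exponentiating the return recovers the probability lower bound.
probs = [0.9, 0.8, 0.95]
ret = log_prob_return(probs)
bound = np.exp(ret)                 # = 0.9 * 0.8 * 0.95 = 0.684
```

Because the objective is now an additive sum of per-step rewards, any value-based or policy-gradient RL algorithm can be applied without modification.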

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
909865327037223757