Reinforcement Learning for Clinical Applications
Kia Khezeli, Scott Siegel, Benjamin Shickel, Tezcan Ozrazgat‐Baslanti, Azra Bihorac, Parisa Rashidi
- Year
- 2023
- Citations
- 12
Abstract
Introduction Reinforcement learning formalizes the concept of learning from interactions.1 Broadly, reinforcement learning focuses on a setting in which an agent (decision maker) sequentially interacts with an environment that is partially unknown to them. At each stage, the agent takes an action and receives a reward. The objective of the agent is to maximize rewards accumulated in the long run. There are many situations in health care where decisions are made sequentially for which reinforcement learning approaches could prove useful for decision making. Throughout this article, we consider treatment prescription as an archetypical example to connect reinforcement learning concepts to a health care setting. In this setting, the care provider, the prescribed treatment, and the patients can be viewed as the agent, the action, and the environment, respectively, as depicted in Figure 1.Figure 1: Sequential treatment of AKI or CKD complications modeled as a reinforcement learning problem.Background In this section, with the objective of making reinforcement learning literature more accessible to a clinical audience, we briefly introduce related fundamental concepts and approaches. We refer the interested reader to Sutton and Barto1 for a comprehensive introduction to reinforcement learning. Markov Decision Processes Markov decision processes (MDPs) are a formalism of the sequential decision-making problem that has been central to the theoretical and practical advancements of reinforcement learning. In each stage of an MDP, the agent observes the state of the environment and takes an action, which, in turn, results in a change of the state. This change of state is assumed to be probabilistic with the next state being determined only by the preceding state, the chosen action, and the transition probability. The agent also receives a reward that is a function of the taken action, the preceding state, and the subsequent state. In an MDP, the objective of the agent is to maximize the return defined as the reward accumulated over a time horizon. In some applications, it is common to consider the horizon to be infinite, in which case the future rewards are discounted by a factor smaller than one. The selection of action by the agent on the basis of the observed state is known as the policy. More formally, a policy is a probabilistic mapping from states to each possible action. Because the policy and the reward are a function of the state, it is critical to estimate the utility of being in a certain state. More specifically, the value function is defined as the expected return starting from a given state under the chosen policy. Under this formalism, the objective of the agent is to find the optimal policy that maximizes the value function for all states. Reinforcement Learning Methods Action-value methods are a class of reinforcement learning methods in which the actions are chosen on the basis of the estimation of their long-term value. A prominent example of an action-value method is Q-learning in which the agent iteratively takes actions with the highest estimated values and updates the action-state value function on the basis of new observations. Policy gradient methods are another class of reinforcement learning methods that seek to optimize the policy directly instead of choosing actions on the basis of their respective estimated value. Such methods could be advantageous in health care applications that entail a large number of possible actions, e.g., when recommending a wide range of drug dosages or treatment options. Clinical Applications Reinforcement learning frameworks and methods are broadly applicable to clinical settings in which decisions are made sequentially. A prominent clinical application of reinforcement learning is for treatment recommendation, which has been studied across a variety of diseases and treatments including radiation and chemotherapy for cancer, brain stimulation for epilepsy, and treatment strategies
Keywords
Related papers
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002