Learning POMDPs with Linear Function Approximation and Finite Memory

Ali Devran Kara

Year: 2025
Access: Open access

Abstract

We study reinforcement learning with linear function approximation and finite-memory approximations for partially observed Markov decision processes (POMDPs). We first present an algorithm for the value evaluation of finite-memory feedback policies, and provide error bounds derived from filter stability and projection errors. We then study the learning of finite-memory-based near-optimal Q values. Convergence in this case requires further assumptions on the exploration policy when using general basis functions. Finally, we show that these assumptions can be relaxed for specific models, such as those with perfectly linear cost and dynamics, or when using discretization-based basis functions.
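
As a rough illustration of what value evaluation for a finite-memory policy with a linear value estimate can look like, the following minimal sketch runs TD(0) on windows of the last few observations and actions. It is not the algorithm from the paper: the two-state model, its numbers, the policy, the step-size rule, and the function name `td0_finite_memory` are all assumptions made for illustration. The basis here is a one-hot indicator of the window, which is the simplest case of a discretization-style basis, so the linear estimate is `theta[index(window)]`.

```python
import numpy as np


def td0_finite_memory(num_steps=5000, window=2, beta=0.9, seed=0):
    """TD(0) value estimate over finite windows, with a one-hot linear basis."""
    rng = np.random.default_rng(seed)

    # Hypothetical POMDP with 2 hidden states, 2 actions, 2 observations.
    # All numbers are illustrative assumptions, not taken from the paper.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])   # P[a, x, x_next]
    O = np.array([[0.8, 0.2], [0.3, 0.7]])     # O[x, y]
    cost = np.array([[0.0, 1.0], [2.0, 0.5]])  # cost[x, a]

    def index(obs, acts):
        # Encode the binary window (observations, then actions) as an integer.
        idx = 0
        for bit in list(obs) + list(acts):
            idx = 2 * idx + int(bit)
        return idx

    dim = 2 ** (2 * window)   # one basis function per possible window
    theta = np.zeros(dim)     # weights of the linear value estimate
    visits = np.zeros(dim)    # per-window visit counts for the step size

    x = 0                     # hidden state, never shown to the learner
    obs = [0] * window        # arbitrary initial window
    acts = [0] * window
    for _ in range(num_steps):
        i = index(obs, acts)
        a = obs[-1]           # assumed finite-memory policy: copy latest observation
        c = cost[x, a]
        x = rng.choice(2, p=P[a, x])
        y = rng.choice(2, p=O[x])
        obs = obs[1:] + [int(y)]
        acts = acts[1:] + [int(a)]
        j = index(obs, acts)

        visits[i] += 1
        alpha = 1.0 / visits[i]
        # TD(0) update; with a one-hot basis only the weight of window i moves.
        theta[i] += alpha * (c + beta * theta[j] - theta[i])
    return theta


if __name__ == "__main__":
    print(td0_finite_memory())
```

Because the window is not a Markov state in general, how close such an estimate is to the true value depends on the kind of filter stability and projection error terms the abstract refers to; the sketch itself makes no such guarantee.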

Keywords

math.OC, eess.SY
