Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(ε)$-Stationary Points
Tomonori Sadamoto, Takashi Tanaka
- Year
- 2025
- Access
- Open access
Abstract
We study the policy gradient method (PGM) for the linear quadratic Gaussian (LQG) dynamic output-feedback control problem using an input-output-history (IOH) representation of the closed-loop system. First, we show that any dynamic output-feedback controller is equivalent to a static partial-state feedback gain for a new system representation characterized by a finite-length IOH. Leveraging this equivalence, we reformulate the search for an optimal dynamic output feedback controller as an optimization problem over the corresponding partial-state feedback gain. Next, we introduce a relaxed version of the IOH-based LQG problem by incorporating a small process noise with covariance $εI$ into the new system to ensure coerciveness, a key condition for establishing gradient-based convergence guarantees. Consequently, we show that a vanilla PGM for the relaxed problem converges to an $\mathcal{O}(ε)$-stationary point, i.e., $\overline{K}$ satisfying $\|\nabla J(\overline{K})\|_F \leq \mathcal{O}(ε)$, where $J$ denotes the original LQG cost. Numerical experiments empirically indicate convergence to the vicinity of the globally optimal LQG controller.
Keywords
Related papers
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992