From Hamilton-Jacobi-Bellman to Pontryagin’s Maximum Principle
Connections and implications
Closely related to each other, the Hamilton-Jacobi-Bellman equation and Pontryagin’s Maximum Principle are two of the most important pillars of optimal control theory. In this article, we provide a pathway to derive PMP from HJB under some mild assumptions, showing the connections between the two and, most importantly, revealing the beautiful structure of optimal control problems. Inspired by the Huygens–Fresnel principle, two toy examples are also provided, each tackled from a different angle to demonstrate the potential challenges for each method.
1. HJB in continuous systems
Let’s start by reviewing the HJB equation and introducing the optimal cost-to-go function, which plays a central role in optimization problems with causal or chain-like structures.
1.1 Optimal Control Problem
Given a dynamic system ‘f’, the optimal control problem aims to find trajectories of the state ‘x(t)’ and input ‘u(t)’ which minimize the functional defined by the running cost ‘L’ and terminal cost ‘ϕ’. When the optimum is attained, we denote the corresponding value by ‘J’, the optimal cost-to-go.
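In symbols, a standard formulation consistent with the notation above reads:

$$ J(x_0, t_0) \;=\; \min_{u(\cdot)} \left[\, \phi\big(x(T)\big) + \int_{t_0}^{T} L\big(x(t),u(t),t\big)\,dt \right] \quad \text{s.t.} \quad \dot{x} = f(x,u,t),\;\; x(t_0) = x_0. $$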
1.2 Principle of Optimality
Following the discussion in the previous post [1], ‘J’ is a function of the initial condition ‘(x₀, t₀)’, and it extends to any other point in space-time from which the trajectory reaches the domain of ‘ϕ’ at time ‘T’. In essence, this creates a family of neighbouring problems covering the original one, which corresponds to ‘J(x₀, t₀)’. Since the optimal input achieves the lowest combined immediate and future cost, we have the following derivation, which leads to the celebrated HJB equation [2].
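In compact form, the derivation runs as follows (a sketch; higher-order terms in ‘Δt’ are dropped):

$$ J(x,t) \;=\; \min_{u}\Big[\, L(x,u,t)\,\Delta t \;+\; J\big(x + f(x,u,t)\,\Delta t,\; t+\Delta t\big) \Big]. $$

Expanding ‘J’ to first order and letting ‘Δt → 0’ gives the HJB equation

$$ -\frac{\partial J}{\partial t} \;=\; \min_{u}\Big[\, L(x,u,t) + \partial_x J(x,t)\cdot f(x,u,t) \Big], \qquad J(x,T) = \phi(x). $$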
With the optimal input ‘u’ attained and applied everywhere in space-time, HJB provides a propagation rule for ‘J’ in PDE form. Except in cases with discontinuous control trajectories (unfortunately, these are quite common, see § 2), we can neglect the directional derivative and simplify HJB as follows.
Before moving on to the next section, readers weary of equations may skip ahead to § 2 for a more tangible intuition.
1.3 Structure of optimal trajectories
As already implied in the derivation of HJB, the relations between neighbouring trajectories provide fruitful information for understanding optimal-control problems. In contrast to § 1.2, where we perturbed the control input and captured the consequences implicitly through ‘J(x+Δx)’, in this section we study the connections between adjacent neighbours directly.
1.3.1 Dynamics of a bunch
As illustrated below, we denote one of the optimal trajectories as the “Reference trajectory” and try to find its relation to the “Bunch of optimal neighbours”.
Since both the reference and any of its neighbours are optimal, we have:
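In outline (a sketch under the smoothness assumptions in use here): differentiating the HJB equation, with the optimal input substituted, with respect to the state gives

$$ -\partial_t\big(\partial_x J\big) \;=\; \partial_x L + (\partial_x f)^{\top}\,\partial_x J + \big(\partial_x^2 J\big)\, f, $$

where the terms carrying ‘∂ₓu*’ vanish by the optimality of ‘u*’ (see § 1.3.2). Since ‘d/dt(∂ₓJ) = ∂ₜ(∂ₓJ) + (∂ₓ²J)·f’ along ‘ẋ = f’, this rearranges to

$$ \frac{d}{dt}\big(\partial_x J\big) \;=\; -\,\partial_x L \;-\; (\partial_x f)^{\top}\,\partial_x J. $$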
1.3.2 Input constraint
To keep the argument simple, we assume a convex input constraint and a differentiable Hamiltonian, and obtain:
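With the Hamiltonian ‘H(x,u,p) = L(x,u,t) + pᵀf(x,u,t)’ (writing ‘p’ for ‘∂ₓJ’, formalized in § 1.4), the first-order optimality condition over a convex input set ‘𝒰’ reads:

$$ \partial_u H\big(x, u^*, p\big)\cdot\big(u - u^*\big) \;\ge\; 0 \qquad \text{for all } u \in \mathcal{U}, $$

with ‘∂ᵤH(x,u*,p) = 0’ whenever ‘u*’ lies in the interior of ‘𝒰’.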
This shows that the terms associated with input deviation in § 1.3.1 must vanish, which can be illustrated over the input space as:
1.4 The Pontryagin Maximum Principle
Returning to § 1.3.1, notice that the last equation describes the time evolution of the state sensitivity of ‘J’. Denoting ‘∂ₓJ’ by ‘p(x,t)’, we have:
Going a step further, since each optimal trajectory has an associated function ‘p*(t) = p(x*(t),t)’, we arrive at PMP, a necessary condition for a trajectory to be optimal.
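Collected together, with ‘H(x,u,p) = L + pᵀf’, the claim reads (the standard first-order statement; for a cost-minimization problem the optimal input minimizes ‘H’, and the traditional “maximum” form follows from a sign flip):

$$ \dot{x}^* = \partial_p H = f(x^*, u^*, t), \qquad \dot{p}^* = -\,\partial_x H\big(x^*, u^*, p^*, t\big), $$

$$ u^*(t) \in \arg\min_{u \in \mathcal{U}} H\big(x^*(t), u, p^*(t)\big), \qquad p^*(T) = \partial_x \phi\big(x^*(T)\big). $$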
1.5 Other approaches and variants
There are multiple ways to reach similar or more general results. You may refer to [6], which utilizes a Lagrange multiplier function. Furthermore, the discrete-time analogue of PMP, insightful in its own way, can be found in [7].
2. Examples and extensions
Staring at equations can be desolate and hard to enjoy. Here we provide two toy problems which are simple, yet interesting enough to reveal the beauty and challenges of optimal control problems.
2.1 A strange Snell’s law
Inspired by, but not identical to, Snell’s law in physical systems, we study the following minimum-time problem in a 2D plane:
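Under one plausible formalization (an assumption on our part, consistent with the symbols ‘c₀’, ‘n’, and the boundary ‘x₂ = 0’ used below): travel with a unit-norm heading at a position-dependent speed and minimize the arrival time,

$$ \dot{x} = c(x)\,u, \quad \|u\| = 1, \qquad c(x) = \begin{cases} c_0, & x_2 > 0,\\ c_0/n, & x_2 < 0, \end{cases} \qquad \min_{u(\cdot)}\; T \;\;\text{s.t.}\;\; x(T) = x_{\text{target}}. $$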
2.1.1 Solve with PMP
Flexing our new skills, we have a necessary condition:
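Concretely, under the formalization sketched above, the minimum-time Hamiltonian is ‘H = 1 + c(x) pᵀu’, minimized over ‘‖u‖ = 1’ by ‘u* = -p*/‖p*‖’. Since the problem is time-invariant with free terminal time, ‘H = 0’ along optimal trajectories:

$$ u^* = -\,\frac{p^*}{\|p^*\|}, \qquad \dot{p}^* = -\,\partial_x H = \|p^*\|\,\nabla c(x), \qquad 1 - c(x)\,\|p^*\| = 0 \;\Rightarrow\; \|p^*\| = \frac{1}{c(x)}. $$

Since ‘c’ is piecewise constant, ‘∇c = 0’ away from the boundary.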
Away from the transition boundary, we have a constant co-state function ‘p*’, which corresponds to a family of straight trajectories driven by ‘u*’, itself constant within each homogeneous medium.
For the abnormality across ‘x₂ = 0’, we first observe that the change of ‘p*’ across the boundary is finite, and the “impact” on ‘p*’ at the boundary can be deduced as:
Alternatively, if we follow the interpretation of ‘p*’ as the state sensitivity of ‘J’ (see § 1.4), the norm of ‘p*’ below the boundary is immediately ‘n/c₀’ without any derivation. The rest follows from ‘p₁’ being constant across the boundary.
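Combining the two observations (‘‖p*‖ = 1/c’ on each side and ‘p₁’ continuous) and writing ‘θ’ for the ray angle measured from the boundary normal, a refraction law of Snell type drops out (hedged, under the two-media setup assumed above):

$$ |p_1| = \|p^*\|\sin\theta = \frac{\sin\theta}{c(x)} = \text{const} \;\Longrightarrow\; \frac{\sin\theta_{\text{above}}}{c_0} \;=\; \frac{\sin\theta_{\text{below}}}{c_0/n}. $$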
As suggested by PMP, we can illustrate the back-calculated ‘J’ from these “candidates of optimal trajectories”. (Some reverse trajectories stop at the boundary, as the corresponding ‘θ’ below the horizon does not exist.)
2.1.2 Revisit with HJB
It is not hard to guess that, since PMP is not sufficient to claim optimality, some of the trajectories solved above are not optimal. HJB, on the other hand, does not suffer from this limitation, as ‘J’ is uniquely defined over space-time by the Principle of Optimality (though ‘u’ may not be).
Following a derivation similar to § 2.1.1, we have the HJB:
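Since the problem is time-invariant with free terminal time, the HJB becomes stationary, and minimizing over ‘‖u‖ = 1’ turns it into an eikonal equation (a sketch consistent with the propagation picture below):

$$ 0 = \min_{\|u\|=1}\Big[\,1 + c(x)\,\partial_x J\cdot u\,\Big] = 1 - c(x)\,\|\partial_x J\| \;\Longrightarrow\; \|\partial_x J\| = \frac{1}{c(x)}. $$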
Interestingly, viewed as a propagator in all directions with a weighting of ‘n/c₀’, ‘J’ can be solved by the Huygens–Fresnel principle [8] with some modification. Instead of phasors interfering everywhere, we iterate the “wave-front” unidirectionally (which is more complicated to construct from a coding point of view). The process can be animated as:
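As a hedged illustration of the same idea, the sketch below grows the wave-front of ‘J’ outward from the target with a Dijkstra-style update on a grid. It is a crude stand-in for the construction behind the animation: the grid resolution, the 8-connected stencil, and the target cell are all assumptions, and a proper fast-marching scheme would track the front more faithfully.

```python
import heapq
import numpy as np

def min_time_field(speed, target, h=1.0):
    """Approximate the minimum-time cost-to-go J on a grid by
    propagating a wave-front outward from `target` (Dijkstra as a
    crude fast-marching stand-in). speed[i, j] is c(x) per cell."""
    ny, nx = speed.shape
    J = np.full((ny, nx), np.inf)
    J[target] = 0.0
    heap = [(0.0, target)]
    # 8-connected stencil; diagonal steps cover sqrt(2)*h of distance
    steps = [(di, dj, h * float(np.hypot(di, dj)))
             for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    while heap:
        t, (i, j) = heapq.heappop(heap)
        if t > J[i, j]:
            continue  # stale entry, cell already settled at a lower time
        for di, dj, dist in steps:
            ni, nj = i + di, j + dj
            if 0 <= ni < ny and 0 <= nj < nx:
                # travel time across one step ~ distance / local speed
                nt = t + dist / speed[ni, nj]
                if nt < J[ni, nj]:
                    J[ni, nj] = nt
                    heapq.heappush(heap, (nt, (ni, nj)))
    return J

# Two media split at mid-height: speed c0 above, c0/n below (assumed setup).
c0, n = 1.0, 1.5
speed = np.full((101, 101), c0)
speed[51:, :] = c0 / n
J = min_time_field(speed, target=(25, 50), h=0.01)
```

Each heap pop settles the earliest unvisited cell, which is exactly the unidirectional wave-front iteration described above.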
Immediately, we notice some blind spots that were not captured previously; the corresponding trajectories are highlighted as red traces in the following diagram. Though we did not discover this family with PMP in the previous section, one can verify that these traces still satisfy PMP.
2.2 Minimum-time trajectories of a 1-D point mass
For our second example, we revisit a common go-to example for introducing PMP (e.g. [5]), but force ourselves to find the solution with HJB (not because it generates a clickbait thumbnail). Starting from the problem:
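In the classic form (as in [5], and assumed here), the point mass is a double integrator driven to the origin in minimum time under a bounded input:

$$ \dot{x}_1 = x_2, \qquad \dot{x}_2 = u, \qquad |u| \le 1, \qquad \min_{u(\cdot)}\; T \;\;\text{s.t.}\;\; x(T) = 0. $$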
We have the corresponding HJB as follows (notice that the problem is time-translation invariant):
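Under that formulation, ‘J’ depends on the state alone, and the stationary HJB reads (sketch):

$$ 0 = \min_{|u|\le 1}\Big[\,1 + \partial_{x_1} J\,x_2 + \partial_{x_2} J\,u\,\Big] = 1 + \partial_{x_1} J\,x_2 - \big|\partial_{x_2} J\big|, \qquad u^* = -\,\operatorname{sign}\big(\partial_{x_2} J\big). $$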
Solving such a non-linear PDE analytically can be quite a torture, but it yields to a naïve partitioning technique:
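For reference, the well-known closed-form result that such a partition leads to (quoted, not re-derived here; the two regions are separated by the switching curve ‘x₁ = -½ x₂|x₂|’):

$$ J(x_1,x_2) = \begin{cases} x_2 + 2\sqrt{x_1 + x_2^2/2}, & x_1 > -\tfrac{1}{2}\,x_2\,|x_2|,\\[4pt] -x_2 + 2\sqrt{-x_1 + x_2^2/2}, & x_1 < -\tfrac{1}{2}\,x_2\,|x_2|. \end{cases} $$

Note that the gradient of ‘J’ is discontinuous across the switching curve, which is exactly the non-differentiability emphasized below.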
Finally, by rejecting non-optimal solutions, the cost-to-go function can be visualized as below. We emphasize again that, in general, ‘J’ need not be differentiable everywhere and is often non-trivial to solve, at least analytically.
As in the example of § 2.1, we can also obtain the cost-to-go function ‘J’ by simulating its “propagation”, as in the animation below. Here we did not actually “solve” it this way, but simply emulated the process from the analytical results.
References
[1] Stochastic Optimal Control via Path Integrals | Medium
[2] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[3] B. Houchmandzadeh, “The Hamilton-Jacobi equation: an intuitive approach,” American Journal of Physics, vol. 88, no. 5, pp. 353–359, Apr. 2020.
[4] D. S. Naidu, Optimal Control Systems. CRC Press, 2018.
[5] S. H. Żak, Systems and Control. Oxford University Press, 2003.
[6] G. Leitmann, Ed., Optimization Techniques with Applications to Aerospace Systems. Academic Press, 1962.
[7] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2000.
[8] E. Hecht, Optics. Pearson, 2014.