From Hamilton-Jacobi-Bellman to Pontryagin’s Maximum Principle
Connections and implications
Closely related to each other, the Hamilton-Jacobi-Bellman equation and Pontryagin’s Maximum Principle are two of the most important pillars of optimal control theory. In this article, we provide a pathway to derive PMP from HJB under some mild assumptions, showing the connections between the two and, most importantly, revealing the beautiful structure of optimal control problems. Inspired by the Huygens–Fresnel principle, two toy examples are also provided, each tackled from a different angle to demonstrate the potential challenges for each method.
1. HJB in continuous systems
Let’s start by reviewing the HJB equation and introducing the optimal cost-to-go function, which plays a central role in optimization problems with causal or chain-like structures.
1.1 Optimal Control Problem
Given a dynamic system ‘f’, the optimal control problem aims to find trajectories of the state ‘x(t)’ and input ‘u(t)’ which minimize the functional defined by the running cost ‘L’ and terminal cost ‘ϕ’. When the optimum is attained, we denote the corresponding value by ‘J’, the optimal cost-to-go.
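In symbols, a standard formulation consistent with the notation above reads:

$$ J(x_0, t_0) \;=\; \min_{u(\cdot)} \left[\, \phi\big(x(T)\big) + \int_{t_0}^{T} L\big(x(t),u(t),t\big)\,dt \right] \quad \text{s.t.} \quad \dot{x} = f(x,u,t),\;\; x(t_0) = x_0. $$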
1.2 Principle of Optimality
Following the discussion in the previous post [1], ‘J’ is a function of the initial condition ‘(x₀, t₀)’, and it extends to any other point in space-time from which the trajectory reaches the domain of ‘ϕ’ at time ‘T’. In essence, this creates a family of neighbouring problems covering the original one, which corresponds to ‘J(x₀, t₀)’. Since the optimal input achieves the lowest combined immediate and future cost, we have the following derivation, which leads to the celebrated HJB equation [2].
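In compact form, the derivation runs as follows (a sketch; higher-order terms in ‘Δt’ are dropped):

$$ J(x,t) \;=\; \min_{u}\Big[\, L(x,u,t)\,\Delta t \;+\; J\big(x + f(x,u,t)\,\Delta t,\; t+\Delta t\big) \Big]. $$

Expanding ‘J’ to first order and letting ‘Δt → 0’ gives the HJB equation

$$ -\frac{\partial J}{\partial t} \;=\; \min_{u}\Big[\, L(x,u,t) + \partial_x J(x,t)\cdot f(x,u,t) \Big], \qquad J(x,T) = \phi(x). $$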
With the optimal input ‘u’ attained and applied everywhere in space-time, HJB provides a propagation rule for ‘J’ in PDE form. Except in cases with discontinuous control trajectories (unfortunately, these are quite common, see § 2), we can neglect the directional derivative and simplify HJB as follows.
Before moving on to the next section, readers weary of equations may skip ahead to § 2 for a more tangible intuition.
1.3 Structure of optimal trajectories
As already implied in the derivation of HJB, the relations between neighbouring trajectories provide fruitful information for understanding optimal-control problems. In contrast to § 1.2, where we perturbed the control input and captured the consequences implicitly through ‘J(x+Δx)’, in this section we study the connections between adjacent neighbours directly.
1.3.1 Dynamics of a bunch
As illustrated below, we denote one of the optimal trajectories as the “Reference trajectory” and try to find its relation to the “Bunch of optimal neighbours”.
Since both the reference and any of its neighbours are optimal, we have:
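In outline (a sketch under the smoothness assumptions in use here): differentiating the HJB equation, with the optimal input substituted, with respect to the state gives

$$ -\partial_t\big(\partial_x J\big) \;=\; \partial_x L + (\partial_x f)^{\top}\,\partial_x J + \big(\partial_x^2 J\big)\, f, $$

where the terms carrying ‘∂ₓu*’ vanish by the optimality of ‘u*’ (see § 1.3.2). Since ‘d/dt(∂ₓJ) = ∂ₜ(∂ₓJ) + (∂ₓ²J)·f’ along ‘ẋ = f’, this rearranges to

$$ \frac{d}{dt}\big(\partial_x J\big) \;=\; -\,\partial_x L \;-\; (\partial_x f)^{\top}\,\partial_x J. $$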
1.3.2 Input constraint
To keep the argument simple, we assume a convex input constraint and a differentiable Hamiltonian, and obtain:
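With the Hamiltonian ‘H(x,u,p) = L(x,u,t) + pᵀf(x,u,t)’ (writing ‘p’ for ‘∂ₓJ’, formalized in § 1.4), the first-order optimality condition over a convex input set ‘𝒰’ reads:

$$ \partial_u H\big(x, u^*, p\big)\cdot\big(u - u^*\big) \;\ge\; 0 \qquad \text{for all } u \in \mathcal{U}, $$

with ‘∂ᵤH(x,u*,p) = 0’ whenever ‘u*’ lies in the interior of ‘𝒰’.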
This shows that the terms associated with input deviation in § 1.3.1 must vanish, which can be illustrated over the input space as:
1.4 The Pontryagin Maximum Principle
Returning to § 1.3.1, notice that the last equation describes the time evolution of the state sensitivity of ‘J’. Denoting ‘∂ₓJ’ by ‘p(x,t)’, we have:
Going a step further, since each optimal trajectory has an associated function ‘p*(t) = p(x*(t),t)’, we arrive at PMP, a necessary condition for a trajectory to be optimal.
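Collected together, with ‘H(x,u,p) = L + pᵀf’, the claim reads (the standard first-order statement; for a cost-minimization problem the optimal input minimizes ‘H’, and the traditional “maximum” form follows from a sign flip):

$$ \dot{x}^* = \partial_p H = f(x^*, u^*, t), \qquad \dot{p}^* = -\,\partial_x H\big(x^*, u^*, p^*, t\big), $$

$$ u^*(t) \in \arg\min_{u \in \mathcal{U}} H\big(x^*(t), u, p^*(t)\big), \qquad p^*(T) = \partial_x \phi\big(x^*(T)\big). $$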
1.5 Other approaches and variants
There are multiple ways to reach similar or more general results. You may refer to [6], which utilizes a Lagrange multiplier function. Furthermore, the discrete-time analogue of PMP, insightful in its own way, can be found in [7].
2. Examples and extensions
Staring at equations can be desolate and hard to enjoy. Here we provide two toy problems which are simple, yet interesting enough to reveal the beauty and challenges of optimal control problems.
2.1 A strange Snell’s law
Inspired by, but not identical to, Snell’s law in physical systems, we study the following minimum-time problem in a 2D plane:
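Under one plausible formalization (an assumption on our part, consistent with the symbols ‘c₀’, ‘n’, and the boundary ‘x₂ = 0’ used below): travel with a unit-norm heading at a position-dependent speed and minimize the arrival time,

$$ \dot{x} = c(x)\,u, \quad \|u\| = 1, \qquad c(x) = \begin{cases} c_0, & x_2 > 0,\\ c_0/n, & x_2 < 0, \end{cases} \qquad \min_{u(\cdot)}\; T \;\;\text{s.t.}\;\; x(T) = x_{\text{target}}. $$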
2.1.1 Solve with PMP
Flexing our new skills, we have a necessary condition:
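Concretely, under the formalization sketched above, the minimum-time Hamiltonian is ‘H = 1 + c(x) pᵀu’, minimized over ‘‖u‖ = 1’ by ‘u* = -p*/‖p*‖’. Since the problem is time-invariant with free terminal time, ‘H = 0’ along optimal trajectories:

$$ u^* = -\,\frac{p^*}{\|p^*\|}, \qquad \dot{p}^* = -\,\partial_x H = \|p^*\|\,\nabla c(x), \qquad 1 - c(x)\,\|p^*\| = 0 \;\Rightarrow\; \|p^*\| = \frac{1}{c(x)}. $$

Since ‘c’ is piecewise constant, ‘∇c = 0’ away from the boundary.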
Away from the transition boundary, we have a constant co-state function ‘p*’, which corresponds to a family of straight trajectories driven by ‘u*’, itself constant within each homogeneous medium.
For the abnormality across ‘x₂ = 0’, we first observe that the change of ‘p*’ across the boundary is finite, and the “impact” on ‘p*’ at the boundary can be deduced as:
Alternatively, if we follow the interpretation of ‘p*’ as the state sensitivity of ‘J’ (see § 1.4), the norm of ‘p*’ below the boundary is immediately ‘n/c₀’ without any derivation. The rest follows from ‘p₁’ being constant across the boundary.
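Combining the two observations (‘‖p*‖ = 1/c’ on each side and ‘p₁’ continuous) and writing ‘θ’ for the ray angle measured from the boundary normal, a refraction law of Snell type drops out (hedged, under the two-media setup assumed above):

$$ |p_1| = \|p^*\|\sin\theta = \frac{\sin\theta}{c(x)} = \text{const} \;\Longrightarrow\; \frac{\sin\theta_{\text{above}}}{c_0} \;=\; \frac{\sin\theta_{\text{below}}}{c_0/n}. $$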
As suggested by PMP, we can illustrate the back-calculated ‘J’ from these “candidates of optimal trajectories”. (Some reverse trajectories stop at the boundary, as the corresponding ‘θ’ below the horizon does not exist.)
2.1.2 Revisit with HJB
It is not hard to guess that, since PMP is not sufficient to claim optimality, some of the trajectories solved above are not optimal. HJB, on the other hand, does not suffer from this limitation, as ‘J’ is uniquely defined over space-time by the Principle of Optimality (though ‘u’ may not be).
Following a derivation similar to § 2.1.1, we have the HJB:
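Since the problem is time-invariant with free terminal time, the HJB becomes stationary, and minimizing over ‘‖u‖ = 1’ turns it into an eikonal equation (a sketch consistent with the propagation picture below):

$$ 0 = \min_{\|u\|=1}\Big[\,1 + c(x)\,\partial_x J\cdot u\,\Big] = 1 - c(x)\,\|\partial_x J\| \;\Longrightarrow\; \|\partial_x J\| = \frac{1}{c(x)}. $$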
Interestingly, viewed as a propagator in all directions with a weighting of ‘n/c₀’, ‘J’ can be solved by the Huygens–Fresnel principle [8] with some modification. Instead of phasors interfering everywhere, we iterate the “wave-front” unidirectionally (which is more complicated to construct from a coding point of view). The process can be animated as:
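As a hedged illustration of the same idea, the sketch below grows the wave-front of ‘J’ outward from the target with a Dijkstra-style update on a grid. It is a crude stand-in for the construction behind the animation: the grid resolution, the 8-connected stencil, and the target cell are all assumptions, and a proper fast-marching scheme would track the front more faithfully.

```python
import heapq
import numpy as np

def min_time_field(speed, target, h=1.0):
    """Approximate the minimum-time cost-to-go J on a grid by
    propagating a wave-front outward from `target` (Dijkstra as a
    crude fast-marching stand-in). speed[i, j] is c(x) per cell."""
    ny, nx = speed.shape
    J = np.full((ny, nx), np.inf)
    J[target] = 0.0
    heap = [(0.0, target)]
    # 8-connected stencil; diagonal steps cover sqrt(2)*h of distance
    steps = [(di, dj, h * float(np.hypot(di, dj)))
             for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    while heap:
        t, (i, j) = heapq.heappop(heap)
        if t > J[i, j]:
            continue  # stale entry, cell already settled at a lower time
        for di, dj, dist in steps:
            ni, nj = i + di, j + dj
            if 0 <= ni < ny and 0 <= nj < nx:
                # travel time across one step ~ distance / local speed
                nt = t + dist / speed[ni, nj]
                if nt < J[ni, nj]:
                    J[ni, nj] = nt
                    heapq.heappush(heap, (nt, (ni, nj)))
    return J

# Two media split at mid-height: speed c0 above, c0/n below (assumed setup).
c0, n = 1.0, 1.5
speed = np.full((101, 101), c0)
speed[51:, :] = c0 / n
J = min_time_field(speed, target=(25, 50), h=0.01)
```

Each heap pop settles the earliest unvisited cell, which is exactly the unidirectional wave-front iteration described above.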
Immediately, we notice some blind spots that were not captured previously; the corresponding trajectories are highlighted as red traces in the following diagram. Though we did not discover this family with PMP in the previous section, one can verify that these traces still satisfy PMP.
2.2 Minimum-time trajectories of a 1-D point mass
For our second example, we revisit a common go-to example for introducing PMP (e.g. [5]), but force ourselves to find the solution with HJB (not because it generates a clickbait thumbnail). Starting from the problem:
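In the classic form (as in [5], and assumed here), the point mass is a double integrator driven to the origin in minimum time under a bounded input:

$$ \dot{x}_1 = x_2, \qquad \dot{x}_2 = u, \qquad |u| \le 1, \qquad \min_{u(\cdot)}\; T \;\;\text{s.t.}\;\; x(T) = 0. $$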
We have the corresponding HJB as follows (notice that the problem is time-translation invariant):
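Under that formulation, ‘J’ depends on the state alone, and the stationary HJB reads (sketch):

$$ 0 = \min_{|u|\le 1}\Big[\,1 + \partial_{x_1} J\,x_2 + \partial_{x_2} J\,u\,\Big] = 1 + \partial_{x_1} J\,x_2 - \big|\partial_{x_2} J\big|, \qquad u^* = -\,\operatorname{sign}\big(\partial_{x_2} J\big). $$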
Solving such a non-linear PDE analytically can be quite a torture, but it yields to a naïve partitioning technique:
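For reference, the well-known closed-form result that such a partition leads to (quoted, not re-derived here; the two regions are separated by the switching curve ‘x₁ = -½ x₂|x₂|’):

$$ J(x_1,x_2) = \begin{cases} x_2 + 2\sqrt{x_1 + x_2^2/2}, & x_1 > -\tfrac{1}{2}\,x_2\,|x_2|,\\[4pt] -x_2 + 2\sqrt{-x_1 + x_2^2/2}, & x_1 < -\tfrac{1}{2}\,x_2\,|x_2|. \end{cases} $$

Note that the gradient of ‘J’ is discontinuous across the switching curve, which is exactly the non-differentiability emphasized below.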
Finally, by rejecting non-optimal solutions, the cost-to-go function can be visualized as below. We emphasize again that, in general, ‘J’ need not be differentiable everywhere and is often non-trivial to solve, at least analytically.
As in the example of § 2.1, we can also obtain the cost-to-go function ‘J’ by simulating its “propagation”, as in the animation below. Here we did not actually “solve” it this way, but simply emulated the process from the analytical results.
References
[1] Stochastic Optimal Control via Path Integrals | Medium
[2] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[3] B. Houchmandzadeh, “The Hamilton-Jacobi equation: an intuitive approach,” American Journal of Physics, vol. 88, no. 5, pp. 353–359, Apr. 2020.
[4] D. S. Naidu, Optimal Control Systems. CRC Press, 2018.
[5] S. H. Żak, Systems and Control. Oxford University Press, 2003.
[6] G. Leitmann, Ed., Optimization Techniques with Applications to Aerospace Systems. Academic Press, 1962.
[7] D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2000.
[8] E. Hecht, Optics. Pearson, 2014.