Study notes: Regressing laws of physics from data using Ockham’s Razor
Data-driven regression of Lorenz dynamics with parsimony
Recap
In the previous note, we imitated the Lorenz dynamics using a 4-layer neural network. Although it works well in most cases, it remains unreliable on states far from the training set. This is a common issue in almost every kind of regression technique and is essentially a form of overfitting. The figure below shows a clear example of how a higher-dimensional model fits the training data well (the first five points) but does worse in other regions. This prompts us to seek a regression method, such as LASSO [1], that takes parsimony into consideration.
Problem formulation
We seek to identify the Lorenz dynamics from the collected simulation data. Unlike the previous post, which deployed a neural network with nonlinear activations as the basis, this time a grey-box-like library of candidate models will be used instead.
Linear combination of non-linear models
Before continuing our journey, one should notice that the Lorenz dynamics is a linear combination of a few coupled terms among the three states, and yet it exhibits complex behavior; this pattern is surprisingly common in the world of physics.
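For reference, the Lorenz system with the usual parameters (σ = 10, ρ = 28, β = 8/3) reads

$$
\begin{aligned}
\dot{x} &= \sigma (y - x),\\
\dot{y} &= x(\rho - z) - y,\\
\dot{z} &= xy - \beta z,
\end{aligned}
$$

where each right-hand side is a linear combination of at most two simple terms drawn from {x, y, z, xz, xy}.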
Following this clue, it makes sense to build a library of nonlinear basis functions in which the dynamics can be represented linearly and sparsely. For instance, a reasonable basis for the Lorenz dynamics might be [1, z, x², x·z, x·y, sin(y), …]. This idea can be seen as an extension of the Vandermonde matrix in polynomial regression.
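A minimal sketch of such a library in NumPy; the function name and the exact columns are illustrative choices, not a fixed recipe:

```python
import numpy as np

def theta(X):
    """Evaluate the candidate basis on a trajectory X of shape (n_samples, 3)."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    # One column per candidate term; any plausible basis can be appended.
    return np.column_stack([np.ones_like(x), z, x**2, x * z, x * y, np.sin(y)])
```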
Sparse Identification of Nonlinear Dynamics (SINDy)[2]
We are now left with a regression problem: find the best linear combination of the nonlinear library terms that fits the experiment, keeping in mind that a sparse combination is always preferred. Such a problem can be formulated as the optimization below, which uses a one-norm penalty on the parameter vector (ξ) to promote a sparse outcome.
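Following the formulation in [2], with Θ(X) denoting the library evaluated on the data and Ẋ the matrix of measured derivatives, each column ξₖ of the coefficient matrix solves

$$
\xi_k = \underset{\xi'_k}{\arg\min} \; \bigl\lVert \dot{X}_k - \Theta(X)\,\xi'_k \bigr\rVert_2 + \lambda \bigl\lVert \xi'_k \bigr\rVert_1 .
$$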
Here we use Sequentially Thresholded Least Squares (STLS) [3] to solve the preceding problem, with the "sparsification knob" (λ) decided manually. STLS is basically a standard least-squares fit followed by dropping every coefficient whose magnitude falls below λ, and the whole process is repeated several times.
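A minimal NumPy sketch of STLS, assuming the library matrix and derivative data from above; the iteration count is a typical choice rather than a tuned value:

```python
import numpy as np

def stls(Theta, dXdt, lam, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the surviving terms, and repeat."""
    Xi, *_ = np.linalg.lstsq(Theta, dXdt, rcond=None)  # plain least squares
    for _ in range(n_iter):
        small = np.abs(Xi) < lam                       # terms below the knob
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):                 # refit each state equation
            big = ~small[:, k]                         # surviving library terms
            Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)
    return Xi
```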
To show some degree of generality, the library is built from all polynomial combinations of the state up to order 3 (1, x, y, z, xx, xy, xz, …, zzy, zzz). The trajectories used to "train" the regression are shown in the figure below for reference; notice that we deliberately chose an initial point near the origin.
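The data-generation step might look like the following sketch; the initial condition, time horizon, and solver tolerances are assumptions for illustration, not the exact values behind the figure:

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def polynomial_library(X, order=3):
    """All monomials of the state up to the given order: 1, x, y, z, xx, ..., zzz."""
    columns = [np.ones(X.shape[0])]
    for d in range(1, order + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

# Training trajectory from an initial point near the origin (values assumed).
t = np.arange(0.0, 50.0, 0.01)
sol = solve_ivp(lorenz, (t[0], t[-1]), [1e-3, 1e-3, 1e-3],
                t_eval=t, rtol=1e-9, atol=1e-9)
X = sol.y.T                                   # states, shape (n_samples, 3)
# Derivatives taken from the known right-hand side for simplicity; in practice
# they would be estimated from the data, e.g. by finite differences.
dXdt = np.array([lorenz(0.0, s) for s in X])
Theta = polynomial_library(X)                 # 20 candidate columns for order 3
```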
Result without noise
In the first case, where no noise is added, we easily reach appealing results that effectively recover the hidden model behind the data, as shown in the following chart. Since essentially zero noise is present in this ideal case, the tuning of λ has no bearing on the result.
Adding some pepper to our sensor
In the second case, with some noise added (visualized in the figure below), the threshold value becomes crucial. Readers may find this process similar to selecting an optimal threshold when truncating the singular values of a noisy data matrix [4]. After sweeping the sparsification knob (λ) from 0.001 to 0.025, the results are listed below.
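Reusing the helpers sketched above, such a sweep might look as follows; the noise level and the intermediate λ values are illustrative assumptions:

```python
# Corrupt the derivative measurements, then sweep the knob across the range.
rng = np.random.default_rng(0)
dXdt_noisy = dXdt + 0.1 * rng.standard_normal(dXdt.shape)

for lam in (0.001, 0.005, 0.01, 0.025):
    Xi = stls(Theta, dXdt_noisy, lam)
    print(f"lambda = {lam:>6}: {np.count_nonzero(Xi)} nonzero coefficients")
```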
Conclusion
Although it might sound like cheating to build a library that coincides with the actual model behind the scenes, this approach turns out to be elegant and general for a large class of problems, as long as the library contains a sufficiently rich set of basis functions. For example, in [5] a surprisingly simple model is extracted from high-dimensional data by SINDy.
Another significant advantage of sparse identification over neural-network approaches is its potential robustness on data outside the training set, which is evident in this particular case.
References
[1] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[2] S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2019.
[3] H. Schaeffer, G. Tran, and R. Ward, "Extracting sparse high-dimensional dynamics from limited data," SIAM Journal on Applied Mathematics, vol. 78, no. 6, pp. 3279–3295, 2018.
[4] M. Gavish and D. L. Donoho, "The Optimal Hard Threshold for Singular Values is $4/\sqrt{3}$," IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 5040–5053, Aug. 2014, doi: 10.1109/TIT.2014.2323359.
[5] S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932–3937, 2016.