Study notes: Regressing laws of physics from data using Ockham’s Razor
Data-driven regression of Lorenz dynamics with parsimony
Recap
In the previous note, we imitated the Lorenz dynamics using a 4-layer neural network. Although it works well in most cases, it remains unreliable on states far from the training set. This is a common issue in almost every kind of regression technique and is essentially a form of overfitting. The figure below shows a clear example of how a higher-dimensional model fits the training data well (the first five points) but does worse in other regions. This prompts us to seek a regression method, such as LASSO [1], that takes parsimony into consideration.
Problem formulation
We seek to identify the Lorenz dynamics from the collected simulation data. Unlike the previous post, which deployed a neural network with nonlinear activations as the basis, this time a grey-box-like library of candidate models will be used instead.
Linear combination of non-linear models
Before continuing our journey, one should notice that the Lorenz dynamics is a linear combination of a few coupled terms among the three states, and yet it exhibits complex behavior; this pattern is surprisingly common in the world of physics.
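For reference, the Lorenz system with the usual parameters (σ = 10, ρ = 28, β = 8/3) reads

$$
\begin{aligned}
\dot{x} &= \sigma (y - x),\\
\dot{y} &= x(\rho - z) - y,\\
\dot{z} &= xy - \beta z,
\end{aligned}
$$

where each right-hand side is a linear combination of at most two simple terms drawn from {x, y, z, xz, xy}.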
Following this clue, it makes sense to build a library of nonlinear basis functions in which the dynamics can be represented linearly and sparsely. For instance, a reasonable basis for the Lorenz dynamics might be [1, z, x², x·z, x·y, sin(y), …]. This idea can be seen as an extension of the Vandermonde matrix in polynomial regression.
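A minimal sketch of such a library in NumPy; the function name and the exact columns are illustrative choices, not a fixed recipe:

```python
import numpy as np

def theta(X):
    """Evaluate the candidate basis on a trajectory X of shape (n_samples, 3)."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    # One column per candidate term; any plausible basis can be appended.
    return np.column_stack([np.ones_like(x), z, x**2, x * z, x * y, np.sin(y)])
```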
Sparse Identification of Nonlinear Dynamics (SINDy)[2]
We are now left with a regression problem: find the best linear combination of the nonlinear library terms that fits the experiment, keeping in mind that a sparse combination is always preferred. Such a problem can be formulated as the optimization below, which uses a one-norm penalty on the parameter vector (ξ) to promote a sparse outcome.
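Following the formulation in [2], with Θ(X) denoting the library evaluated on the data and Ẋ the matrix of measured derivatives, each column ξₖ of the coefficient matrix solves

$$
\xi_k = \underset{\xi'_k}{\arg\min} \; \bigl\lVert \dot{X}_k - \Theta(X)\,\xi'_k \bigr\rVert_2 + \lambda \bigl\lVert \xi'_k \bigr\rVert_1 .
$$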
Here we use Sequentially Thresholded Least Squares (STLS) [3] to solve the preceding problem, with the "sparsification knob" (λ) decided manually. STLS is basically a standard least-squares fit followed by dropping every coefficient whose magnitude falls below λ, and the whole process is repeated several times.
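A minimal NumPy sketch of STLS, assuming the library matrix and derivative data from above; the iteration count is a typical choice rather than a tuned value:

```python
import numpy as np

def stls(Theta, dXdt, lam, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the surviving terms, and repeat."""
    Xi, *_ = np.linalg.lstsq(Theta, dXdt, rcond=None)  # plain least squares
    for _ in range(n_iter):
        small = np.abs(Xi) < lam                       # terms below the knob
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):                 # refit each state equation
            big = ~small[:, k]                         # surviving library terms
            Xi[big, k], *_ = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)
    return Xi
```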
To show some degree of generality, the library is built from all polynomial combinations of the state up to order 3 (1, x, y, z, xx, xy, xz, …, zzy, zzz). The trajectories used to "train" the regression are shown in the figure below for reference; notice that we deliberately chose an initial point near the origin.
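The data-generation step might look like the following sketch; the initial condition, time horizon, and solver tolerances are assumptions for illustration, not the exact values behind the figure:

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

def polynomial_library(X, order=3):
    """All monomials of the state up to the given order: 1, x, y, z, xx, ..., zzz."""
    columns = [np.ones(X.shape[0])]
    for d in range(1, order + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

# Training trajectory from an initial point near the origin (values assumed).
t = np.arange(0.0, 50.0, 0.01)
sol = solve_ivp(lorenz, (t[0], t[-1]), [1e-3, 1e-3, 1e-3],
                t_eval=t, rtol=1e-9, atol=1e-9)
X = sol.y.T                                   # states, shape (n_samples, 3)
# Derivatives taken from the known right-hand side for simplicity; in practice
# they would be estimated from the data, e.g. by finite differences.
dXdt = np.array([lorenz(0.0, s) for s in X])
Theta = polynomial_library(X)                 # 20 candidate columns for order 3
```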
Result without noise
In the first case, where no noise is added, we easily reach appealing results that effectively recover the hidden model behind the data, as shown in the following chart. Since essentially zero noise is present in this ideal case, the tuning of λ has no bearing on the result.
Adding some pepper to our sensor
In the second case, with some noise added (visualized in the figure below), the threshold value becomes crucial. Readers may find this process similar to selecting an optimal threshold when truncating the singular values of a noisy data matrix [4]. After sweeping the sparsification knob (λ) from 0.001 to 0.025, the results are listed below.
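Reusing the helpers sketched above, such a sweep might look as follows; the noise level and the intermediate λ values are illustrative assumptions:

```python
# Corrupt the derivative measurements, then sweep the knob across the range.
rng = np.random.default_rng(0)
dXdt_noisy = dXdt + 0.1 * rng.standard_normal(dXdt.shape)

for lam in (0.001, 0.005, 0.01, 0.025):
    Xi = stls(Theta, dXdt_noisy, lam)
    print(f"lambda = {lam:>6}: {np.count_nonzero(Xi)} nonzero coefficients")
```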
Conclusion
Although it might sound like cheating to build a library that coincides with the actual model behind the scenes, this approach turns out to be elegant and general for a large class of problems, as long as the library contains a sufficiently rich set of basis functions. For example, in [5] a surprisingly simple model is extracted from high-dimensional data by SINDy.
Another significant advantage of sparse identification over neural-network approaches is its potential robustness on data outside the training set, which is evident in this particular case.
References
[1] R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[2] S. L. Brunton and J. N. Kutz, Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2019.
[3] H. Schaeffer, G. Tran, and R. Ward, "Extracting sparse high-dimensional dynamics from limited data," SIAM Journal on Applied Mathematics, vol. 78, no. 6, pp. 3279–3295, 2018.
[4] M. Gavish and D. L. Donoho, "The Optimal Hard Threshold for Singular Values is $4/\sqrt{3}$," IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 5040–5053, Aug. 2014, doi: 10.1109/TIT.2014.2323359.
[5] S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences, vol. 113, no. 15, pp. 3932–3937, 2016.