# Tikhonov Regularization and Residual Estimation

In this post, I explore the effects of Tikhonov regularization on out-of-sample residual estimation using polynomial regression, as part of a continued effort to validate a machine learning algorithm for modal analysis. The algorithm estimates spatially distributed modes in a signal, and uses this information to estimate the residual, or forcing function, of a physical model. Today's experiments explore the frequency-domain effects of the regularization parameter $\alpha$ on residual estimates.

As noted previously, Tikhonov regularization is a popular technique for reducing the likelihood that a model is overfit. However, as was also noted, this particular form of regularization is somewhat problematic for the estimation of the model parameters themselves, effectively buying generalization at the cost of accuracy. In the Z-plane view, we found that the regularization occasionally changes the angles of the estimated eigenfrequencies $\Lambda$, in addition to its stated purpose of decreasing their lengths. Since the algorithm's performance is to be judged on both eigendecomposition and residual estimation, a thorough assessment of both tasks is required.
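To make the shrinkage effect concrete, here is a minimal sketch of Tikhonov-regularized (ridge) least squares applied to a generic all-pole (AR) model. The function and its AR formulation are illustrative, not the algorithm under test; the point is that as $\alpha$ grows, the estimated poles are pulled toward the origin, shortening $\|\hat{\lambda}\|$ and, past a point, disturbing their angles as well.

```python
import numpy as np

def fit_ar_tikhonov(x, p, alpha):
    """Fit an AR(p) model by Tikhonov-regularized least squares.

    Minimizes ||X a - y||^2 + alpha * ||a||^2 (the standard ridge form)
    and returns the coefficient vector a along with the estimated poles,
    i.e. the roots of the characteristic polynomial. Larger alpha shrinks
    the coefficients, which pulls the pole magnitudes toward zero.
    """
    # Lagged design matrix: row for time t holds x[t-1], ..., x[t-p]
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    y = x[p:]
    # Closed-form ridge solution: (X^T X + alpha I)^{-1} X^T y
    a = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
    # Poles are roots of z^p - a_1 z^{p-1} - ... - a_p
    poles = np.roots(np.concatenate(([1.0], -a)))
    return a, poles
```

With $\alpha = 0$ this reduces to ordinary least squares; sweeping $\alpha$ upward reproduces the qualitative behavior seen in the Z-plane plots.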

# Methods

The experiment, whose source code is available here, tested a total of 72 different datasets, originating from 6 different models. These datasets were constructed using the same constraints as in previous experiments, and consisted of one channel of data with 4 eigenfrequencies. Each of the 6 models exhibited eigenfrequencies of incrementally increasing lengths $\| \lambda \|$, corresponding to less damping. The entire set of figures can be downloaded here.

The eigenfrequencies and their estimates, $\lambda$ and $\hat{\lambda}$, respectively, are plotted on the Z-plane, with their angles emphasized to aid visual inspection. Each of the 6 models has been compiled into an animation displaying the plots of the residuals and eigenfrequencies. The animation shows each test in the model with $\alpha$ increasing: for a given frame $n$, $\alpha_n = 10^{n/2} - 1$. The target eigenfrequencies are plotted in red, and the estimated eigenfrequencies in blue.
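For reference, the schedule can be reproduced directly. Frame indexing is assumed here to start at $n = 0$, so the first frame is unregularized:

```python
import numpy as np

# Regularization schedule across animation frames: alpha_n = 10^(n/2) - 1,
# i.e. alpha starts at 0 and grows by roughly half a decade per frame.
frames = np.arange(8)
alphas = 10.0 ** (frames / 2) - 1

for n, a in zip(frames, alphas):
    print(f"n={n}: alpha={a:.4f}")
```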

To test out-of-sample performance, each test implements the same all-pole model with a unique target residual vector. The normalized power spectrum of the target residual vector is plotted in red against the normalized power spectrum of its estimate, in blue. The residual estimation error was defined as $e_r = \|1 - \operatorname{cor}(\hat{r}, r)\|^2$. Finally, the residual estimation error $e_r$ was plotted as a function of damping, $1 - \|\lambda\|$.
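The error metric takes only a few lines. The source does not pin down which correlation $\operatorname{cor}$ denotes; the sketch below assumes the Pearson correlation coefficient:

```python
import numpy as np

def residual_error(r_hat, r):
    """Residual estimation error e_r = |1 - cor(r_hat, r)|^2.

    `cor` is taken here to be the Pearson correlation coefficient, so
    e_r = 0 when the estimate is perfectly correlated with the target
    and grows as the correlation degrades.
    """
    c = np.corrcoef(r_hat, r)[0, 1]
    return np.abs(1.0 - c) ** 2
```

Under this assumption, $e_r = 0$ for a perfectly correlated estimate and $e_r = 4$ for a perfectly anti-correlated one.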

# Trial 1:

$\|\lambda\| = 0.504 \pm 0.005$

Above: target (red) and estimated (blue) model parameters, with $\alpha$ increasing as a function of time.

Above: $e_r$ as a function of $1 - \| \lambda \|$. Each datapoint corresponds to one frame of the trial's animation.

# Trial 2:

$\|\lambda\| = 0.604 \pm 0.005$

Above: target (red) and estimated (blue) model parameters, with $\alpha$ increasing as a function of time.

Above: $e_r$ as a function of $1 - \| \lambda \|$. Each datapoint corresponds to one frame of the trial's animation.

# Trial 3:

$\|\lambda\| = 0.704 \pm 0.005$

Above: target (red) and estimated (blue) model parameters, with $\alpha$ increasing as a function of time.

Above: $e_r$ as a function of $1 - \| \lambda \|$. Each datapoint corresponds to one frame of the trial's animation.

# Trial 4:

$\|\lambda\| = 0.804 \pm 0.005$

Above: target (red) and estimated (blue) model parameters, with $\alpha$ increasing as a function of time.

Above: $e_r$ as a function of $1 - \| \lambda \|$. Each datapoint corresponds to one frame of the trial's animation.

# Trial 5:

$\|\lambda\| = 0.904 \pm 0.005$

Above: target (red) and estimated (blue) model parameters, with $\alpha$ increasing as a function of time.

Above: $e_r$ as a function of $1 - \| \lambda \|$. Each datapoint corresponds to one frame of the trial's animation.

# Trial 6:

$\|\lambda\| = 0.9995 \pm 0.0005$

Above: target (red) and estimated (blue) model parameters, with $\alpha$ increasing as a function of time.

Above: $e_r$ as a function of $1 - \| \lambda \|$. Each datapoint corresponds to one frame of the trial's animation.

# Conclusion:

The results show varied success with Tikhonov regularization as a strategy for improving generalization. However, the data points in a number of interesting directions. First, most of the error curves have a slight dip at $n = 1$, which corresponds to $\alpha = 10^{1/2} - 1 \approx 2.1623$. Perhaps the best values for $\alpha$ will be close to this point. In a final application, the value of $\alpha$, along with any other regularization parameters, would be selected during a validation step using data held out from the training set, in order to avoid overfitting. Second, although Tikhonov regularization is well-loved by statisticians for its simplicity and speed, it is by no means the only regularization strategy. For example, the author has begun designing and validating a regularization method of his own, which applies some of the lessons from previous experiments. Future work will certainly explore Tikhonov regularization in conjunction with these other methods.
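The validation step described above can be sketched generically. Everything here is hypothetical scaffolding rather than the experiment's code: `fit` trains a model at a given $\alpha$, `score` evaluates it on held-out data, and the winner is the $\alpha$ with the lowest validation error.

```python
import numpy as np

def select_alpha(fit, score, train, valid, alphas):
    """Pick the regularization strength that scores best on held-out data.

    `fit(train, alpha)` returns a fitted model; `score(model, valid)`
    returns an error to minimize. Because the validation set is disjoint
    from the training set, alpha is not tuned toward overfit solutions.
    """
    errors = [score(fit(train, a), valid) for a in alphas]
    return alphas[int(np.argmin(errors))]
```

In practice the candidate list would be a sweep like the $\alpha_n = 10^{n/2} - 1$ schedule used in these experiments.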