# Tikhonov Regularization and Residual Estimation, part II

The previous two posts demonstrated the effects of Tikhonov Regularization on a polynomial regression algorithm. This post continues to explore the effect of Tikhonov Regularization on out-of-sample residual and eigenvalue estimation. To provide a better analysis of the out-of-sample error, animated scatterplots were created of the eigenvalue estimates over a number of iterations of the learning algorithm, as the Tikhonov Regularization parameter $\alpha$ swept from $0$ to $10^{-5}$ in steps of $10^{-6}$. This range for $\alpha$ was chosen after numerous trials suggested that it might contain an optimal value. In addition to trends in eigenvalue estimation, this experiment also plotted trends in the residual estimation error.

Tikhonov Regularization is one form of regularization that is commonly used in econometrics and other fields. Regularization in general is a way to reduce the chances of overfitting a model by decreasing its effective VC dimension. The primary benefit of avoiding overfitting is lower out-of-sample error for the learning algorithm. This gain often comes at the cost of decreased in-sample performance.
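To make the trade-off concrete, here is a minimal sketch of Tikhonov-regularized least squares in its common textbook form, $w = (X^T X + \alpha I)^{-1} X^T y$. It is not taken from this experiment's source code: the function name `tikhonov_solve`, the random data, and the deliberately large $\alpha = 10$ are all assumptions chosen for illustration.

```python
import numpy as np

def tikhonov_solve(X, y, alpha):
    """Minimize ||X w - y||^2 + alpha * ||w||^2 via the normal equations."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)   # regularized Gram matrix
    return np.linalg.solve(gram, X.T @ y)

# Illustrative data only (not the experiment's dataset).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
w_true = np.array([1.0, -0.5, 0.25, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)

w_ols = tikhonov_solve(X, y, 0.0)    # alpha = 0 is ordinary least squares
w_reg = tikhonov_solve(X, y, 10.0)   # a large alpha shrinks the weights

print("unregularized norm:", np.linalg.norm(w_ols))
print("regularized norm:  ", np.linalg.norm(w_reg))
```

With $\alpha = 0$ the solution coincides with ordinary least squares; as $\alpha$ grows, the coefficient norm shrinks, which is the mechanism that trades in-sample fit for a simpler model.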

For a more rigorous explanation of this particular form of regularization, see this post, and for an exploration of the power-spectrum-domain effects of this regularization on residual estimation, see this post. For more background on the polynomial regression algorithm being tested, see this post.

This experiment attempted to fix shortcomings of the previous discussion of Tikhonov Regularization in two ways. First, it increased the sample size to 10 cases for each model in an attempt to ameliorate some of the effects of variance between test cases. Second, it adjusted the range of the regularization parameter $\alpha$ to search for better values.

# Methods

The experiment, whose source code is entirely available here, generated 6 models, each consisting of a 4-pole filter and white noise. Poles were chosen in complex conjugate pairs, and the minimum pole radius was swept from 0.499 to 0.999 in increments of 0.1, one value per trial. Within each trial, $\alpha$ swept from $0$ to $10^{-5}$ for a total of 11 tests, each consisting of 10 cases. For each of the 10 cases in each test, a unique noise vector was generated as the residual to be reconstructed from the dataset. The dataset vectors were each 10000 samples long.
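The setup above can be sketched roughly as follows. This is not the experiment's actual source. The helper names (`conjugate_poles`, `all_pole_filter`), the uniform sampling of radii and angles, and the assumption that the white noise drives the all-pole filter to produce the dataset are all hypothetical choices made for illustration.

```python
import numpy as np

def conjugate_poles(min_radius, rng, n_pairs=2, band=0.001):
    """Draw n_pairs poles in the upper half plane and add their conjugates.

    Assumption: radii are uniform in [min_radius, min_radius + band) and
    angles are uniform in [0, pi); the post does not state its sampling rule.
    """
    radii = min_radius + band * rng.random(n_pairs)
    angles = np.pi * rng.random(n_pairs)
    upper = radii * np.exp(1j * angles)
    return np.concatenate([upper, np.conj(upper)])

def all_pole_filter(noise, poles):
    """Filter `noise` through the all-pole system with the given poles."""
    a = np.real(np.poly(poles))            # monic denominator, a[0] == 1
    order = len(a) - 1
    out = np.zeros(len(noise))
    for n in range(len(noise)):
        past = out[max(0, n - order):n][::-1]   # y[n-1], y[n-2], ...
        out[n] = noise[n] - np.dot(a[1:1 + len(past)], past)
    return out

rng = np.random.default_rng(0)
min_radii = 0.499 + 0.1 * np.arange(6)     # 0.499, 0.599, ..., 0.999
alphas = np.linspace(0.0, 1e-5, 11)        # 0 to 1e-5 in steps of 1e-6
n_cases, n_samples = 10, 10000

# One case of the first trial: a fresh noise vector is the residual.
poles = conjugate_poles(min_radii[0], rng)
residual = rng.standard_normal(n_samples)
data = all_pole_filter(residual, poles)
```

A full run would repeat the last three lines for each of the 10 cases, each of the 11 values in `alphas`, and each of the 6 entries of `min_radii`.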

The results, archived here (127.8 MB .zip), are shown below for each trial as animated polar scatterplots. Each plot shows the eigenvalue estimates for all 10 cases (blue dots) against the actual eigenvalues (red circles with length vectors), and each frame of an animation corresponds to one test, that is, to one value of $\alpha$. In addition to these scatterplots, the residual estimation error, once again defined as $e_r = \|1 - \operatorname{cor}( \hat{r} , r ) \|^2$, was plotted for each test.
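For reference, the error measure $e_r$ can be computed in a few lines. This sketch assumes that $\operatorname{cor}$ denotes the Pearson correlation coefficient between the estimated and true residual vectors; the function name `residual_error` is hypothetical.

```python
import numpy as np

def residual_error(r_hat, r):
    """e_r = |1 - cor(r_hat, r)|^2, with cor taken as Pearson correlation."""
    cor = np.corrcoef(r_hat, r)[0, 1]
    return abs(1.0 - cor) ** 2

rng = np.random.default_rng(0)
r = rng.standard_normal(10000)                     # stand-in "true" residual
noisy = r + 0.5 * rng.standard_normal(10000)       # imperfect estimate

print(residual_error(r, r))        # perfect estimate, error near 0
print(residual_error(3.0 * r, r))  # correlation ignores positive scaling
print(residual_error(-r, r))       # sign-flipped estimate, error near 4
print(residual_error(noisy, r))    # somewhere in between
```

Under this definition the measure is insensitive to the scale of the estimate and penalizes only loss of correlation, so it ranges from 0 (perfectly correlated) to 4 (perfectly anti-correlated).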

# Trial 1:

Order: 4
Mics: 1
Winsize: 10000
$\Lambda$ :
(0.491486051079141,0.08917400853549813) (-0.4011848223574587,0.2983109966059466) (0.491486051079141,-0.08917400853549813) (-0.4011848223574587,-0.2983109966059466)

Above: eigenvalue estimates for 10 iterations of trial 1. Each of the 11 frames uses an increasing value of $\alpha$. Estimates are in blue and targets are in red.

Above: residual estimation error $e_r$ as a function of $\alpha$.

# Trial 2:

Order: 4
Mics: 1
Winsize: 10000
$\Lambda$ :
(-0.2975730609061847,0.5205948842114549) (0.1867640670164439,0.5693726759836714) (-0.2975730609061847,-0.5205948842114549) (0.1867640670164439,-0.5693726759836714)

Above: eigenvalue estimates for 10 iterations of trial 2. Each of the 11 frames uses an increasing value of $\alpha$. Estimates are in blue and targets are in red.

Above: residual estimation error $e_r$ as a function of $\alpha$.

# Trial 3:

Order: 4
Mics: 1
Winsize: 10000
$\Lambda$ :
(-0.5565329133881385,0.4242959633972246) (-0.5801774399366052,0.3916034216348853) (-0.5565329133881385,-0.4242959633972246) (-0.5801774399366052,-0.3916034216348853)

Above: eigenvalue estimates for 10 iterations of trial 3. Each of the 11 frames uses an increasing value of $\alpha$. Estimates are in blue and targets are in red.

Above: residual estimation error $e_r$ as a function of $\alpha$.

# Trial 4:

Order: 4
Mics: 1
Winsize: 10000
$\Lambda$ :
(-0.1527028913384089,0.7848095067868207) (0.6772093851636565,0.4244771948675268) (-0.1527028913384089,-0.7848095067868207) (0.6772093851636565,-0.4244771948675268)

Above: eigenvalue estimates for 10 iterations of trial 4. Each of the 11 frames uses an increasing value of $\alpha$. Estimates are in blue and targets are in red.

Above: residual estimation error $e_r$ as a function of $\alpha$.

# Trial 5:

Order: 4
Mics: 1
Winsize: 10000
$\Lambda$ :
(0.3872974842150309,0.8122763294810199) (-0.8456804527113718,0.3066288313181266) (0.3872974842150309,-0.8122763294810199) (-0.8456804527113718,-0.3066288313181266)

Above: eigenvalue estimates for 10 iterations of trial 5. Each of the 11 frames uses an increasing value of $\alpha$. Estimates are in blue and targets are in red.

Above: residual estimation error $e_r$ as a function of $\alpha$.

# Trial 6:

Order: 4
Mics: 1
Winsize: 10000
$\Lambda$ :
(-0.5739124096864732,0.8184095412155564) (0.1659348948186841,0.9853479917780656) (-0.5739124096864732,-0.8184095412155564) (0.1659348948186841,-0.9853479917780656)

Above: eigenvalue estimates for 10 iterations of trial 6. Each of the 11 frames uses an increasing value of $\alpha$. Estimates are in blue and targets are in red.

Above: residual estimation error $e_r$ as a function of $\alpha$.
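As a cross-check on the $\Lambda$ listings for the six trials, the short script below computes the radius of each listed eigenvalue and compares it with the minimum radius for that trial. Reading "minimum radius" as a lower bound on the pole magnitude is an assumption. Only the upper-half-plane eigenvalues are included, since each conjugate has the same radius.

```python
import numpy as np

# Upper-half-plane eigenvalues copied from the trial listings above.
upper_poles = np.array([
    [0.491486051079141 + 0.08917400853549813j,
     -0.4011848223574587 + 0.2983109966059466j],   # trial 1
    [-0.2975730609061847 + 0.5205948842114549j,
     0.1867640670164439 + 0.5693726759836714j],    # trial 2
    [-0.5565329133881385 + 0.4242959633972246j,
     -0.5801774399366052 + 0.3916034216348853j],   # trial 3
    [-0.1527028913384089 + 0.7848095067868207j,
     0.6772093851636565 + 0.4244771948675268j],    # trial 4
    [0.3872974842150309 + 0.8122763294810199j,
     -0.8456804527113718 + 0.3066288313181266j],   # trial 5
    [-0.5739124096864732 + 0.8184095412155564j,
     0.1659348948186841 + 0.9853479917780656j],    # trial 6
])

min_radii = 0.499 + 0.1 * np.arange(6)      # 0.499, 0.599, ..., 0.999
radii = np.abs(upper_poles)                 # pole magnitudes per trial
excess = radii - min_radii[:, None]         # distance above the minimum

for trial, (row, gap) in enumerate(zip(radii, excess), start=1):
    print(f"trial {trial}: radii = {np.round(row, 5)}, "
          f"excess = {np.round(gap, 5)}")
```

With the values as listed, every radius lies less than 0.001 above its trial's minimum, and all poles lie inside the unit circle, so each model filter is stable.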

# Conclusion:

These results, in particular the mean squared residual error data, show fairly clearly that Tikhonov Regularization is not an appropriate technique for the present application. The eigenvalue estimates barely change as $\alpha$ increases over this range, and the residual estimation generally worsens. However, there are many more strategies available for avoiding overfitting. Future work will explore other types of regularization that may be better suited to the present constraints.