Penalized smoothing
In functional data analysis, how should we choose the number of basis functions? And can too many basis functions lead to overfitting?
Intuitively, a basis expansion with more basis functions follows the observed values more closely than one with fewer, but such a flexible fit can start to track the noise rather than the underlying trend.
For this reason, strategies have been developed to control the flexibility of the fit carefully, and penalized smoothing is one of them.
Fitting a basis expansion amounts to finding the coefficient values $c$, which is usually done by least squares:
$$ S(\mathbf{c}_n) = \sum_j \Big(Y_{jn} - \sum_k c_{nk} B_k (t_{nj})\Big)^2 $$
Here $n$ indexes the curve and $j$ indexes the time point, so $S(\mathbf{c}_n)$ sums the squared differences between the observed data $Y_{jn}$ and the basis expansion at each time point.
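As a concrete illustration, here is a minimal sketch of the unpenalized fit (my own code, not from this post): build a B-spline design matrix and solve an ordinary least-squares problem for the coefficients. The simulated sine curve, the cubic B-spline basis, and $K = 15$ are assumptions made only for the example.

```python
import numpy as np
from scipy.interpolate import BSpline

t = np.linspace(0.0, 1.0, 101)                              # observation times t_{nj} for one curve
Y = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)   # simulated noisy observations Y_{jn}

K, degree = 15, 3                                            # number of basis functions, cubic splines
interior = np.linspace(0.0, 1.0, K - degree + 1)
knots = np.r_[[0.0] * degree, interior, [1.0] * degree]      # clamped knot vector -> exactly K B-splines

# design matrix with entries B[j, k] = B_k(t_j), obtained by evaluating each basis function
B = BSpline(knots, np.eye(K), degree)(t)

c_hat, *_ = np.linalg.lstsq(B, Y, rcond=None)                # least-squares coefficients c_n
Y_hat = B @ c_hat                                            # unpenalized fitted values
```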
A penalized method adds a penalty term to the above criterion:
$$S_\lambda (\mathbf{c}_n) = \sum_j \Big(Y_{jn} - \sum_k c_{nk} B_k(t_{nj})\Big)^2 + \lambda \int _0^1 \big[(L\tilde{X}_n)(t)\big]^2 \ dt \ \ \ \cdots \ \ \ (1)$$
where $\tilde{X}_n(t) = \sum_k c_{nk} B_k(t)$ is the basis expansion of the $n$-th curve.
This is quite similar to ridge regression: as $\lambda \rightarrow \infty$ the penalty dominates and the fit is pushed toward functions satisfying $L\tilde{X}_n = 0$ (a straight line in the case of the usual second-derivative penalty), while as $\lambda \rightarrow 0$ we recover the ordinary least-squares fit.
What is $L$?
$L$ denotes a specified linear differential operator, i.e., a linear combination of derivatives of the function. A popular choice of $L(x)(t)$ for periodic data is the following:
$$L(x)(t) = \frac{4\pi^2}{T^2} x^{(1)}(t) + x^{(3)}(t)$$
$x^{(1)}$ and $x^{(3)}$ denote the first and third derivatives, and this operator is called the harmonic acceleration operator.
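To see why it deserves that name, a quick worked check (not in the original post) shows that $L$ annihilates a sinusoid with period $T$:
$$x(t) = \sin\!\Big(\tfrac{2\pi t}{T}\Big) \;\Longrightarrow\; x^{(1)}(t) = \tfrac{2\pi}{T}\cos\!\Big(\tfrac{2\pi t}{T}\Big), \qquad x^{(3)}(t) = -\tfrac{8\pi^3}{T^3}\cos\!\Big(\tfrac{2\pi t}{T}\Big),$$
$$L(x)(t) = \tfrac{4\pi^2}{T^2}\cdot \tfrac{2\pi}{T}\cos\!\Big(\tfrac{2\pi t}{T}\Big) - \tfrac{8\pi^3}{T^3}\cos\!\Big(\tfrac{2\pi t}{T}\Big) = 0.$$
The same holds for constants and for $\cos(2\pi t/T)$, so the penalty leaves period-$T$ sinusoidal behaviour untouched and only punishes departures from it.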
Equation (1) can be written in matrix form as follows:
$$S_\lambda(\mathbf{c}_n) = (\mathbf{Y}_n - \mathbf{B}_n \mathbf{c}_n)^\top (\mathbf{Y}_n - \mathbf{B}_n \mathbf{c}_n) + \lambda \mathbf{c}_n^\top \mathbf{W} \mathbf{c}_n$$
where $\mathbf{B}_n$ has entries $(\mathbf{B}_n)_{jk} = B_k(t_{nj})$ and $\mathbf{W}$ is the penalty matrix with entries $W_{k\ell} = \int_0^1 (LB_k)(t)\,(LB_\ell)(t)\, dt$.
This yields a simple closed-form solution:
$$ \hat{\mathbf{c}}_n = (\mathbf{B}_n^\top \mathbf{B}_n + \lambda \mathbf{W})^{-1} \mathbf{B}_n^\top \mathbf{Y}_n$$
And the fitted values are
$$ \hat{\mathbf{Y}}_n = \mathbf{B}_n \hat{\mathbf{c}}_n$$
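Continuing the earlier sketch (reusing `B`, `knots`, `K`, `degree`, `t`, and `Y`; again my own illustration, not the post's code), the closed-form penalized fit can be computed directly. For simplicity the penalty matrix $\mathbf{W}$ below is built for the second-derivative penalty $L = D^2$ rather than the harmonic acceleration operator, with the integral approximated on a fine grid.

```python
# penalty matrix W_kl ≈ \int B_k''(t) B_l''(t) dt, approximated by a Riemann sum (here L = D^2)
fine = np.linspace(0.0, 1.0, 1001)
B2 = BSpline(knots, np.eye(K), degree)(fine, nu=2)   # second derivatives B_k'' on the fine grid
W = (B2.T @ B2) * (fine[1] - fine[0])                # Riemann-sum approximation of the integral

lam = 1e-4                                           # an arbitrary example value of lambda
c_hat = np.linalg.solve(B.T @ B + lam * W, B.T @ Y)  # closed-form penalized coefficients
Y_hat = B @ c_hat                                    # smoothed fitted values
```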
How to choose $\lambda$?
Of course, a question still remains: how do we set $\lambda$? This seems to create yet another optimization problem.
To choose an appropriate $\lambda$, it is helpful to use the effective "degrees of freedom", defined as
$$df = trace(\mathbf{B}_n(\mathbf{B}_n^\top \mathbf{B}_n + \lambda \mathbf{W})^{-1} \mathbf{B}_n^\top)$$
If $\lambda = 0$ then $df = K$, the number of basis functions, trivially.
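In code (still the same toy setup as above), the degrees of freedom is just the trace of the smoothing "hat" matrix:

```python
def effective_df(B, W, lam):
    """Trace of the hat matrix B (B^T B + lam * W)^{-1} B^T."""
    hat = B @ np.linalg.solve(B.T @ B + lam * W, B.T)
    return np.trace(hat)

print(effective_df(B, W, 0.0))   # equals K (15 here): no penalty, full flexibility
print(effective_df(B, W, 1.0))   # much smaller: heavier smoothing uses fewer effective parameters
```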
From here, there are several ways to choose $\lambda$; I will introduce the most common ones.
Firstly, let's define $RSS = (\mathbf{Y}_n - \hat{\mathbf{Y}}_n)^\top (\mathbf{Y}_n - \hat{\mathbf{Y}}_n)$.
The first is GCV (generalized cross-validation), defined as follows, where $J$ is the number of observation points for the curve:
$$GCV(\lambda) = \frac{J \cdot RSS}{(J-df)^2}$$
Second one is AIC:
$$AIC(\lambda) = J \log{(J^{-1}RSS)}+2df$$
Third one is BIC:
$$BIC(\lambda) = J\log{(J^{-1}RSS)} + \log{(J)}df$$
The last one is ordinary cross-validation.
Among these, GCV is the most popular choice in FDA since it is easy to compute.
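Putting it together (still the same assumed sketch, reusing `B`, `W`, `Y`, and `effective_df`), $\lambda$ can be chosen by evaluating these criteria over a grid and taking the minimizer:

```python
J = Y.size                                   # number of observation points for this curve
lambdas = np.logspace(-8, 0, 41)             # candidate lambda values on a log scale
results = []
for lam in lambdas:
    c = np.linalg.solve(B.T @ B + lam * W, B.T @ Y)
    rss = float(np.sum((Y - B @ c) ** 2))
    df = effective_df(B, W, lam)
    gcv = J * rss / (J - df) ** 2
    aic = J * np.log(rss / J) + 2 * df
    bic = J * np.log(rss / J) + np.log(J) * df
    results.append((lam, gcv, aic, bic))

results = np.array(results)
lam_gcv = results[np.argmin(results[:, 1]), 0]   # lambda minimizing GCV
print(lam_gcv)
```

The three criteria may pick slightly different values of $\lambda$; as noted above, GCV is the usual default.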