

[FDA] 3. Penalized smoothing

2024. 3. 15.

Penalized smoothing

In functional data analysis, how should we choose the number of basis functions? Can using too many of them cause overfitting?

Intuitively, the more basis functions we use in the expansion, the more closely the fitted curve follows the data. Beyond some point, however, the extra flexibility is spent on fitting the noise rather than the underlying trend, and that is overfitting.

Thus the number of basis functions has to be chosen carefully. Penalized smoothing offers another route: use a generous basis and control the roughness of the fit with a penalty.

 

Fitting a basis expansion means finding the coefficient values $c_{nk}$. They are estimated by least squares, i.e., by minimizing

$$ S(\mathbf{c}_n) = \sum_{j=1}^{J}\Big(Y_{nj} - \sum_{k=1}^{K} c_{nk} B_k (t_{nj})\Big)^2 $$

Here $n$ indexes the curves, $j = 1, \dots, J$ the time points $t_{nj}$, and $k = 1, \dots, K$ the basis functions $B_k$. The criterion sums the squared differences between the observed data $Y_{nj}$ and the basis expansion at each time point.
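A minimal NumPy sketch of this least-squares fit is below. The Fourier basis on $[0, 1]$, the simulated curve, and the helper names `fourier_basis` and `least_squares_coefficients` are illustrative assumptions, not something the post specifies. It also shows why the number of basis functions matters: the residual sum of squares keeps falling as $K$ grows, even though the true signal only needs $K = 3$.

```python
import numpy as np

def fourier_basis(t, n_harmonics):
    """J x K matrix with entry [j, k] = B_k(t_j) for the basis
    1, sin(2 pi m t), cos(2 pi m t), m = 1, ..., n_harmonics (so K = 2 * n_harmonics + 1)."""
    cols = [np.ones_like(t)]
    for m in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * m * t), np.cos(2 * np.pi * m * t)]
    return np.column_stack(cols)

def least_squares_coefficients(y, B):
    """Coefficients c minimizing S(c) = sum_j (y_j - sum_k c_k B_k(t_j))^2."""
    c, *_ = np.linalg.lstsq(B, y, rcond=None)
    return c

# Hypothetical curve: one sine wave plus noise, observed at J = 50 time points.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50, endpoint=False)
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)

rss = {}
for M in (1, 3, 10, 20):
    B = fourier_basis(t, M)
    c = least_squares_coefficients(y, B)
    rss[B.shape[1]] = float(np.sum((y - B @ c) ** 2))  # key is K
print(rss)
```

Because the smaller bases are nested inside the larger ones, the printed RSS can only go down as $K$ increases from 3 to 41; the additional basis functions are fitting noise, not signal.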

 

A penalized method adds a roughness penalty on the fitted curve $\tilde{X}_n(t) = \sum_k c_{nk} B_k(t)$ to this criterion:

$$S_\lambda (\mathbf{c}_n) = \sum_{j=1}^{J} \Big(Y_{nj} - \sum_{k=1}^{K} c_{nk} B_k(t_{nj})\Big)^2 + \lambda \int_0^1 \big[(L\tilde{X}_n)(t)\big]^2 \, dt \qquad (1)$$

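This criterion resembles ridge regression, except that the penalty measures the roughness of the fitted curve rather than the size of the coefficients. As $\lambda \rightarrow 0$ it reduces to ordinary least squares. As $\lambda \rightarrow \infty$, the fit is forced toward functions with $L\tilde{X}_n = 0$; for example, with the second-derivative operator $(Lx)(t) = x^{(2)}(t)$ the result is a straight line.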

 

What is $L$?

$L$ is a specified linear differential operator, i.e., a linear combination of derivatives of the function it acts on. A popular choice of $L$ for periodic data with period $T$ is

$$L(x)(t) = \frac{4\pi^2}{T^2} x^{(1)}(t) + x^{(3)}(t)$$

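Here $x^{(1)}$ and $x^{(3)}$ denote the first and third derivatives of $x$, and this operator is called the harmonic acceleration operator. It maps constants, $\sin(2\pi t/T)$ and $\cos(2\pi t/T)$ to zero, so these components are not penalized at all.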
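To make the operator concrete, the sketch below applies it to a Fourier basis $1, \sin(m\omega t), \cos(m\omega t)$ with $\omega = 2\pi/T$; the basis choice and the name `harmonic_acceleration_penalty` are assumptions for illustration. Term by term, $L\sin(m\omega t) = m\omega^3(1-m^2)\cos(m\omega t)$ and $L\cos(m\omega t) = -m\omega^3(1-m^2)\sin(m\omega t)$, so the penalty integrals $\int_0^T (LB_k)(t)(LB_l)(t)\,dt$ form a diagonal matrix whose first three entries are zero.

```python
import numpy as np

def harmonic_acceleration_penalty(n_harmonics, period=1.0):
    """Diagonal matrix with entries integral_0^T (L B_k)(t)^2 dt for the Fourier basis
    1, sin(w t), cos(w t), ..., sin(M w t), cos(M w t), where w = 2 pi / T and
    L x = w^2 x' + x'''. Off-diagonal integrals vanish by orthogonality over a period."""
    omega = 2 * np.pi / period
    diag = [0.0]  # L applied to the constant function is 0
    for m in range(1, n_harmonics + 1):
        # L sin(m w t) and L cos(m w t) have amplitude m w^3 |1 - m^2|;
        # a squared sine or cosine integrates to T / 2 over one period.
        weight = (m * omega**3 * (1 - m**2)) ** 2 * period / 2
        diag += [weight, weight]  # sin(m w t), cos(m w t)
    return np.diag(diag)

W = harmonic_acceleration_penalty(3)   # K = 7 basis functions on [0, 1]
print(np.diag(W)[:3])                  # [0. 0. 0.]: 1, sin(2 pi t), cos(2 pi t) are unpenalized
```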

 

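Let $\mathbf{Y}_n = (Y_{n1}, \dots, Y_{nJ})^\top$, let $\mathbf{B}_n$ be the $J \times K$ matrix with entries $B_k(t_{nj})$, and let $\mathbf{W}$ be the $K \times K$ matrix with entries $W_{kl} = \int_0^1 (LB_k)(t)\,(LB_l)(t)\,dt$. Since $L$ is linear, the penalty in (1) equals $\mathbf{c}_n^\top \mathbf{W} \mathbf{c}_n$, and equation (1) can be written in matrix form:

$$S_\lambda(\mathbf{c}_n) = (\mathbf{Y}_n - \mathbf{B}_n \mathbf{c}_n)^\top (\mathbf{Y}_n - \mathbf{B}_n \mathbf{c}_n) + \lambda \mathbf{c}_n^\top \mathbf{W} \mathbf{c}_n$$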

Because this criterion is quadratic in $\mathbf{c}_n$, it has a closed-form minimizer:

$$ \hat{\mathbf{c}}_n = (\mathbf{B}_n^\top \mathbf{B}_n + \lambda \mathbf{W})^{-1} \mathbf{B}_n^\top \mathbf{Y}_n$$
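The closed form comes from setting the gradient of $S_\lambda$ to zero, using that $\mathbf{W}$ is symmetric. The matrix $\mathbf{B}_n^\top \mathbf{B}_n + \lambda \mathbf{W}$ is invertible whenever $\mathbf{B}_n$ has full column rank and $\lambda \ge 0$, because $\mathbf{W}$ is a Gram matrix and therefore positive semidefinite.

$$\frac{\partial S_\lambda}{\partial \mathbf{c}_n} = -2\mathbf{B}_n^\top(\mathbf{Y}_n - \mathbf{B}_n\mathbf{c}_n) + 2\lambda\mathbf{W}\mathbf{c}_n = \mathbf{0} \;\Longrightarrow\; (\mathbf{B}_n^\top\mathbf{B}_n + \lambda\mathbf{W})\,\mathbf{c}_n = \mathbf{B}_n^\top\mathbf{Y}_n$$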

And the fitted values are

$$ \hat{\mathbf{Y}}_n = \mathbf{B}_n \hat{\mathbf{c}}_n$$
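The closed form translates directly into NumPy. The sketch below rests on assumptions not in the post: a Fourier basis on $[0, 1]$ (so $T = 1$), the harmonic acceleration penalty, simulated data, and hypothetical helper names. It checks the two limits discussed earlier: $\lambda = 0$ reproduces ordinary least squares, and a very large $\lambda$ drives every penalized coefficient to zero, leaving only the constant, $\sin(2\pi t)$ and $\cos(2\pi t)$ terms.

```python
import numpy as np

def fourier_basis(t, n_harmonics):
    """J x K matrix of B_k(t_j) for the basis 1, sin(2 pi m t), cos(2 pi m t), m = 1..M."""
    cols = [np.ones_like(t)]
    for m in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * m * t), np.cos(2 * np.pi * m * t)]
    return np.column_stack(cols)

def harmonic_penalty(n_harmonics):
    """Diagonal W for L x = (2 pi)^2 x' + x''' on [0, 1] in the basis above.
    Each squared sine/cosine integrates to 1/2 over [0, 1]."""
    w = 2 * np.pi
    weights = [(m * w**3 * (1 - m**2)) ** 2 / 2 for m in range(1, n_harmonics + 1)]
    return np.diag([0.0] + [v for v in weights for _ in range(2)])

def penalized_fit(y, B, W, lam):
    """c_hat = (B^T B + lam W)^{-1} B^T y via a linear solve; returns (c_hat, y_hat)."""
    c_hat = np.linalg.solve(B.T @ B + lam * W, B.T @ y)
    return c_hat, B @ c_hat

# Simulated single curve (hypothetical data).
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 60, endpoint=False)
y = 1 + np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)
B, W = fourier_basis(t, 10), harmonic_penalty(10)   # K = 21

c_ols, *_ = np.linalg.lstsq(B, y, rcond=None)
c_zero, _ = penalized_fit(y, B, W, 0.0)
c_huge, _ = penalized_fit(y, B, W, 1e6)
print(np.allclose(c_zero, c_ols))   # True: lambda = 0 gives ordinary least squares
print(np.abs(c_huge[3:]).max())     # penalized coefficients are essentially zero
```

Solving the linear system with `np.linalg.solve` avoids forming $(\mathbf{B}_n^\top \mathbf{B}_n + \lambda \mathbf{W})^{-1}$ explicitly, which is both cheaper and numerically safer.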

 

 

How to choose $\lambda$?

One question still remains: how should we set $\lambda$? Choosing it is itself another optimization problem.

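To choose an appropriate $\lambda$, it helps to measure the complexity of the fit by its effective degrees of freedom, defined as the trace of the matrix that maps $\mathbf{Y}_n$ to $\hat{\mathbf{Y}}_n$:

$$df = \operatorname{tr}\big(\mathbf{B}_n(\mathbf{B}_n^\top \mathbf{B}_n + \lambda \mathbf{W})^{-1} \mathbf{B}_n^\top\big)$$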

 

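If $\lambda = 0$, then $df = K$, because $\operatorname{tr}(\mathbf{B}_n(\mathbf{B}_n^\top\mathbf{B}_n)^{-1}\mathbf{B}_n^\top) = \operatorname{tr}((\mathbf{B}_n^\top\mathbf{B}_n)^{-1}\mathbf{B}_n^\top\mathbf{B}_n) = \operatorname{tr}(\mathbf{I}_K) = K$ (assuming $\mathbf{B}_n$ has full column rank). As $\lambda$ grows, $df$ decreases toward the number of unpenalized directions, i.e., the dimension of the null space of $\mathbf{W}$.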
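Computing $df$ numerically is cheap: by the cyclic property of the trace, only $K \times K$ matrices are needed. In this sketch the basis matrix and penalty are hypothetical stand-ins (a random $40 \times 8$ matrix and a diagonal penalty with two unpenalized directions), chosen only to show that $df = K$ at $\lambda = 0$ and that $df$ falls toward 2 as $\lambda$ grows.

```python
import numpy as np

def effective_df(B, W, lam):
    """df(lambda) = tr(B (B^T B + lam W)^{-1} B^T)."""
    # tr(B A^{-1} B^T) = tr(A^{-1} B^T B), so the J x J hat matrix is never formed.
    A = B.T @ B + lam * W
    return float(np.trace(np.linalg.solve(A, B.T @ B)))

# Hypothetical example: J = 40 observations, K = 8 basis functions, and a penalty
# that leaves two directions unpenalized (like a straight line under a second-derivative penalty).
rng = np.random.default_rng(0)
B = rng.standard_normal((40, 8))
W = np.diag([0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

for lam in (0.0, 1.0, 100.0, 1e10):
    print(f"lambda = {lam:g}: df = {effective_df(B, W, lam):.3f}")
```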

 

There are several ways to choose $\lambda$. Four common criteria are introduced below; in each case, we choose the $\lambda$ that minimizes the criterion.

First, define the residual sum of squares $RSS = (\mathbf{Y}_n - \hat{\mathbf{Y}}_n)^\top (\mathbf{Y}_n - \hat{\mathbf{Y}}_n)$.

 

The first is generalized cross-validation (GCV):

$$GCV(\lambda) = \frac{J \cdot RSS}{(J-df)^2}$$
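A sketch of choosing $\lambda$ by GCV over a grid follows. Everything concrete here is an assumption for illustration: a Fourier basis on $[0, 1]$ with the harmonic acceleration penalty ($T = 1$), a logarithmic grid of $\lambda$ values, simulated data, and the helper names. Each candidate $\lambda$ needs just one fit plus the trace of the hat matrix.

```python
import numpy as np

def fourier_basis(t, n_harmonics):
    # Columns: 1, sin(2 pi m t), cos(2 pi m t) for m = 1..n_harmonics.
    cols = [np.ones_like(t)]
    for m in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * m * t), np.cos(2 * np.pi * m * t)]
    return np.column_stack(cols)

def harmonic_penalty(n_harmonics):
    # Diagonal W for L x = (2 pi)^2 x' + x''' on [0, 1]; the first three entries are 0.
    w = 2 * np.pi
    weights = [(m * w**3 * (1 - m**2)) ** 2 / 2 for m in range(1, n_harmonics + 1)]
    return np.diag([0.0] + [v for v in weights for _ in range(2)])

def gcv_score(y, B, W, lam):
    """GCV(lambda) = J * RSS / (J - df)^2 for the penalized fit of one curve."""
    J = y.size
    A = B.T @ B + lam * W
    y_hat = B @ np.linalg.solve(A, B.T @ y)
    df = np.trace(np.linalg.solve(A, B.T @ B))   # = tr(B A^{-1} B^T)
    rss = np.sum((y - y_hat) ** 2)
    return J * rss / (J - df) ** 2

# Hypothetical data: two harmonics plus noise at J = 80 equally spaced points.
rng = np.random.default_rng(2)
t = np.linspace(0, 1, 80, endpoint=False)
y = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t) + 0.3 * rng.standard_normal(t.size)
B, W = fourier_basis(t, 15), harmonic_penalty(15)   # K = 31

lambdas = 10.0 ** np.arange(-12, 1)                 # grid 1e-12, 1e-11, ..., 1
scores = [gcv_score(y, B, W, lam) for lam in lambdas]
best = lambdas[int(np.argmin(scores))]
print(f"lambda chosen by GCV: {best:.0e}")
```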

 

The second is the Akaike information criterion (AIC):

$$AIC(\lambda) = J \log{(J^{-1}RSS)}+2df$$

 

The third is the Bayesian information criterion (BIC):

$$BIC(\lambda) = J\log{(J^{-1}RSS)} + \log{(J)}df$$
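Both criteria need only $RSS$, $J$ and $df$ from a fit. A minimal sketch, with a hypothetical helper name and a tiny hand-checkable example ($J = 4$, $RSS = 1$, $df = 2$) whose values are made up for illustration:

```python
import numpy as np

def aic_bic(y, y_hat, df):
    """AIC = J log(RSS / J) + 2 df and BIC = J log(RSS / J) + log(J) df for one curve."""
    J = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    aic = J * np.log(rss / J) + 2 * df
    bic = J * np.log(rss / J) + np.log(J) * df
    return aic, bic

# Tiny example: J = 4, RSS = 4 * 0.25 = 1, df = 2.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.5])
aic, bic = aic_bic(y, y_hat, 2.0)
print(aic, bic)
```

In a real search, `y_hat` and `df` would come from the penalized fit at each candidate $\lambda$. Since $\log(J) > 2$ once $J \ge 8$, BIC charges more per degree of freedom than AIC and so tends to pick a larger $\lambda$, i.e., a smoother curve.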

 

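The last one is ordinary (leave-one-out) cross-validation: each observation is left out in turn, the curve is refitted on the remaining points, and the squared prediction errors at the held-out points are summed.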
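For a penalized fit, the hat matrix $\mathbf{H} = \mathbf{B}_n(\mathbf{B}_n^\top\mathbf{B}_n + \lambda\mathbf{W})^{-1}\mathbf{B}_n^\top$ does not depend on $\mathbf{Y}_n$, so leave-one-out cross-validation has a well-known shortcut that avoids the $J$ refits: $CV(\lambda) = \sum_j \big((Y_{nj} - \hat{Y}_{nj})/(1 - H_{jj})\big)^2$. The sketch below checks the shortcut against brute-force refitting; the random basis matrix, the diagonal penalty, and the helper names are all hypothetical.

```python
import numpy as np

def loocv_shortcut(y, B, W, lam):
    """Leave-one-out score sum_j ((y_j - y_hat_j) / (1 - H_jj))^2 from a single fit."""
    H = B @ np.linalg.solve(B.T @ B + lam * W, B.T)   # hat matrix: y_hat = H y
    resid = y - H @ y
    return float(np.sum((resid / (1 - np.diag(H))) ** 2))

def loocv_brute_force(y, B, W, lam):
    """Same score computed by refitting J times, each time leaving one observation out."""
    total = 0.0
    for j in range(y.size):
        keep = np.arange(y.size) != j
        c = np.linalg.solve(B[keep].T @ B[keep] + lam * W, B[keep].T @ y[keep])
        total += (y[j] - B[j] @ c) ** 2
    return total

rng = np.random.default_rng(3)
B = rng.standard_normal((30, 6))                  # hypothetical basis matrix
W = np.diag([0.0, 0.0, 1.0, 2.0, 3.0, 4.0])       # hypothetical penalty matrix
y = B @ rng.standard_normal(6) + 0.1 * rng.standard_normal(30)
print(np.isclose(loocv_shortcut(y, B, W, 0.5), loocv_brute_force(y, B, W, 0.5)))  # True
```

Replacing each leverage $H_{jj}$ by its average $df/J$ and averaging over $j$ gives $J \cdot RSS/(J - df)^2$, the GCV criterion, which is why GCV needs only the trace of $\mathbf{H}$.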

 

GCV is known as the most popular criterion in FDA because it is easy to compute: it needs only $RSS$ and $df$ from a single fit, rather than $J$ separate refits.


[FDA] 2. Derivatives of Functional Data

2024. 3. 14.

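When estimating functional data, derivatives are often useful because they describe the shape of a function: where it increases, decreases, or changes quickly.

Once a curve is represented by a basis expansion, computing its derivative is straightforward: keep the coefficients and differentiate the basis functions.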

$$X_n(t) = \sum_{k=1}^K c_{nk} B_k(t) \longrightarrow X_n^{'}(t) = \sum_k c_{nk} B_k^{'}(t)$$
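As a sketch, here is that rule with a Fourier basis on $[0, 1]$ (an assumed basis; the helper name is hypothetical), where $\frac{d}{dt}\sin(m\omega t) = m\omega\cos(m\omega t)$ and $\frac{d}{dt}\cos(m\omega t) = -m\omega\sin(m\omega t)$. The same coefficient vector is multiplied by the matrix of differentiated basis functions.

```python
import numpy as np

def fourier_basis_and_derivative(t, n_harmonics, period=1.0):
    """Return (B, dB) with B[j, k] = B_k(t_j) and dB[j, k] = B_k'(t_j) for the Fourier basis."""
    omega = 2 * np.pi / period
    B_cols, dB_cols = [np.ones_like(t)], [np.zeros_like(t)]   # derivative of a constant is 0
    for m in range(1, n_harmonics + 1):
        s, c = np.sin(m * omega * t), np.cos(m * omega * t)
        B_cols += [s, c]
        dB_cols += [m * omega * c, -m * omega * s]
    return np.column_stack(B_cols), np.column_stack(dB_cols)

# Coefficients of X(t) = 2 + 3 sin(2 pi t) - cos(4 pi t) in the basis [1, sin, cos, sin 2, cos 2].
c = np.array([2.0, 3.0, 0.0, 0.0, -1.0])
t = np.linspace(0, 1, 101)
B, dB = fourier_basis_and_derivative(t, 2)
X, dX = B @ c, dB @ c   # same coefficients, differentiated basis functions
exact = 6 * np.pi * np.cos(2 * np.pi * t) + 4 * np.pi * np.sin(4 * np.pi * t)
print(np.allclose(dX, exact))  # True
```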

 

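However, in some cases the function cannot be differentiated analytically, for example when only discrete observations are available or the basis functions are not smooth enough. In such cases, the first derivative can be approximated by a backward finite difference: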

$$X^{'}(t_i) \approx \frac{X(t_i) - X(t_{i-1})}{t_i - t_{i-1}}$$
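The backward difference is one line with `np.diff`. A minimal sketch (the helper name `backward_difference` is hypothetical); it also handles unevenly spaced $t_i$:

```python
import numpy as np

def backward_difference(t, x):
    """Approximate X'(t_i) by (X(t_i) - X(t_{i-1})) / (t_i - t_{i-1}) for i = 1, ..., n-1.

    Returns the points t_1, ..., t_{n-1} and the estimates there;
    t_0 has no predecessor, so it gets no estimate.
    """
    t, x = np.asarray(t, dtype=float), np.asarray(x, dtype=float)
    return t[1:], np.diff(x) / np.diff(t)

t_pts, slopes = backward_difference([0.0, 1.0, 3.0], [0.0, 1.0, 9.0])  # x(t) = t^2
print(t_pts, slopes)  # [1. 3.] [1. 4.]
```

For $x(t) = t^2$ the estimates 1 and 4 are the exact derivatives at the midpoints 0.5 and 2 rather than at $t_i$, which is the source of the first-order error. NumPy's `np.gradient` is an alternative that uses second-order accurate central differences in the interior.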

 

After computing the derivative, we can plot it. Large or frequently fluctuating derivative values indicate that the function itself varies a lot.

 

The plots below compare the results of differentiating the same function with the two methods.

I used the following function:

$$ f(x) = \sin(x) + \log(x)$$

Its graph looks like this:

Figure: $f(x) = \sin(x) + \log(x)$

 

Its derivative is

$$ f'(x) = \cos(x) + \frac{1}{x}$$

 

Figure: exact derivative (left) and finite-difference approximation (right)

Using this derivative, I made two plots. The left plot shows the exact derivative evaluated on a grid from 0.1 to 15 in steps of 0.05. The right plot shows the result of the finite-difference approximation above.

In conclusion, the two plots have very similar shapes.
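The comparison can also be checked numerically. This sketch assumes the approximation was evaluated on the same grid (0.1 to 15 in steps of 0.05), which the post does not state explicitly, and measures the gap between the backward difference and the exact derivative $\cos(x) + 1/x$.

```python
import numpy as np

# Grid from the post: 0.1 to 15 in steps of 0.05 (298 intervals, 299 points).
x = np.linspace(0.1, 15.0, 299)
f = np.sin(x) + np.log(x)
exact = np.cos(x) + 1 / x          # true derivative f'(x)
approx = np.diff(f) / np.diff(x)   # backward difference, defined at x[1:]

err = np.abs(approx - exact[1:])
print(f"max error on [0.15, 15]: {err.max():.3f}")
print(f"max error on [1, 15]:    {err[x[1:] >= 1].max():.4f}")
```

The error is concentrated near $x = 0.1$, where $f''(x) = -\sin(x) - 1/x^2$ is large; elsewhere it stays small, consistent with the backward difference's error of roughly $\frac{h}{2}|f''(x)|$ for step size $h$.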

 

In FDA, there are cases where the function cannot be differentiated analytically. In such cases, the finite-difference approximation is an effective substitute.
