Data Science

[Paper] Efficient Large-scale Nonstationary Spatial Covariance Function Estimation Using Convolutional Neural Networks

2024. 7. 8.

0. Overview

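In spatial statistics we usually assume stationarity for statistical modeling. Put simply, this means that two pairs of locations separated by the same distance have the same covariance (the covariance depends only on the separation between locations). However, the mere fact that the distance from Hongik Univ. Station to Sinchon Station equals the distance from Hongik Univ. Station to Hapjeong Station does not mean that these locations share the same characteristics. For example, a quantity such as precipitation is influenced more by each location's temperature or altitude than by plain distance.

To reflect such spatial characteristics, the model should therefore assume nonstationarity, but this makes the analysis difficult. The authors model the covariance using convolutional neural networks (CNNs), and this post walks through their approach. It focuses in particular on how they convert coordinate data into an input for the CNN.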

1. Methods

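Existing approaches to nonstationary covariance modeling include 1) the spatial deformation method and 2) the process convolution approach. Skipping the details, approach 2) yields a closed form for the Matérn covariance, but the matrix to be computed grows with the number of locations, which makes estimation hard. Methods that partition the domain into subregions and assume local stationarity were proposed later, but they have the drawback that the choice of subregions is subjective.

The authors treat spatial data as an incomplete image (in an image of a given size every pixel is observed and has an RGB value, whereas spatial data has unobserved locations, which are regarded as unobserved RGB values). They aim to improve on existing methods, which partition subregions subjectively, by proposing a data-driven (and hence somewhat more objective) method based on a CNN.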

2. CNN model

2.1 CNN architecture

As mentioned above, the CNN is used to distinguish stationary regions from nonstationary ones. Given a 100 by 100 input, it outputs the probability that the region is nonstationary. Treating this as a classification problem, the authors regard a region as nonstationary if the probability is greater than or equal to 0.5 and as stationary otherwise, and they train and use the model accordingly.

Concretely, the CNN architecture they use is as follows. The 100 by 100 input first passes through a convolution layer that extracts features with 32 kernels of size 3 by 3. A flatten layer then unrolls the result into a vector, and finally the FC layers apply two non-linear transformations to produce the probability that the region is nonstationary.
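A minimal NumPy sketch of this forward pass may help make the shapes concrete. It is not the authors' implementation: the ReLU and sigmoid activations, the hidden width of 16, the absence of pooling, and the random (untrained) weights are all assumptions for illustration, and `init_params` and `nonstationary_prob` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def init_params(size=100, n_kernels=32, hidden=16, seed=0):
    """Random, untrained weights (ASSUMPTION: hidden width and scales are arbitrary)."""
    rng = np.random.default_rng(seed)
    flat = (size - 2) ** 2 * n_kernels  # a 3x3 kernel without padding shrinks each side by 2
    return {
        "K": rng.normal(0.0, 0.1, (n_kernels, 3, 3)),   # 32 kernels of size 3x3
        "W1": rng.normal(0.0, 0.01, (flat, hidden)),    # first FC transformation
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 1)),        # second FC transformation
        "b2": np.zeros(1),
    }


def nonstationary_prob(x, p):
    """Convolution -> flatten -> two FC transformations -> probability."""
    windows = sliding_window_view(x, (3, 3))            # (H-2, W-2, 3, 3)
    feat = np.einsum("ijkl,fkl->ijf", windows, p["K"])  # (H-2, W-2, n_kernels)
    feat = np.maximum(feat, 0.0)                        # ReLU (assumed activation)
    h = np.maximum(feat.reshape(-1) @ p["W1"] + p["b1"], 0.0)
    logit = h @ p["W2"] + p["b2"]
    return float(1.0 / (1.0 + np.exp(-logit[0])))       # sigmoid (assumed activation)
```

With trained weights, a region would be labelled nonstationary when `nonstationary_prob(x, p) >= 0.5`, matching the threshold described in the text.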

2.2 Data pre-processing

There is a problem, however: to use the CNN, the data must be converted into a 100 by 100 input, but much real-world data is not located evenly on a 100 by 100 grid. The authors therefore build the 100 by 100 input through a three-step preprocessing procedure, described below.

Splitting: divide the observed region $[0,1]^2$ into 100 by 100 subregions (denote each subregion by $D_{i,j}$).
Averaging: let $N_{i,j}$ be the number of observations falling in $D_{i,j}$ and $\bar{Z}(i,j)$ the average of the observations in $D_{i,j}$, computed as follows.

$$\bar{Z}(i,j) = \begin{cases} \frac{1}{N_{i,j}} \sum_{k:s_k \in D_{i,j}} Z(s_k), & N_{i,j}>0 \\ \bar{Z}(u,v), \ \text{where } (u,v) = \operatorname*{arg\,min}_{(u,v)}\{(i-u)^2 + (j-v)^2 : N_{u,v}>0\}, & N_{i,j}=0 \end{cases}$$

Scaling: let $m$ and $M$ be the minimum and maximum of the $\bar{Z}(i,j)$ obtained above, and scale as follows so that the result lies in $[0,1]$.

$$\tilde{Z}(i,j) = \begin{cases} (\bar{Z}(i,j) - m) / (M-m), \ \ m \neq M; \\ 0.5, \quad \quad m=M \end{cases}$$

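In summary, the coordinates in $[0,1]^2$ are split into 100 by 100 subregions, the GRF (Gaussian random field) values of the points in each subregion are averaged (an empty subregion takes the value of the nearest non-empty one), and min-max scaling is applied. The result has the 100 by 100 input shape we want. (Note, however, that the paper does not state how a location $s_k$ is mapped to a subregion $D_{i,j}$.)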
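The three steps can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: since the mapping from $s_k$ to $D_{i,j}$ is not specified, the sketch assumes the cell index is `floor(grid * coordinate)`, with points on the upper boundary clipped into the last cell. It also assumes at least one observation, breaks ties in the nearest-cell search by taking the first candidate, and uses the hypothetical name `preprocess`.

```python
import numpy as np


def preprocess(coords, values, grid=100):
    """Turn scattered observations on [0,1]^2 into a grid x grid array in [0,1]."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Splitting (ASSUMPTION: cell index = floor(grid * coordinate), clipped to grid - 1).
    idx = np.minimum((coords * grid).astype(int), grid - 1)

    # Averaging: per-cell sums and counts, then the mean where N_{i,j} > 0.
    sums = np.zeros((grid, grid))
    counts = np.zeros((grid, grid), dtype=int)
    np.add.at(sums, (idx[:, 0], idx[:, 1]), values)
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
    filled = counts > 0
    zbar = np.zeros((grid, grid))
    zbar[filled] = sums[filled] / counts[filled]

    # Empty cells copy the average of the nearest non-empty cell (squared index distance).
    src = np.argwhere(filled)
    for i, j in np.argwhere(~filled):
        d2 = (src[:, 0] - i) ** 2 + (src[:, 1] - j) ** 2
        u, v = src[np.argmin(d2)]
        zbar[i, j] = zbar[u, v]

    # Scaling: min-max, with the constant case mapped to 0.5.
    m, M = zbar.min(), zbar.max()
    if M == m:
        return np.full((grid, grid), 0.5)
    return (zbar - m) / (M - m)
```

The `grid` parameter defaults to 100 as in the paper; a smaller value is convenient for checking the logic by hand.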

2.3 Subregion selection

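What remains is to use the trained CNN and the preprocessed data to separate stationary areas from nonstationary ones. This requires partitioning the data into subregions (these differ from the subregions defined in 2.2, which were only used to form the 100 by 100 input). The proposed method can be summarized as follows.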

From the locations $S = \{s_1,\dots,s_n\}$, select $K$ random points $A = \{a_1, \dots, a_K\}$.
Compute the distance vector $E = \{ \lVert s_i - a_k \rVert^2: a_k \in A\}$. It holds the (squared) distances from a given point $s_i$ to the $K$ randomly selected points $a_k$, so it has $K$ components.
If $a_j$ is the selected point closest to $s_i$, assign $(s_i, Z(s_i))$ to subregion $R_j$.
For each subregion in $\{R_k\}_{k=1}^{K}$, compute the nonstationary probability with the CNN of 2.1 (denote it by $p_k$, $k=1,\dots,K$).
Compute $P = \sum_{k=1}^K p_k$, repeat over iterations to find the smallest $P$, and return the centres of the points belonging to the corresponding subregions (the set $A$ differs at each iteration, so $P$ does as well).
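The steps above can be sketched as a simple random search. This is a hedged illustration, not the authors' implementation: `select_subregions` is a hypothetical name, `prob_fn` is a hypothetical callable standing in for "preprocess the subregion and run the CNN of 2.1", the number of iterations `n_iter` is an assumed parameter, and the sketch assumes distinct coordinates so that no subregion ends up empty.

```python
import numpy as np


def select_subregions(coords, values, K, prob_fn, n_iter=50, seed=0):
    """Random search over anchor sets A; keeps the partition with the smallest P."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    best = None
    for _ in range(n_iter):
        # Pick K random anchors among the observed locations.
        anchors = coords[rng.choice(len(coords), size=K, replace=False)]
        # Squared distances from every s_i to every anchor a_k: shape (n, K).
        d2 = ((coords[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
        # Each point joins the subregion of its nearest anchor.
        labels = d2.argmin(axis=1)
        # P = sum of per-subregion nonstationarity probabilities.
        P = sum(prob_fn(coords[labels == k], values[labels == k]) for k in range(K))
        if best is None or P < best[0]:
            centres = np.array([coords[labels == k].mean(axis=0) for k in range(K)])
            best = (P, labels, centres)
    return best  # (smallest P, subregion label of each point, subregion centres)
```

Any function returning a score in $[0,1]$ for a subregion can be passed as `prob_fn`; in the paper's setting it would wrap the trained CNN.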

In short, once the number of subregions is chosen, the algorithm tells us which subregion each data point belongs to. It is a pity, though, that no method is given for choosing the optimal number of subregions $K$.

3. Simulation

The authors run two kinds of simulation. Briefly, the first tests whether the CNN can tell stationary data from nonstationary data. They generate 16,000 stationary and 16,000 nonstationary datasets and train the CNN on 80% of the samples. Skipping the details of data generation, the model is then tested on the remaining 20% of the data.

Stationary data were occasionally classified as nonstationary (and vice versa), but the classification was accurate overall.

In the second experiment, 10,000 nonstationary observations are generated on the $[0,1]^2$ domain from subregions with four different nonstationary parameter settings, defined around four anchor locations. The CNN is then trained so that it yields 2 and 3 subregions. This lets us check whether the resulting regions resemble the true partition. The parameters of the nonstationary data can also be estimated with the ExaGeoStat package, so the authors compare the MSE of the parameters estimated with CNN-based subregions against the MSE of the parameters estimated after partitioning subregions in the conventional way.

Looking briefly at the estimation results, estimating the parameters after partitioning with the CNN recovers the true parameters better than using user-defined subregions.

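4. Summary

This paper starts from the observation that the stationarity assumption is violated in many situations in spatial statistics, points out that existing locally stationary methods build subregions through subjective choices, and aims to improve how subregions are constructed by using a CNN. In other words, it keeps the stationarity assumption local, but builds the subregions objectively and then analyzes the data assuming local stationarity among the points within each subregion.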

References:

Nag, P., Hong, Y., Abdulah, S., Qadir, G. A., Genton, M. G., & Sun, Y. (2023). Efficient Large-scale Nonstationary Spatial Covariance Function Estimation Using Convolutional Neural Networks. arXiv preprint arXiv:2306.11487.


[FDA] 5. Hilbert spaces

2024. 4. 2.

Various spaces

Vector space

Since we work with functions rather than only finite-dimensional vectors, we need to define function spaces. Starting from a vector space (I will skip its definition), we can define a normed vector space, which is intuitively a vector space equipped with a norm.

Banach space & Hilbert space

A Banach space requires one more property: it is a complete normed vector space. Complete means that every Cauchy sequence in the space converges to a point in the space (a detailed treatment is given in an analysis course). Roughly speaking, a Cauchy sequence is one whose elements get arbitrarily close to each other as the sequence progresses, even though we may not know which point it converges to.

Finally, we reach the Hilbert space. A Hilbert space, denoted $\mathcal{H}$, is a Banach space whose norm is induced by an inner product, $\lVert x \rVert = \sqrt{\langle x, x \rangle}$. It is an important space because most properties of scalars and vectors generalize well to Hilbert spaces (they may not generalize to Banach spaces).

Example

Example 1) In $\mathbb{R}^d$, the $\ell_p$ norm for $p \ge 1$ is given by

$$\lVert x \rVert^p_p = \sum_{i=1}^d |x_i|^p$$

But the $\ell_p$ norm cannot be written in terms of an inner product unless $p=2$. Thus, $\mathbb{R}^d$ with the $\ell_p$ norm is a Banach space but not a Hilbert space unless $p=2$.

Example 2) the space of continuous functions over [0, 1], denoted by $\mathcal{C}[0,1]$, with sup-norm is a Banach space.

$$\lVert x \rVert=\sup_{0\le t \le 1} |x(t)|$$

However, since the sup-norm is not induced by an inner product, it is not a Hilbert space (a counterexample follows easily from the parallelogram law).
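As one possible illustration of this argument (the functions are chosen here for convenience and are only one of many counterexamples), take $x(t)=t$ and $y(t)=1-t$ in $\mathcal{C}[0,1]$. Any norm induced by an inner product must satisfy the parallelogram law

$$\lVert x+y \rVert^2 + \lVert x-y \rVert^2 = 2\lVert x \rVert^2 + 2\lVert y \rVert^2$$

but with the sup-norm we get

$$\lVert x \rVert = \lVert y \rVert = 1, \quad \lVert x+y \rVert = \sup_{0\le t \le 1} |1| = 1, \quad \lVert x-y \rVert = \sup_{0\le t \le 1} |2t-1| = 1$$

so the left-hand side equals $2$ while the right-hand side equals $4$.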

Theorems & Operators

Using these spaces and the notion of projection, we can state some theorems and define several operators.

Parseval's Theorem

Firstly, I will explain Parseval's theorem, which is useful for proving various properties. We first need the notion of separability. We say that the space $\mathcal{H}$ is separable if it contains a countable basis $\{e_i\}$, so that every $x \in \mathcal{H}$ can be expressed as

$$x = \sum_{i=1}^\infty a_i e_i$$

Parseval's theorem then states:

Let $\{e_i\}$ be an orthonormal basis of a real separable Hilbert space $\mathcal{H}$. Then for any $x \in \mathcal{H}$ we have
$$ x = \sum_{i=1}^\infty \langle x,e_i \rangle e_i \quad \text{and} \quad \lVert x \rVert ^2 = \sum_{i=1}^\infty \langle x,e_i \rangle ^2$$
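A quick numerical check in the finite-dimensional case $\mathcal{H} = \mathbb{R}^d$ may make the two identities concrete. This is only a sanity check under assumptions of my choosing (dimension 5, an orthonormal basis obtained from a QR factorization of a random matrix), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Columns of Q form an orthonormal basis e_1, ..., e_d of R^d.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
x = rng.normal(size=d)

coeffs = Q.T @ x                    # coeffs[i] = <x, e_i>
x_rebuilt = Q @ coeffs              # sum_i <x, e_i> e_i
norm_sq = x @ x                     # ||x||^2
coeff_sq_sum = np.sum(coeffs ** 2)  # sum_i <x, e_i>^2

print(np.allclose(x_rebuilt, x), np.isclose(norm_sq, coeff_sq_sum))
```

Both comparisons hold up to floating-point error for any orthonormal basis, which is exactly what the theorem asserts in this setting.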

Linear operators

An operator is a mapping from one space to another. An operator $L:\mathcal{H} \longrightarrow \mathcal{H}$ is called linear if $L(ax+by)=aL(x)+bL(y)$ for all scalars $a, b$ and all $x, y \in \mathcal{H}$. A linear operator is called bounded if

$$\lVert L \rVert_\mathcal{L} = \sup_{\lVert x \rVert \le 1} \lVert L(x) \rVert < \infty$$

Hilbert-Schmidt operators

A bounded linear operator, L, is called Hilbert-Schmidt if

$$\lVert L \rVert ^2 _\mathcal{S} = \sum_{i=1}^\infty \lVert L(e_i) \rVert ^2 < \infty$$

where $\{e_i\}$ is any orthonormal basis (the value does not depend on the choice of basis).

Thus, we can check whether an operator is bounded or Hilbert-Schmidt using the two quantities above. It is known that the Hilbert-Schmidt property is stronger than boundedness $(\mathcal{S} \subset \mathcal{L})$, since $\lVert L \rVert_\mathcal{L} \le \lVert L \rVert_\mathcal{S}$.
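In finite dimensions both norms are easy to compute, which gives a small hedged illustration of the comparison. Assuming $\mathcal{H} = \mathbb{R}^4$ with the standard basis and a random matrix `A` as the operator (both my choices), the operator norm is the largest singular value and the Hilbert-Schmidt norm is the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # a linear operator on R^4 (illustrative choice)

op_norm = np.linalg.norm(A, 2)      # sup_{||x|| <= 1} ||A x|| = largest singular value
hs_norm = np.linalg.norm(A, "fro")  # Frobenius norm

# The same Hilbert-Schmidt norm from the definition: column i of A is A e_i.
hs_from_basis = np.sqrt(sum(np.linalg.norm(A[:, i]) ** 2 for i in range(4)))

print(op_norm <= hs_norm, np.isclose(hs_norm, hs_from_basis))
```

In finite dimensions every linear operator is both bounded and Hilbert-Schmidt, so the distinction only becomes meaningful in infinite-dimensional spaces; the inequality between the two norms is what carries over.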

Self-adjoint operators

What does it mean for an operator to be symmetric? This is not obvious, since the operator acts on a space of functions rather than being a matrix. In a Hilbert space, the analogue of symmetry is defined through the inner product, via self-adjoint operators.

We say $L^*$ is the adjoint of an operator $L$ if
$$\langle L(x),y \rangle = \langle x, L^*(y) \rangle \ \text{for all}\ x,y \in \mathcal{H}$$
We say $L$ is self-adjoint if $L=L^*$.
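To connect this with the familiar matrix case, here is a small sketch assuming $\mathcal{H} = \mathbb{R}^3$ with the standard inner product, where the adjoint of a matrix is its transpose and a self-adjoint operator is a symmetric matrix. The matrices and vectors are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
x, y = rng.normal(size=3), rng.normal(size=3)

# Adjoint identity <A x, y> = <x, A^T y> with the standard inner product.
lhs = (A @ x) @ y
rhs = x @ (A.T @ y)

# A + A^T is symmetric, hence self-adjoint: <S x, y> = <x, S y>.
S = A + A.T
is_self_adjoint = np.allclose(S, S.T)

print(np.isclose(lhs, rhs), is_self_adjoint)
```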