Kernel Model
In the linear-in-parameter model, the basis functions are fixed in advance to, e.g., polynomial or sinusoidal functions, without regard to the training samples $\{(x_i, y_i)\}_{i=1}^n$.
The kernel model, in contrast, uses the training input samples $\{x_i\}_{i=1}^n$ to design the basis functions.
Let us consider a bivariate function $K(\cdot,\cdot)$ called the kernel function.
The kernel model is defined as a linear combination of $\{K(x, x_j)\}_{j=1}^n$:
\[
f_\theta(x) = \sum_{j=1}^{n} \theta_j K(x, x_j)
\]
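The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `kernel_model`, the toy data, and `placeholder_kernel` (a plain inner product, used only so the example runs before any specific kernel is chosen) are all assumptions made for this sketch.

```python
import numpy as np

def kernel_model(x, train_inputs, theta, kernel):
    """Evaluate f_theta(x) = sum_j theta_j * K(x, x_j).

    `kernel` is any bivariate function K(x, c); one basis function is
    placed on each training input x_j.
    """
    return sum(t * kernel(x, x_j) for t, x_j in zip(theta, train_inputs))

def placeholder_kernel(x, c):
    # Hypothetical stand-in for K: the inner product x^T c.
    return float(np.dot(x, c))

# Hypothetical toy data: n = 2 training inputs in two dimensions.
train_inputs = np.array([[1.0, 0.0], [0.0, 1.0]])
theta = np.array([2.0, 3.0])

value = kernel_model(np.array([1.0, 1.0]), train_inputs, theta, placeholder_kernel)
```

Note that the number of parameters $\theta_j$ equals the number of training samples $n$, because each training input contributes one basis function.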
The Gaussian kernel is the most popular choice of kernel function:
\[
K(x, c) = \exp \left( -\frac{\|x - c\|^2}{2h^2} \right),
\]
where $\|\cdot\|$ denotes the $\ell_2$-norm:
\[
\|x\| = \sqrt{x^\top x}.
\]
Here, $h$ and $c$ are called the Gaussian bandwidth and the Gaussian center, respectively.
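As a rough sketch of how the Gaussian kernel and the resulting kernel model might be computed with NumPy, consider the following. The function names, the toy inputs `X_train`, the parameter values `theta`, and the bandwidth `h = 1.0` are assumptions chosen for illustration; in practice the parameters would be learned from data.

```python
import numpy as np

def gaussian_kernel(x, c, h):
    """K(x, c) = exp(-||x - c||^2 / (2 h^2)) for single vectors x and c."""
    diff = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * h ** 2))

def gaussian_kernel_matrix(X, C, h):
    """Matrix with entries K(X[i], C[j]) for the rows of X and C."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    # Pairwise squared l2 distances via broadcasting: shape (len(X), len(C)).
    sq_dists = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dists / (2.0 * h ** 2))

# Hypothetical toy data: three one-dimensional training inputs.
X_train = np.array([[0.0], [1.0], [2.0]])
theta = np.array([1.0, -0.5, 0.25])  # illustrative values, not learned
h = 1.0                              # Gaussian bandwidth

# Gaussian centers are the training inputs, so K is n x n.
K = gaussian_kernel_matrix(X_train, X_train, h)

# Kernel model evaluated at every training input: f[i] = sum_j theta_j K(x_i, x_j).
f_train = K @ theta
```

With the centers placed on the training inputs, the model output at all $n$ training points reduces to a single matrix-vector product. A smaller $h$ makes each basis function more localized around its center, while a larger $h$ yields smoother functions.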