March, 2018

Visualization of the k-nearest neighbors method

Regression

Consider the regression model
\(y_i = 3 + x_i^2 + \epsilon_i\), where \(\epsilon_i \sim_{iid} N(0,1)\). Note that the true mean \(3 + x^2\) is not linear in \(x\).

The dataset \((y_i, x_i)\) \((i = 1, \cdots, 100)\) is generated in R:

    set.seed(1)
    x <- sort(rnorm(100))
    y <- 3 + x^2 + rnorm(100)
    plot(x, y, pch = 20)

Regression

  • We can fit a regression line (a linear model) in R.
  • The fit is computed by the lm() function.
fit = lm(y~x)
fit$coefficients
## (Intercept)           x 
##   3.7565367   0.1488341
  • The result (output) of lm() is stored in the variable fit, a list object.
  • fit has an element named coefficients, which holds the estimated intercept and slope.
  • It is accessed with fit$coefficients.
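
As a small sketch of how the fitted object can be inspected (the data are re-created here so the block runs on its own; the names b_list, b_coef and yhat_lin are introduced only for illustration):

```r
# Re-create the simulated data from the earlier slide
set.seed(1)
x <- sort(rnorm(100))
y <- 3 + x^2 + rnorm(100)

fit <- lm(y ~ x)

# lm() returns a list; names() shows its elements
names(fit)

# Two equivalent ways to read the estimated intercept and slope
b_list <- fit$coefficients
b_coef <- coef(fit)
identical(b_list, b_coef)

# Fitted values are intercept + slope * x
yhat_lin <- b_coef[1] + b_coef[2] * x
all.equal(unname(yhat_lin), unname(fitted(fit)))
```

Using coef(fit) rather than fit$coefficients is a matter of style; both read the same list element.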

Regression

plot(x, y, pch=20)
abline(a=fit$coefficients[1], b=fit$coefficients[2], col='blue')
yTrueMean = 3+x^2
lines(x, yTrueMean , lty=2, col='black')

Regression

  • k-nearest neighbors algorithm
\(\hat y(x) = \frac{1}{k}\sum_{i \in N_k(x)} y_i\)
  • \(\hat y(x)\): the estimate of \(y\) at a given \(x\)

  • \(N_k(x)\): the index set of the \(k\) observations \(x_i\) nearest to \(x\).

  • \(N_k(x)\) is computed by knnx.index() in the FNN library.
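
To make the formula concrete, here is a minimal base-R sketch of the same estimate for one-dimensional \(x\). The helper knn_reg and the toy data are hypothetical (they are not part of FNN), and ties in distance are broken by position, which may differ from what knnx.index does.

```r
# Hypothetical helper (not part of FNN): k-NN regression estimate at one point x0
knn_reg <- function(x, y, x0, k) {
  d <- abs(x - x0)        # distances from x0 to every x_i
  nbrs <- order(d)[1:k]   # N_k(x0): indices of the k smallest distances
  mean(y[nbrs])           # average of the neighbors' responses
}

# Toy data: the 3 nearest points to 2.1 are x = 2, 3, 1
x_toy <- c(1, 2, 3, 4, 5)
y_toy <- c(10, 20, 30, 40, 50)
knn_reg(x_toy, y_toy, x0 = 2.1, k = 3)   # mean of 20, 30, 10, i.e. 20
```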

Regression

library(FNN)
knnx.index(x, 0, k = 10)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]   47   46   48   45   44   43   42   41   49    50
idx <- c(knnx.index(x, 0, k = 10))
idx
##  [1] 47 46 48 45 44 43 42 41 49 50

Regression

  • Estimate at \(x = 0\): the mean of \(y_i\) over the 10 nearest neighbors
yhat <- mean( y[idx] ) 
yhat 
## [1] 2.446989

Visualization of KNN

eval.point = 0
plot(x, y, pch = 20)
abline(v = eval.point, col = 'black')          # evaluation point
idx <- c(knnx.index(x, eval.point, k = 10))    # indices of the 10 nearest x_i
points(x[idx], y[idx], col = 'red', pch = 20)  # highlight the neighbors
abline(h = mean(y[idx]), lty = 2, col = 'red') # kNN estimate
mean(y[idx])

## [1] 2.446989

Visualization of KNN

  • Display \(\hat y\) at \(x = -1\) and \(x = 1\).
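
The code for this display is not shown on the slide. One possible sketch, assuming the same k = 10 and the plotting style of the \(x = 0\) example (the data are re-created so the block runs on its own, and the short horizontal segments are an illustrative choice), is:

```r
library(FNN)

set.seed(1)
x <- sort(rnorm(100))
y <- 3 + x^2 + rnorm(100)

eval.point <- c(-1, 1)
plot(x, y, pch = 20)
idx.mat <- knnx.index(x, eval.point, k = 10)   # one row of neighbors per point
yhat <- numeric(length(eval.point))
for (i in seq_along(eval.point)) {
  idx <- idx.mat[i, ]
  yhat[i] <- mean(y[idx])
  abline(v = eval.point[i], col = 'black')
  points(x[idx], y[idx], col = 'red', pch = 20)
  # short horizontal segment at the estimate, spanning the neighbors
  segments(min(x[idx]), yhat[i], max(x[idx]), yhat[i], lty = 2, col = 'red')
}
yhat
```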

Visualization of KNN

  • Using a for statement, we can display \(\hat y\) over [-3,3].
eval.n = 100
eval.point = seq(-3, 3, length.out = eval.n)
plot(x, y, pch = 20)
idx.mat <- knnx.index(x, eval.point, k = 10)   # row i: neighbors of eval.point[i]
yhat = rep(0, eval.n)
for (i in 1:eval.n) yhat[i] <- mean(y[idx.mat[i, ]])
lines(eval.point, yhat, lty = 2, col = 'red')
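
The loop can also be written without an explicit for statement. The sketch below is one way to do it; knn_means is a hypothetical helper, and the hand-made index matrix only stands in for the output of knnx.index.

```r
# Hypothetical helper: row-wise neighbor means without an explicit loop.
# idx.mat has one row per evaluation point and one column per neighbor.
knn_means <- function(y, idx.mat) {
  # y[idx.mat] returns the values in column-major order, so rebuild the matrix
  vals <- matrix(y[idx.mat], nrow = nrow(idx.mat))
  rowMeans(vals)
}

# Toy check with a hand-made index matrix (stand-in for knnx.index output)
y_toy <- c(1, 2, 3, 4)
idx_toy <- rbind(c(1, 2),
                 c(3, 4))
knn_means(y_toy, idx_toy)   # 1.5 3.5
```

With the objects on the slide, yhat <- knn_means(y, idx.mat) would give the same values as the for loop.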

3D plotting

3d scatterplot

library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt, disp, mpg, main="3D Scatterplot")

Spinning 3d scatterplot

library(rgl)
attach(mtcars)
plot3d(wt, disp, mpg, col="red", size=3)
  • Assign colors by a factor
mypal = c('blue', 'red', 'green')
class(mtcars$cyl)
factor(mtcars$cyl)
mypal[factor(mtcars$cyl)]
plot3d(wt, disp, mpg, col = mypal[factor(mtcars$cyl)], size = 10)

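Note that indexing with repeated positions repeats elements, and a factor used as an index acts through its integer codes: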

c("A","B","C")[c(2,2,3,1)]
## [1] "B" "B" "C" "A"
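
The same mechanism explains the color assignment. A short sketch with a toy vector (cyl_toy is made up for illustration) shows how the factor's integer codes pick entries of the palette:

```r
mypal <- c('blue', 'red', 'green')
cyl_toy <- c(6, 4, 8, 4)          # toy values, not the full mtcars column

f <- factor(cyl_toy)
levels(f)                         # "4" "6" "8" (sorted unique values)
as.integer(f)                     # 2 1 3 1 (position of each value among the levels)

# Indexing by the factor uses those integer codes
mypal[f]                          # "red" "blue" "green" "blue"

# Same idea on the data used in the slides
cols <- mypal[factor(mtcars$cyl)]
table(cols, mtcars$cyl)
```

So 4-cylinder cars are drawn in blue, 6-cylinder cars in red, and 8-cylinder cars in green.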

Draw a 3d surface

library(rgl)
z = 2 * volcano        # Exaggerate the relief
x = 10 * (1:nrow(z))   # 10 meter spacing (S to N)
y = 10 * (1:ncol(z))   # 10 meter spacing (E to W)
## Don't draw the grid lines :  border = NA
par(bg = "slategray")
persp(x, y, z, theta = 135, phi = 30, col = "green3", scale = FALSE,
      ltheta = -120, shade = 0.75, border = NA, box = FALSE)
  • persp() draws a surface in 3d space from height information: a matrix z of heights over the grid defined by x and y.
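
When the heights come from a function rather than a stored matrix such as volcano, the matrix can be built with outer(). The sketch below uses an arbitrary illustrative surface \(3 + a^2 + b^2\); the names gx, gy and gz are chosen only to avoid overwriting x, y and z.

```r
# Grid of x and y locations (must be increasing for persp())
gx <- seq(-2, 2, length.out = 30)
gy <- seq(-2, 2, length.out = 30)

# Height matrix: gz[i, j] is the height at (gx[i], gy[j])
gz <- outer(gx, gy, function(a, b) 3 + a^2 + b^2)

persp(gx, gy, gz, theta = 30, phi = 30, col = "green3",
      shade = 0.75, border = NA)
```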

Spinning a 3d surface

persp3d(x, y, z, col = "green3")