A user wants to know why the following KNN code in R gives different predictions for different seeds. This is surprising because K <- 5 is odd, so with two classes the majority vote should be well defined. The values are also not so small that floating-point precision should be an issue.

Asked by FelicaLaplaca in Data Science on Dec 20, 2019
Answered by Felica Laplaca

library(class)

# Six training points (one per row) and their class labels
train <- rbind(
  c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
  c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
  c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
  c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
  c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
  c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)

# Single test point
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

# Store the seed in a variable so that message() can report it
seed <- 494139
set.seed(seed)
pred <- knn(train = train, test = test, cl = trainLabels, k = K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 1, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train = train, test = test, cl = trainLabels, k = K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371

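When knn is called, it hands the work to an underlying C function, VR_knn, which compares distances using a small relative tolerance (an epsilon, or "fuzz" value) rather than exact equality. Any training point whose distance falls within that tolerance of the K-th nearest distance is treated as tied and, as the knn documentation states, is included in the vote as well. Here the 5th and 6th nearest training points appear to be almost exactly the same distance from the test point, so all six points vote, the labels split 3 to 3, and knn breaks that tie at random, which is where the seed comes in.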
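To see whether the fuzz step is actually in play for this data, one option is to compute the distances by hand. The following is a minimal sketch in base R, not the package's own code. The tolerance value `eps <- 1e-4` is an assumption based on my recollection of the class package's C source, and the names `sq_dist`, `rel_gap`, `in_vote`, and `votes` are made up for illustration.

```r
# Training points, labels, and test point copied from the question.
train <- rbind(
  c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
  c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
  c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
  c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
  c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
  c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

# Squared Euclidean distance from the test point to every training row.
# sweep() subtracts test[j] from column j of train.
sq_dist <- rowSums(sweep(train, 2, test)^2)
d_sorted <- sort(sq_dist)

# Relative gap between the K-th and (K+1)-th nearest neighbours.
rel_gap <- (d_sorted[K + 1] - d_sorted[K]) / d_sorted[K]

# ASSUMPTION: relative tolerance used by the C code (not taken from the answer).
eps <- 1e-4

# Every point within the tolerance of the K-th distance takes part in the vote.
in_vote <- sq_dist <= d_sorted[K] * (1 + eps)
votes <- table(trainLabels[in_vote])

print(rel_gap)
print(votes)
```

If `rel_gap` comes out below the tolerance, the 6th point joins the vote, and with these labels the vote is then split evenly, so the random tie-break decides the prediction.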

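In this case, rounding the training values to 4 decimal places gives consistent predictions, because it moves the 5th and 6th distances far enough apart that they no longer count as tied.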

library(class)

train <- rbind(
  c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
  c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
  c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
  c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
  c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
  c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

# Round the training data to 4 decimal places before classifying
train <- round(train, 4)

seed <- 494139
set.seed(seed)
pred <- knn(train = train, test = test, cl = trainLabels, k = K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 494139

seed <- 5371
set.seed(seed)
pred <- knn(train = train, test = test, cl = trainLabels, k = K)
message("predicted: ", pred, ", seed: ", seed)
# predicted: 0, seed: 5371
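Two seeds are a small sample, so it may be worth repeating the check over a few more. The sketch below assumes the class package is installed. The extra seeds (1, 42, 2019) are arbitrary values picked for illustration, and `preds` and `stable` are hypothetical names.

```r
library(class)

# Same data as above, with the training values rounded to 4 decimal places.
train <- round(rbind(
  c(0.0626015, 0.0530052, 0.0530052, 0.0496676, 0.0530052, 0.0626015),
  c(0.0565861, 0.0569546, 0.0569546, 0.0511377, 0.0569546, 0.0565861),
  c(0.0538332, 0.057786, 0.057786, 0.0506127, 0.057786, 0.0538332),
  c(0.059033, 0.0541484, 0.0541484, 0.0501926, 0.0541484, 0.059033),
  c(0.0587272, 0.0540445, 0.0540445, 0.0505076, 0.0540445, 0.0587272),
  c(0.0578095, 0.0564349, 0.0564349, 0.0505076, 0.0564349, 0.0578095)
), 4)
trainLabels <- c(1, 1, 0, 0, 1, 0)
test <- c(0.1923241, 0.1734074, 0.1734074, 0.1647619, 0.1734074, 0.1923241)
K <- 5

# The first two seeds come from the question; the rest are arbitrary.
seeds <- c(494139, 5371, 1, 42, 2019)

# Run knn once per seed and collect the predicted label as text.
preds <- vapply(seeds, function(s) {
  set.seed(s)
  as.character(knn(train = train, test = test, cl = trainLabels, k = K))
}, character(1))

print(data.frame(seed = seeds, predicted = preds))

# TRUE when every seed produced the same label.
stable <- length(unique(preds)) == 1
print(stable)
```

Rounding is a workaround tuned to this particular data set. If the same check is run on other data and `stable` is FALSE, a tie is still being broken at random.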


