R: xgbTree with caret, matrix or not?
I am running the following example code:
    v.ctrl <- trainControl(method = "repeatedcv", repeats = 1, number = 3,
                           summaryFunction = twoClassSummary,
                           classProbs = TRUE, allowParallel = TRUE)

    xgb.grid <- expand.grid(nrounds = 10000,
                            eta = c(0.01, 0.05, 0.1),
                            max_depth = c(2, 4, 6, 8, 10, 14))

    set.seed(45)
    xgb_tune <- train(target ~ ., data = train,
                      method = "xgbTree",
                      trControl = cv.ctrl,
                      tuneGrid = xgb.grid,
                      verbose = TRUE,
                      metric = "logloss",
                      nthread = 3)
The error is simple:
    Error in train(target ~ ., data = train, method = "xgbTree", trControl = cv.ctrl, :
      unused arguments (data = train, method = "xgbTree", trControl = cv.ctrl,
      tuneGrid = xgb.grid, verbose = TRUE, metric = "logloss", nthread = 3)
My dataset:

    structure(list(
        feature19 = c(0.58776, 0.40764, 0.4708, 0.67577, 0.41681, 0.5291, 0.33197, 0.24138, 0.49776, 0.58293),
        feature6 = c(0.48424, 0.48828, 0.58975, 0.33185, 0.6917, 0.53813, 0.76235, 0.7036, 0.33871, 0.51928),
        feature10 = c(0.61347, 0.65801, 0.69926, 0.23311, 0.8134, 0.55321, 0.72926, 0.663, 0.49206, 0.55531),
        feature20 = c(0.39615, 0.49085, 0.50274, 0.6038, 0.37487, 0.53582, 0.62004, 0.63819, 0.37858, 0.40478),
        feature7 = c(0.55901, 0.38715, 0.50705, 0.76004, 0.3207, 0.54697, 0.31014, 0.21932, 0.4831, 0.52253),
        feature4 = c(0.5379, 0.52526, 0.44264, 0.28974, 0.65142, 0.41382, 0.44205, 0.47272, 0.6303, 0.56405),
        feature16 = c(0.41849, 0.45628, 0.37617, 0.39334, 0.46727, 0.36297, 0.3054, 0.41256, 0.6302, 0.41892),
        feature2 = c(0.62194, 0.5555, 0.61301, 0.27452, 0.74148, 0.49785, 0.5215, 0.46492, 0.54834, 0.58106),
        feature21 = c(0.32122, 0.37679, 0.35889, 0.74368, 0.18306, 0.47027, 0.40567, 0.47801, 0.41617, 0.35244),
        feature12 = c(0.56532, 0.55707, 0.49138, 0.24911, 0.69341, 0.42176, 0.41445, 0.45535, 0.62379, 0.5523),
        target = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L)),
        .Names = c("feature19", "feature6", "feature10", "feature20", "feature7",
                   "feature4", "feature16", "feature2", "feature21", "feature12", "target"),
        row.names = c(NA, 10L), class = "data.frame")
Does anyone know whether I have to preprocess the data for xgbTree? Thank you!
I realize I'm kind of a noob when it comes to R/caret/machine learning, but I saw this post after checking the responses to your question and managed to get the code working. Hopefully someone more knowledgeable will be able to answer your questions; in the meantime, here is what I did.
First, I put your data set into R and tried running your code. I believe you have a typo in your control object: the missing "c" in "cv" (you assign to v.ctrl but pass cv.ctrl to train) may be what leads to the unused-arguments error, as sketched below.
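For reference, a minimal sketch of that fix, assuming the intent was a single consistently named control object:

    # Use one name for the control object everywhere; cv.ctrl is the name
    # the original train() call expects
    cv.ctrl <- trainControl(method = "repeatedcv", repeats = 1, number = 3,
                            summaryFunction = twoClassSummary,
                            classProbs = TRUE, allowParallel = TRUE)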
However, after I resolved that issue there were multiple errors and warnings. For one, you are using twoClassSummary yet specifying logloss as the metric (note the capitalization: caret's name for it is logLoss, in case that changes anything)... Instead, I switched the summaryFunction to mnLogLoss to call the log loss function properly, since from what I've read twoClassSummary uses the AUC metric. I also replaced the "target" variable in the training set with a simple character variable, in this case "Y" or "N". You can download the csv file here.
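In case it helps, here is one way to do that recoding (a sketch; I'm assuming the data frame is named set, as in the final code below):

    # classProbs = TRUE needs class levels that are valid R variable names,
    # so recode the 0/1 integer target as a two-level factor
    set$target <- factor(set$target, levels = c(0, 1), labels = c("N", "Y"))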
After that, I kept receiving an error regarding the tuning grid, stating that I was missing tuning parameters; the full list for the xgboost methods can be found in the caret documentation (available models). I added default values for the rest of the parameters (most of them 1). The tuning grid I used can be found here.
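In case that link goes stale, this is my best reconstruction of the grid, using the held-constant values that train() reports in the output below:

    # Same eta/max_depth search as before, with the remaining xgbTree
    # parameters pinned to the defaults mentioned above
    xgb.grid <- expand.grid(nrounds = 10000,
                            eta = c(0.01, 0.05, 0.1),
                            max_depth = c(2, 4, 6, 8, 10, 14),
                            gamma = 0,
                            colsample_bytree = 1,
                            min_child_weight = 1,
                            subsample = 1)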
My final code used to train the xgb model is as follows:
    control = trainControl(method = "repeatedcv", repeats = 1, number = 3,
                           summaryFunction = mnLogLoss,
                           classProbs = TRUE, allowParallel = TRUE)

    tune = train(x = set[, 1:10], y = set[, 11],
                 method = "xgbTree",
                 trControl = control,
                 tuneGrid = xgb.grid,
                 verbose = TRUE,
                 metric = "logLoss",
                 nthread = 3)
And the output is shown here:
    > tune
    eXtreme Gradient Boosting

    10 samples
    10 predictors
     2 classes: 'N', 'Y'

    No pre-processing
    Resampling: Cross-Validated (3 fold, repeated 1 times)
    Summary of sample sizes: 6, 8, 6
    Resampling results across tuning parameters:

      eta   max_depth  logLoss
      0.01   2         0.6914816
      0.01   4         0.6914816
      0.01   6         0.6914816
      0.01   8         0.6914816
      0.01  10         0.6914816
      0.01  14         0.6914816
      0.05   2         0.6848399
      0.05   4         0.6848399
      0.05   6         0.6848399
      0.05   8         0.6848399
      0.05  10         0.6848399
      0.05  14         0.6848399
      0.10   2         0.6765847
      0.10   4         0.6765847
      0.10   6         0.6765847
      0.10   8         0.6765847
      0.10  10         0.6765847
      0.10  14         0.6765847

    Tuning parameter 'nrounds' was held constant at a value of 10000
    Tuning parameter 'gamma' was held constant at a value of 0
    Tuning parameter 'colsample_bytree' was held constant at a value of 1
    Tuning parameter 'min_child_weight' was held constant at a value of 1
    Tuning parameter 'subsample' was held constant at a value of 1
    logLoss was used to select the optimal model using the smallest value.
    The final values used for the model were nrounds = 10000, max_depth = 2,
    eta = 0.1, gamma = 0, colsample_bytree = 1, min_child_weight = 1 and
    subsample = 1.
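As a quick sanity check on the fitted object, you can pull the class probabilities back out (just a sketch, reusing the set data frame from above):

    # Predicted class probabilities for the training rows; with only ten
    # samples this is an illustration, not a measure of generalization
    head(predict(tune, newdata = set[, 1:10], type = "prob"))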
I hope this helps and is what you were seeking. I'm a bit suspicious about whether I did the log loss command correctly, because it would appear max depth literally had no effect on the log loss. I reran the model using a different metric, AUC, and the results again showed no effect regardless of what I changed, and the same for Cohen's kappa. I'm guessing this is due to there being only ten samples, but hopefully someone can explain what I did better than this code dump does.