r - xgbTree caret matrix or not?


I am running an example with the following code:

v.ctrl <- trainControl(method = "repeatedcv", repeats = 1, number = 3,
                       summaryFunction = twoClassSummary,
                       classProbs = TRUE,
                       allowParallel = T)

xgb.grid <- expand.grid(nrounds = 10000,
                        eta = c(0.01, 0.05, 0.1),
                        max_depth = c(2, 4, 6, 8, 10, 14))

set.seed(45)
xgb_tune <- train(target ~ .,
                  data = train,
                  method = "xgbTree",
                  trControl = cv.ctrl,
                  tuneGrid = xgb.grid,
                  verbose = T,
                  metric = "logloss",
                  nthread = 3)

The error is simple:

Error in train(target ~ ., data = train, method = "xgbTree", trControl = cv.ctrl, :
  unused arguments (data = train, method = "xgbTree", trControl = cv.ctrl, tuneGrid = xgb.grid, verbose = T, metric = "logloss", nthread = 3)

My dataset:

structure(list(feature19 = c(0.58776, 0.40764, 0.4708, 0.67577, 0.41681, 0.5291, 0.33197, 0.24138, 0.49776, 0.58293),
    feature6 = c(0.48424, 0.48828, 0.58975, 0.33185, 0.6917, 0.53813, 0.76235, 0.7036, 0.33871, 0.51928),
    feature10 = c(0.61347, 0.65801, 0.69926, 0.23311, 0.8134, 0.55321, 0.72926, 0.663, 0.49206, 0.55531),
    feature20 = c(0.39615, 0.49085, 0.50274, 0.6038, 0.37487, 0.53582, 0.62004, 0.63819, 0.37858, 0.40478),
    feature7 = c(0.55901, 0.38715, 0.50705, 0.76004, 0.3207, 0.54697, 0.31014, 0.21932, 0.4831, 0.52253),
    feature4 = c(0.5379, 0.52526, 0.44264, 0.28974, 0.65142, 0.41382, 0.44205, 0.47272, 0.6303, 0.56405),
    feature16 = c(0.41849, 0.45628, 0.37617, 0.39334, 0.46727, 0.36297, 0.3054, 0.41256, 0.6302, 0.41892),
    feature2 = c(0.62194, 0.5555, 0.61301, 0.27452, 0.74148, 0.49785, 0.5215, 0.46492, 0.54834, 0.58106),
    feature21 = c(0.32122, 0.37679, 0.35889, 0.74368, 0.18306, 0.47027, 0.40567, 0.47801, 0.41617, 0.35244),
    feature12 = c(0.56532, 0.55707, 0.49138, 0.24911, 0.69341, 0.42176, 0.41445, 0.45535, 0.62379, 0.5523),
    target = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L)),
    .Names = c("feature19", "feature6", "feature10", "feature20", "feature7", "feature4", "feature16", "feature2", "feature21", "feature12", "target"),
    row.names = c(NA, 10L), class = "data.frame")

Does anyone know whether I have to preprocess the data for xgbTree? Thank you!

I realize I'm kind of a noob when it comes to R/caret/machine learning, but I saw this post after checking the responses to my own question and managed to get the code working. Hopefully someone more knowledgeable will be able to answer your questions; in the meantime, here is what I did.

First, I input your data set into R and tried running your code. I believe you may have a typo in the control function: it is assigned to v.ctrl (missing the "c" in "cv") while train() is given trControl = cv.ctrl, which may lead to the unused-arguments issue.
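In other words, the control object and the trControl argument need to use the same name. A minimal sketch of the corrected setup, keeping the same arguments as in the question but naming the object cv.ctrl consistently:

library(caret)

# Name the object cv.ctrl so that trControl = cv.ctrl in train() can find it
cv.ctrl <- trainControl(method = "repeatedcv", repeats = 1, number = 3,
                        summaryFunction = twoClassSummary,
                        classProbs = TRUE,
                        allowParallel = TRUE)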

However, after I resolved that issue there were multiple other errors and warnings. For one, you are using twoClassSummary while specifying logloss as the metric (note the syntax here; it's not logLoss, in case that changes anything)... Instead, I switched the summaryFunction to mnLogLoss to call the log loss function properly, since I've read that twoClassSummary uses the AUC metric. Also, I replaced the "target" variable in the training set with a simple character variable, in this case "Y" or "N". You can download the csv file here.
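For reference, a rough equivalent of that recode done directly in R (just a sketch, assuming the question's data frame is called train and the recoded copy is called set, the name used in the final code below):

# classProbs = TRUE requires outcome levels that are valid R variable names,
# so recode the 0/1 target as "N"/"Y"
set <- train
set$target <- factor(ifelse(set$target == 1, "Y", "N"), levels = c("N", "Y"))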

After that, I kept receiving an error regarding the tuning grid, stating that I was missing tuning parameters for the xgboost method; these can be found in the caret documentation (available models). I added default values for the rest of the parameters (most of them 1). The tuning grid I used can be found here.
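Reconstructed from the values reported as held constant in the output below, the full grid looks roughly like this; every tuning parameter that method = "xgbTree" expects has to appear as a column of the grid:

# Only eta and max_depth are varied; the remaining parameters are fixed at defaults
xgb.grid <- expand.grid(nrounds = 10000,
                        eta = c(0.01, 0.05, 0.1),
                        max_depth = c(2, 4, 6, 8, 10, 14),
                        gamma = 0,
                        colsample_bytree = 1,
                        min_child_weight = 1,
                        subsample = 1)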

My final code used to train the xgb model is as follows:

control = trainControl(method = "repeatedcv", repeats = 1, number = 3,
                       summaryFunction = mnLogLoss,
                       classProbs = TRUE,
                       allowParallel = T)

tune = train(x = set[, 1:10], y = set[, 11],
             method = "xgbTree", trControl = control,
             tuneGrid = xgb.grid, verbose = TRUE,
             metric = "logLoss", nthread = 3)

And the output is shown here:

tune
eXtreme Gradient Boosting

10 samples
10 predictors
 2 classes: 'N', 'Y'

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 6, 8, 6
Resampling results across tuning parameters:

  eta   max_depth  logLoss
  0.01   2         0.6914816
  0.01   4         0.6914816
  0.01   6         0.6914816
  0.01   8         0.6914816
  0.01  10         0.6914816
  0.01  14         0.6914816
  0.05   2         0.6848399
  0.05   4         0.6848399
  0.05   6         0.6848399
  0.05   8         0.6848399
  0.05  10         0.6848399
  0.05  14         0.6848399
  0.10   2         0.6765847
  0.10   4         0.6765847
  0.10   6         0.6765847
  0.10   8         0.6765847
  0.10  10         0.6765847
  0.10  14         0.6765847

Tuning parameter 'nrounds' was held constant at a value of 10000
Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 1
Tuning parameter 'min_child_weight' was held constant at a value of 1
Tuning parameter 'subsample' was held constant at a value of 1
logLoss was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 10000, max_depth = 2, eta = 0.1, gamma = 0,
colsample_bytree = 1, min_child_weight = 1 and subsample = 1.

I hope this helps and is what you were seeking. I am a bit suspicious about whether I set up the log loss metric correctly, because max depth appears to have had literally no effect on log loss. I reran the model using a different metric, AUC, and the results again showed no effect regardless of what I changed, and the same Cohen's kappa. I'm guessing this is due to there being only ten samples, but hopefully someone can explain what I did beyond just another code dump.
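One way to dig into that suspicion (just a sketch, using the set and tune objects from the code above) is to look at the full resampling table and at the class probabilities the final model produces:

# Resampled logLoss for every eta/max_depth combination tried
tune$results

# Class probabilities from the final model on the ten training rows;
# with this little data they may barely differ across settings
predict(tune, newdata = set[, 1:10], type = "prob")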

