R - Creating a unique ID variable as a combination of variables


I have a data frame (df) or data table (dt) with, let's say, 1000 variables and 1000 observations. I checked that there are no duplicates in the observations: dt[!duplicated(dt)] has the same length as the original file.

I would like to create an ID variable for each observation from a combination of some of the 1000 variables I have. Unlike in other questions, I don't know which variables are most suitable to create the ID, and I will probably need a combination of at least 3 or 4 variables.

Is there a package or function in R that can find an efficient combination of variables to create an ID variable? In my real example I am struggling to build the ID manually, and what I have is probably not the best combination of variables.

An example with mtcars:

```r
require(data.table)
example <- data.table(mtcars)
rownames(example) <- NULL  # delete mtcars row names
example <- example[!duplicated(example), ]

example[, id_var_wrong := paste0(mpg, "_", cyl)]
length(unique(example$id_var_wrong))  # wrong ID: only 27 distinct values despite 32 observations

example[, id_var_good := paste0(wt, "_", qsec)]
length(unique(example$id_var_good))  # good ID: as many distinct values as there are observations
```
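Counting pasted strings works, but the same check can be written directly on the columns. The sketch below uses base R's `anyDuplicated()`, which returns 0 when no row of the selected columns repeats an earlier one. The helper name `is_unique_id` and the copy `cars` are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: TRUE when the columns in `cols` jointly identify every row,
# i.e. no two rows share the same combination of values in those columns.
is_unique_id <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

cars <- mtcars
rownames(cars) <- NULL  # drop the car names so they cannot act as an ID

is_unique_id(cars, c("mpg", "cyl"))  # the "wrong" ID from the question
is_unique_id(cars, c("wt", "qsec"))  # the "good" ID from the question
```

Working on the columns avoids the pasting step, and with it any chance that two different value combinations paste to the same string.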

Is there a function that would find wt and qsec automatically rather than manually?
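For small tables the question can be answered exactly by brute force: try every subset of 1 column, then every subset of 2, and so on, and return the first subset with no duplicated rows. This is a minimal sketch with a hypothetical helper `smallest_id_cols` and an assumed size cap `max_size`. With 1000 columns it is not practical, since there are about 1.66e8 subsets of size 3 alone, which is why a heuristic is needed for the real data.

```r
# Hypothetical helper: exhaustive search for the smallest set of columns whose
# combined values are unique for every row (feasible only for few columns).
smallest_id_cols <- function(df, max_size = 4) {
  vars <- names(df)
  for (k in seq_len(min(max_size, length(vars)))) {
    # all column subsets of size k, as a list of character vectors
    for (cols in combn(vars, k, simplify = FALSE)) {
      if (anyDuplicated(df[, cols, drop = FALSE]) == 0) {
        return(cols)  # first subset of size k that identifies every row
      }
    }
  }
  NULL  # no unique key found using at most `max_size` columns
}

cars <- mtcars
rownames(cars) <- NULL
smallest_id_cols(cars)
```

On mtcars no single column is unique, so this returns a pair of columns. Several pairs can work, so it is not guaranteed to be wt and qsec; it is simply the first unique pair in `combn()` order.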

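A homemade algorithm: the principle is to greedily take the variable with the largest number of distinct values, keep only the rows that are still duplicated on the variables chosen so far, and iterate. It won't necessarily give the best solution, but it's an easy way to get a reasonable solution quickly.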

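```r
set.seed(1)
# 100 rows x 1000 columns of random letters
mat <- replicate(1000, sample(c(letters, LETTERS), size = 100, replace = TRUE))

library(dplyr)

columnsid <- function(mat) {
  df <- df0 <- as.data.frame(mat)  # columns are named V1, V2, ...
  vars <- c()
  while (nrow(df) > 0) {
    # among the columns not chosen yet, take the one with the most distinct values
    candidates <- df[setdiff(names(df), vars)]
    var_best <- names(which.max(sapply(candidates, n_distinct)))
    vars <- append(vars, var_best)
    # keep only the rows that are still duplicated on the chosen columns
    df <- group_by_at(df0, vars) %>% filter(n() > 1) %>% ungroup()
  }
  vars
}

columnsid(mat)
# [1] "V68" "V32"
```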
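The same greedy idea can also be written without dplyr and applied to the mtcars example from the question. This is a sketch under the assumption that the input has no fully duplicated rows (it stops with an error otherwise); the names `greedy_id_cols` and `cars` are hypothetical. Rows that are still ambiguous are found with `duplicated()` run in both directions, so every member of a repeated combination is kept, not just the later copies.

```r
# Hypothetical dependency-free version of the greedy column search.
greedy_id_cols <- function(df) {
  chosen <- character(0)
  remaining <- df  # rows not yet told apart by the chosen columns
  while (nrow(remaining) > 0) {
    candidates <- setdiff(names(df), chosen)
    if (length(candidates) == 0) stop("df contains fully duplicated rows")
    # number of distinct values of each candidate among the ambiguous rows
    n_values <- vapply(remaining[candidates], function(x) length(unique(x)), integer(1))
    chosen <- c(chosen, names(which.max(n_values)))
    # flag every row whose combination of chosen values occurs more than once
    key <- df[, chosen, drop = FALSE]
    dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
    remaining <- df[dup, , drop = FALSE]
  }
  chosen
}

cars <- mtcars
rownames(cars) <- NULL
ids <- greedy_id_cols(cars)
ids
anyDuplicated(cars[, ids, drop = FALSE]) == 0  # check the result really is a unique ID
```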
