dataframe - Choosing different amount of elements from each group in R

July 15, 2013

I am working on the Kaggle Instacart competition. I am quite new to R, and I have run into something I cannot figure out.

I have a dataset with 4 columns. The first column is the order ID (id1). The second column is the product ID (id2). The third column is the probability that I want to select product id2 for order id1. You can think of it as a ranking: a product with a higher probability should be selected over one with a smaller probability. Finally, the fourth column is the number of products I want to select for the given order (a feature of the order). For example, here are the first 12 rows of my data frame df:

       id1   id2      prob num
     1  17 13107 0.4756982   3
     2  17 21463 0.3724126   3
     3  17 38777 0.3534422   3
     4  17 21709 0.3364623   3
     5  17 47766 0.3364623   3
     6  17 39275 0.3165896   3
     7  34 16083 0.4093785   4
     8  34 39475 0.3892882   4
     9  34 47766 0.3892882   4
    10  34  2596 0.3837562   4
    11  34 21137 0.3762758   4
    12  34 47792 0.3737032   4
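To make the example reproducible, the 12 rows above can be entered directly as a data frame. The values are copied from the printout; storing the IDs and num as integers is an assumption that matches the <int> column type shown in the first answer's output.

```r
# Rebuild the 12 example rows shown above
df <- data.frame(
  id1  = rep(c(17L, 34L), each = 6),             # order ID
  id2  = c(13107L, 21463L, 38777L, 21709L, 47766L, 39275L,
           16083L, 39475L, 47766L, 2596L, 21137L, 47792L),  # product ID
  prob = c(0.4756982, 0.3724126, 0.3534422, 0.3364623, 0.3364623, 0.3165896,
           0.4093785, 0.3892882, 0.3892882, 0.3837562, 0.3762758, 0.3737032),
  num  = rep(c(3L, 4L), each = 6)                # products to select per order
)
str(df)
```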

We can see that for id1 = 17 I want to choose 3 elements, and for id1 = 34 I want to choose 4 elements. The result should be

    id1  id2
    17   13107, 21463, 38777
    34   16083, 39475, 47766, 2596

or something similar to this.

At the moment I have tried using

    df %>% group_by(id1) %>% top_n(n = num)

but I get the error

    Selecting by num
    Error in is_scalar_integerish(n) : object 'num' not found

Does anyone know how to go about doing this?

Thanks

You can pipe the grouped data directly into a summarise statement:

    df %>%
      group_by(id1) %>%
      summarise(id2 = toString(id2[seq_len(first(num))]))
    # A tibble: 2 x 2
    #     id1                       id2
    #   <int>                     <chr>
    # 1    17       13107, 21463, 38777
    # 2    34 16083, 39475, 47766, 2596

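In this statement, id2[seq_len(first(num))] keeps the first num values of id2 in each group: first(num) takes the group's value of num, seq_len creates the sequence from 1 to num, and that sequence is used to subset the first num id2 values.

The toString call then collapses those values into one comma-separated string per id1 group.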
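For a single group the indexing can be traced by hand. The sketch below uses the id2 and num values of the id1 = 17 group in plain base R, taking num from the first row just as first(num) does inside summarise:

```r
# id2 and num values for the id1 = 17 group, already in decreasing order of prob
id2 <- c(13107L, 21463L, 38777L, 21709L, 47766L, 39275L)
num <- c(3L, 3L, 3L, 3L, 3L, 3L)

k      <- num[1]        # the value first(num) returns for this group: 3
idx    <- seq_len(k)    # 1 2 3
picked <- id2[idx]      # 13107 21463 38777
toString(picked)        # "13107, 21463, 38777"
```

Using seq_len(k) rather than 1:k matters only at the edge: for a group with num = 0, seq_len(0) is empty and toString gives "", whereas 1:0 is c(1, 0) and would wrongly keep the first product.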

Here's a base R option using aggregate:

    aggregate(id2 ~ id1, FUN = toString,
              subset(df, ave(id1, id1, FUN = seq_along) <= num))
    #   id1                       id2
    # 1  17       13107, 21463, 38777
    # 2  34 16083, 39475, 47766, 2596
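The subset step in the aggregate call relies on ave(id1, id1, FUN = seq_along), which returns, for every row, its position within its own id1 group. Comparing that counter with num keeps the first num rows of each group. A minimal sketch on the example columns:

```r
id1 <- rep(c(17L, 34L), each = 6)
num <- rep(c(3L, 4L), each = 6)

# Position of each row within its id1 group: 1..6 for 17, then 1..6 for 34
pos <- ave(id1, id1, FUN = seq_along)
pos
# [1] 1 2 3 4 5 6 1 2 3 4 5 6

# TRUE for the rows that aggregate() will receive
keep <- pos <= num
which(keep)
# [1]  1  2  3  7  8  9 10
```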

Please note that I assumed the data is already ordered by decreasing probability within each id1 (as in your example).
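If the rows are not already sorted, one way to drop that assumption (a sketch, not part of the original answers) is to sort by decreasing prob within each order before taking the first num rows. Because filter evaluates num as a column of the data, this also avoids the top_n error from the question, and row_number() returns exactly num rows per group even where probabilities tie (top_n keeps tied rows, so it can return more). Breaking ties by smaller id2 is an arbitrary choice made here only so the output is deterministic; the code assumes dplyr is installed.

```r
library(dplyr)

df <- data.frame(
  id1  = rep(c(17L, 34L), each = 6),
  id2  = c(13107L, 21463L, 38777L, 21709L, 47766L, 39275L,
           16083L, 39475L, 47766L, 2596L, 21137L, 47792L),
  prob = c(0.4756982, 0.3724126, 0.3534422, 0.3364623, 0.3364623, 0.3165896,
           0.4093785, 0.3892882, 0.3892882, 0.3837562, 0.3762758, 0.3737032),
  num  = rep(c(3L, 4L), each = 6)
)

# Shuffle the rows to show the result no longer depends on input order
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

res <- shuffled %>%
  arrange(id1, desc(prob), id2) %>%          # highest prob first; ties broken by id2 (arbitrary)
  group_by(id1) %>%
  filter(row_number() <= first(num)) %>%     # keep the first num rows of each order
  summarise(id2 = toString(id2))             # one comma-separated string per order
res
```

For the base R answer, the equivalent fix is to sort before calling aggregate, for example df <- df[order(df$id1, -df$prob), ].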
