dataframe - Choosing a different number of elements from each group in R
I'm working on the Kaggle Instacart competition. I'm quite new to R and have run into something I cannot figure out.
I have a dataset with 4 columns. The first column is the order id (id1). The second column is the product id (id2). The third column is the probability that I want to select product id2 for order id1; you can consider it a ranking, where a higher probability should be selected over a smaller one. Finally, the fourth column is the number of products I want to select for the given order (a feature of the order). For example, here are the first 12 rows of my data frame df:
   id1   id2      prob num
1   17 13107 0.4756982   3
2   17 21463 0.3724126   3
3   17 38777 0.3534422   3
4   17 21709 0.3364623   3
5   17 47766 0.3364623   3
6   17 39275 0.3165896   3
7   34 16083 0.4093785   4
8   34 39475 0.3892882   4
9   34 47766 0.3892882   4
10  34  2596 0.3837562   4
11  34 21137 0.3762758   4
12  34 47792 0.3737032   4
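For reproducibility, the 12 sample rows can be rebuilt as a data frame. This is a minimal sketch; storing id1, id2 and num as integers is an assumption, since the question does not show the column types.

```r
# Rebuild the 12 sample rows shown above (integer column types are assumed)
df <- data.frame(
  id1  = c(rep(17L, 6), rep(34L, 6)),
  id2  = c(13107L, 21463L, 38777L, 21709L, 47766L, 39275L,
           16083L, 39475L, 47766L, 2596L, 21137L, 47792L),
  prob = c(0.4756982, 0.3724126, 0.3534422, 0.3364623, 0.3364623, 0.3165896,
           0.4093785, 0.3892882, 0.3892882, 0.3837562, 0.3762758, 0.3737032),
  num  = c(rep(3L, 6), rep(4L, 6))
)
str(df)
```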
We can see that for id1 = 17 I want to choose 3 elements, and for id1 = 34 I want to choose 4 elements. The result should be
  id1 id2
   17 13107, 21463, 38777
   34 16083, 39475, 47766, 2596
or something similar to this.
At the moment I have tried using
df %>% group_by(id1) %>% top_n(n = num)
but I get the error
Selecting by num
Error in is_scalar_integerish(n) : object 'num' not found
Does anyone know how to go about doing this?
Thanks
You can pipe the grouped data directly into a summarise statement:
library(dplyr)

df %>%
  group_by(id1) %>%
  summarise(id2 = toString(id2[seq_len(first(num))]))
## # A tibble: 2 x 2
##     id1 id2
##   <int> <chr>
## 1    17 13107, 21463, 38777
## 2    34 16083, 39475, 47766, 2596
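In this statement, id2[seq_len(first(num))] uses first(num) to extract the num value of each group, seq_len() creates the sequence 1 to num from it, and that sequence is then used to subset the first num id2 values.
toString then collapses those values into one comma-separated string per id1 group.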
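The same per-group logic (take the first num values of id2, then collapse them with toString) can also be sketched in base R with split() and sapply(). This is a swapped-in alternative, not part of the answer above; like first(num), it assumes num is constant within each order and that rows are already sorted by decreasing prob.

```r
# Sample data from the question
df <- data.frame(
  id1  = c(rep(17L, 6), rep(34L, 6)),
  id2  = c(13107L, 21463L, 38777L, 21709L, 47766L, 39275L,
           16083L, 39475L, 47766L, 2596L, 21137L, 47792L),
  prob = c(0.4756982, 0.3724126, 0.3534422, 0.3364623, 0.3364623, 0.3165896,
           0.4093785, 0.3892882, 0.3892882, 0.3837562, 0.3762758, 0.3737032),
  num  = c(rep(3L, 6), rep(4L, 6))
)

# Split rows by order id; in each group, keep the first num id2 values
# (num is taken from the group's first row) and collapse them to one string
picked <- sapply(split(df, df$id1), function(g) {
  toString(g$id2[seq_len(g$num[1])])
})
picked
```

The result is a character vector named by id1, which can be turned back into a two-column data frame with data.frame(id1 = names(picked), id2 = unname(picked)) if needed.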
Here's a base R option using aggregate:
aggregate(id2 ~ id1, FUN = toString,
          subset(df, ave(id1, id1, FUN = seq_along) <= num))
#   id1                       id2
# 1  17       13107, 21463, 38777
# 2  34 16083, 39475, 47766, 2596
Please note that I assumed the data are ordered by decreasing probability within each id1 (as in the example).
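If the input is not already sorted, the assumption can be removed by ordering the rows first. The sketch below is an addition, not part of the answer: it simulates unsorted input by reversing the sample rows, sorts by id1 and decreasing prob with order(), and then applies the same ave(..., FUN = seq_along) ranking before aggregating.

```r
# Sample data from the question
df <- data.frame(
  id1  = c(rep(17L, 6), rep(34L, 6)),
  id2  = c(13107L, 21463L, 38777L, 21709L, 47766L, 39275L,
           16083L, 39475L, 47766L, 2596L, 21137L, 47792L),
  prob = c(0.4756982, 0.3724126, 0.3534422, 0.3364623, 0.3364623, 0.3165896,
           0.4093785, 0.3892882, 0.3892882, 0.3837562, 0.3762758, 0.3737032),
  num  = c(rep(3L, 6), rep(4L, 6))
)

# Simulate unsorted input by reversing the row order
df <- df[nrow(df):1, ]

# Sort by order id, then by decreasing probability
df <- df[order(df$id1, -df$prob), ]

# Position of each row within its id1 group (1, 2, 3, ...)
rank_in_group <- ave(df$id1, df$id1, FUN = seq_along)

# Keep the top num rows of each group, then collapse per order
top <- df[rank_in_group <= df$num, c("id1", "id2")]
res <- aggregate(id2 ~ id1, data = top, FUN = toString)
res
```

Keeping top as an intermediate step also gives one row per selected product, which may be more convenient than a collapsed string for a submission file. Ties (such as the two 0.3892882 rows in order 34) keep their input order, because order() leaves unresolved ties in their original ordering.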