Create sparse cross table in R from lists
I can't provide a reproducible sample, but here is the problem.
I have a large list object (1.1 GB, ~3 million elements). It looks not dissimilar to this:
> head(xx, n = 3)
[[1]]
[1] "start"
[2] "a|b|c"
[3] "c|c|b"
[4] "lose"

[[2]]
[1] "start"
[2] "b|null|null"
[3] "lose"

[[3]]
[1] "start"
[2] "c|null|null"
[3] "win"
What I want is to count the number of transitions between each step within each nested list, i.e. how often start goes to c|null|null, how often c|null|null goes to win, and so on, across the whole massive list.
On a small subsample, I can use the following (where the placeholder offsets each list by one):
transition <- table(from = unlist(lapply(xx, append, 'placeholder', 0L)),
                    to = unlist(mapply(c, xx, 'placeholder')))
This creates a large contingency table object, most of which is populated with zeroes. However, on the real-world data the object exceeds 2 GB and the call fails with an "unable to create object" memory error.
On the small subsample again, I can convert the cross table with data.frame(), which coerces it into a three-column table (from, to, Freq), and then manually delete the 0 entries along with the placeholder rows.
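For reference, the small-subsample workflow described above might look like the following sketch. The toy list `xx` is a hypothetical stand-in shaped like the sample output, and `lapply(xx, c, "placeholder")` is used here in place of `mapply` as an equivalent way to append the placeholder. It still builds the dense table first, so it only illustrates the cleanup step and does not solve the memory problem.

```r
# Toy stand-in for the real list (hypothetical data, same shape as the sample).
xx <- list(
  c("start", "a|b|c", "c|c|b", "lose"),
  c("start", "b|null|null", "lose"),
  c("start", "c|null|null", "win")
)

# Dense cross table, with a placeholder offsetting each list by one.
from <- unlist(lapply(xx, append, "placeholder", 0L))  # placeholder in front
to   <- unlist(lapply(xx, c, "placeholder"))           # placeholder at the end
transition <- table(from = from, to = to)

# Long format: one row per (from, to) cell, with the count in column Freq.
long <- as.data.frame(transition, stringsAsFactors = FALSE)

# Keep only observed transitions that do not involve the placeholder.
long <- long[long$Freq > 0 &
             long$from != "placeholder" &
             long$to != "placeholder", ]
```

On the toy data this leaves one row per real transition; on the full data the `table()` call is the step that runs out of memory.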
My question is: is there a way to get a "sparse" data frame that counts only the real transitions and skips creating the huge zero-padded cross table?
Please let me know if you need more information and I will try to provide it!
I solved this myself in a different way, using data.table for speed:
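library(data.table)
# Flatten the list into one long vector of steps.
sequence <- unlist(xx)
# Pair each step with the next one. Note that pairs spanning two list
# elements (e.g. lose -> start) are counted as well.
transition <- data.table(from = head(sequence, -1L), to = tail(sequence, -1L))
# One row per observed (from, to) pair, with its count in N.
transition.count <- transition[, .N, by = c('from', 'to')]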
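Because the flattened vector joins the end of one list element to the start of the next, pairs such as lose -> start end up in the counts. If those are unwanted, one possible variant (a sketch, not part of the original solution) tags each step with the index of the element it came from and keeps only pairs that stay inside the same element. The names `id` and `keep`, and the toy `xx`, are assumptions introduced for illustration.

```r
library(data.table)

# Toy stand-in for the real list (hypothetical data, same shape as the sample).
xx <- list(
  c("start", "a|b|c", "c|c|b", "lose"),
  c("start", "b|null|null", "lose"),
  c("start", "c|null|null", "win")
)

sequence <- unlist(xx)

# Index of the list element each step belongs to.
id <- rep(seq_along(xx), lengths(xx))

# TRUE where a step and its successor come from the same list element.
keep <- head(id, -1L) == tail(id, -1L)

transition <- data.table(
  from = head(sequence, -1L)[keep],
  to   = tail(sequence, -1L)[keep]
)

# Only observed within-element pairs are counted; no zero cells are created.
transition.count <- transition[, .N, by = .(from, to)]
```

The result has one row per distinct transition that actually occurs, so its size depends on the number of observed pairs and not on the square of the number of distinct steps.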