java - Union of sets in Spark querying Cassandra


The table structure in Cassandra:

identifier, date, set(integer) 

What I want to achieve using Spark is to group the rows by identifier and date, and aggregate the set values into one set per group. To make this clearer, here is an example.

Raw data (consider the letters as representing integers):

id1, 05-05-2017, {a,b,c}
id1, 05-05-2017, {c,d}
id1, 26-05-2017, {a,b,c}
id1, 26-05-2017, {b,c}
id2, 26-05-2017, {a,b,c}
id2, 26-05-2017, {b,c,d}

Output:

id1, 05-05-2017, {a,b,c,d}
id1, 26-05-2017, {a,b,c}
id2, 26-05-2017, {a,b,c,d}

Since it is a set, I want only unique values in the aggregated results. I am using Java and Dataset.

If your DataFrame has the columns you mention, you can do this:

df.withColumn("set", explode(col("set"))).groupBy("identifier", "date").agg(collect_set("set"))
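
To see why exploding the set and then collecting it again gives the union, here is a minimal sketch of the same logic in plain Java collections, with no Spark involved. The class SetUnionSketch, the Row type, and the sampleRows helper are hypothetical names made up for illustration, and the letters a to d from the example are assumed to stand for the integers 1 to 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SetUnionSketch {

    // Hypothetical stand-in for one row of the table: identifier, date, set of integers.
    static final class Row {
        final String identifier;
        final String date;
        final Set<Integer> values;

        Row(String identifier, String date, Integer... values) {
            this.identifier = identifier;
            this.date = date;
            this.values = new TreeSet<>(Arrays.asList(values));
        }
    }

    // The raw data from the question, assuming a=1, b=2, c=3, d=4.
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("id1", "05-05-2017", 1, 2, 3));
        rows.add(new Row("id1", "05-05-2017", 3, 4));
        rows.add(new Row("id1", "26-05-2017", 1, 2, 3));
        rows.add(new Row("id1", "26-05-2017", 2, 3));
        rows.add(new Row("id2", "26-05-2017", 1, 2, 3));
        rows.add(new Row("id2", "26-05-2017", 2, 3, 4));
        return rows;
    }

    // Groups rows by "identifier,date" and unions their sets.
    static Map<String, Set<Integer>> unionByKey(List<Row> rows) {
        Map<String, Set<Integer>> result = new LinkedHashMap<>();
        for (Row row : rows) {
            String key = row.identifier + "," + row.date;   // plays the role of groupBy
            for (Integer value : row.values) {              // plays the role of explode
                // Adding to a set drops duplicates, like collect_set.
                result.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Set<Integer>> e : unionByKey(sampleRows()).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The inner loop is what explode does to each row, and adding into a set per key is what groupBy followed by collect_set does, so duplicates across rows of the same group collapse into one value.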
