java - Union of sets in Spark querying Cassandra


The table structure in Cassandra:

identifier, date, set(integer) 

What I want to achieve using Spark is grouping the rows by identifier and date, and aggregating the set values. A clearer example:

Raw data (consider the letters as representing integers):

id1, 05-05-2017, {a,b,c}
id1, 05-05-2017, {c,d}
id1, 26-05-2017, {a,b,c}
id1, 26-05-2017, {b,c}
id2, 26-05-2017, {a,b,c}
id2, 26-05-2017, {b,c,d}

Output:

id1, 05-05-2017, {a,b,c,d}
id1, 26-05-2017, {a,b,c}
id2, 26-05-2017, {a,b,c,d}

Since it is a set, I want unique values in the aggregated results. I am using Java and the Dataset API.

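If the DataFrame has the columns mentioned, you can do this (with explode, col and collect_set statically imported from org.apache.spark.sql.functions):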

df.withColumn("set", explode(col("set"))).groupBy("identifier", "date").agg(collect_set("set"))
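To sanity-check what the query above should return, here is a minimal plain-Java sketch (no Spark involved) that applies the same idea to the sample rows from the question: group by (identifier, date) and union the sets. The class SetUnionSketch, the SampleRow type and the unionByKey method are hypothetical names made up for this illustration, and the letters a to d are assumed to map to the integers 1 to 4.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Spark or the Cassandra connector.
class SetUnionSketch {

    // One input row: identifier, date, and the set column.
    static final class SampleRow {
        final String identifier;
        final String date;
        final Set<Integer> values;

        SampleRow(String identifier, String date, Integer... values) {
            this.identifier = identifier;
            this.date = date;
            this.values = new TreeSet<>(Arrays.asList(values));
        }
    }

    // Groups rows by (identifier, date) and unions their sets.
    // Adding into a Set drops duplicates, which is the effect
    // explode followed by collect_set has in the Spark query.
    static Map<List<String>, Set<Integer>> unionByKey(List<SampleRow> rows) {
        Map<List<String>, Set<Integer>> result = new LinkedHashMap<>();
        for (SampleRow row : rows) {
            List<String> key = Arrays.asList(row.identifier, row.date);
            result.computeIfAbsent(key, k -> new TreeSet<>()).addAll(row.values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed mapping for the letters in the question: a=1, b=2, c=3, d=4.
        List<SampleRow> rows = Arrays.asList(
            new SampleRow("id1", "05-05-2017", 1, 2, 3),
            new SampleRow("id1", "05-05-2017", 3, 4),
            new SampleRow("id1", "26-05-2017", 1, 2, 3),
            new SampleRow("id1", "26-05-2017", 2, 3),
            new SampleRow("id2", "26-05-2017", 1, 2, 3),
            new SampleRow("id2", "26-05-2017", 2, 3, 4));

        unionByKey(rows).forEach((key, set) -> System.out.println(key + " -> " + set));
    }
}
```

Note that the sketch keeps each set sorted because it uses a TreeSet; collect_set in Spark makes no promise about element order, so compare the results as sets rather than as ordered lists.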
