java - Union of sets in Spark querying Cassandra
The table structure in Cassandra:
identifier, date, set(integer)
What I want to achieve using Spark is grouping the rows by identifier and date, and then aggregating the values of all the sets in each group. A clearer example:
Raw data (consider the letters as representing integers):
id1, 05-05-2017, {a,b,c}
id1, 05-05-2017, {c,d}
id1, 26-05-2017, {a,b,c}
id1, 26-05-2017, {b,c}
id2, 26-05-2017, {a,b,c}
id2, 26-05-2017, {b,c,d}
Output:
id1, 05-05-2017, {a,b,c,d}
id1, 26-05-2017, {a,b,c}
id2, 26-05-2017, {a,b,c,d}
Since it is a set, I want only unique values in the aggregated results. I am using Java and the Dataset API.
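To pin down the expected semantics, the grouping and union can be sketched in plain Java, without Spark or Cassandra. This is only an illustration of the desired result, not a Spark solution: the names SetUnionSketch, Row, and unionByKey are hypothetical, and the integers 1 to 4 stand in for the letters a to d.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SetUnionSketch {

    /** One row of the table: identifier, date, set of integers (hypothetical type). */
    static final class Row {
        final String identifier;
        final String date;
        final Set<Integer> values;

        Row(String identifier, String date, Integer... values) {
            this.identifier = identifier;
            this.date = date;
            this.values = new TreeSet<>(Arrays.asList(values));
        }
    }

    /** Groups rows by (identifier, date) and unions the sets within each group. */
    static Map<List<String>, Set<Integer>> unionByKey(List<Row> rows) {
        // LinkedHashMap keeps groups in first-seen order; a Set drops duplicates.
        Map<List<String>, Set<Integer>> result = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> key = Arrays.asList(row.identifier, row.date);
            result.computeIfAbsent(key, k -> new TreeSet<>()).addAll(row.values);
        }
        return result;
    }

    public static void main(String[] args) {
        // a=1, b=2, c=3, d=4 stand in for the letters used in the example data.
        List<Row> rows = Arrays.asList(
            new Row("id1", "05-05-2017", 1, 2, 3),
            new Row("id1", "05-05-2017", 3, 4),
            new Row("id1", "26-05-2017", 1, 2, 3),
            new Row("id1", "26-05-2017", 2, 3),
            new Row("id2", "26-05-2017", 1, 2, 3),
            new Row("id2", "26-05-2017", 2, 3, 4));
        unionByKey(rows).forEach((key, set) -> System.out.println(key + " -> " + set));
    }
}
```

Each (identifier, date) pair ends up with one set containing every distinct value seen for that pair, which is the result the Spark job should reproduce.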
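If your DataFrame has the columns you mention, you can do this (with explode, col, and collect_set statically imported from org.apache.spark.sql.functions):
df.withColumn("set", explode(col("set"))).groupBy("identifier", "date").agg(collect_set("set"))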