java - Union of sets in Spark querying Cassandra


The table structure in Cassandra:

identifier, date, set(integer) 

What I want to achieve using Spark is to group the rows by identifier and date, and aggregate the set values. A clearer example:

Raw data (consider the letters as representing integers):

id1, 05-05-2017, {a,b,c}
id1, 05-05-2017, {c,d}
id1, 26-05-2017, {a,b,c}
id1, 26-05-2017, {b,c}
id2, 26-05-2017, {a,b,c}
id2, 26-05-2017, {b,c,d}

Output:

id1, 05-05-2017, {a,b,c,d}
id1, 26-05-2017, {a,b,c}
id2, 26-05-2017, {a,b,c,d}
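To pin down the expected result independently of Spark, the grouping can be sketched in plain Java. This is only an illustration of the semantics, not a Spark solution: the class name `SetUnionSketch`, the `Row` type, and the mapping a=1, b=2, c=3, d=4 are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SetUnionSketch {

    // Hypothetical row type mirroring the table: identifier, date, set of integers.
    static final class Row {
        final String identifier;
        final String date;
        final Set<Integer> values;

        Row(String identifier, String date, Integer... values) {
            this.identifier = identifier;
            this.date = date;
            this.values = new TreeSet<>(Arrays.asList(values));
        }
    }

    // Groups rows by (identifier, date) and unions the sets within each group.
    static Map<List<String>, Set<Integer>> unionByKey(List<Row> rows) {
        Map<List<String>, Set<Integer>> result = new LinkedHashMap<>();
        for (Row row : rows) {
            List<String> key = Arrays.asList(row.identifier, row.date);
            // A Set drops duplicates, so addAll behaves as a set union.
            result.computeIfAbsent(key, k -> new TreeSet<>()).addAll(row.values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed mapping for the letters in the example: a=1, b=2, c=3, d=4.
        List<Row> rows = Arrays.asList(
            new Row("id1", "05-05-2017", 1, 2, 3),
            new Row("id1", "05-05-2017", 3, 4),
            new Row("id1", "26-05-2017", 1, 2, 3),
            new Row("id1", "26-05-2017", 2, 3),
            new Row("id2", "26-05-2017", 1, 2, 3),
            new Row("id2", "26-05-2017", 2, 3, 4));

        unionByKey(rows).forEach((key, set) -> System.out.println(key + " -> " + set));
    }
}
```

Each (identifier, date) pair ends up with one set holding every distinct value seen for that pair, which is what the Spark aggregation needs to reproduce.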

Since it is a set, I want only unique values in the aggregated results. I am using Java and the Dataset API.

If your DataFrame has the columns mentioned, you can do this:

df.withColumn("set", explode(col("set"))).groupBy("identifier", "date").agg(collect_set("set"))
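Since the question asks for Java and Dataset, the same idea might look like the sketch below. It assumes the DataStax spark-cassandra-connector is on the classpath and that the job is launched with spark-submit. The keyspace `my_keyspace`, the table `my_table`, the connection host, and the class name are placeholders, and the column names are taken from the question.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.collect_set;
import static org.apache.spark.sql.functions.explode;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class UnionOfSets {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("union-of-sets")
            // Placeholder host; point this at your Cassandra cluster.
            .config("spark.cassandra.connection.host", "127.0.0.1")
            .getOrCreate();

        // Placeholder keyspace and table names.
        Dataset<Row> df = spark.read()
            .format("org.apache.spark.sql.cassandra")
            .option("keyspace", "my_keyspace")
            .option("table", "my_table")
            .load();

        Dataset<Row> result = df
            // One output row per element of the set column.
            .withColumn("value", explode(col("set")))
            .groupBy("identifier", "date")
            // collect_set keeps only distinct values within each group.
            .agg(collect_set("value").as("set"));

        result.show(false);
        spark.stop();
    }
}
```

Note that explode drops rows whose set is null or empty, so a group containing only such rows would not appear in the result. The order of elements inside each collected set is not guaranteed.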
