scala - Spark DataFrame: select rows with at least one null or blank in any column of that row


From one DataFrame, I want to create a new DataFrame containing the rows where at least one value in any of the columns is null or blank, in Spark 1.5 / Scala.

I am trying to write a generalized function to create this new DataFrame: I pass in a DataFrame and a list of columns, and it returns the matching records.

Thanks.

Sample data:

    val df = Seq((null, Some(2)), (Some("a"), Some(4)), (Some(""), Some(5)), (Some("b"), null)).toDF("a", "b")

    df.show
    +----+----+
    |   a|   b|
    +----+----+
    |null|   2|
    |   a|   4|
    |    |   5|
    |   b|null|
    +----+----+

You can build one condition per column and combine them with OR. This assumes that "blank" means an empty string:

    import org.apache.spark.sql.functions.col

    // One "is null or empty" test per column, OR-ed together into a single predicate
    val cond = df.columns.map(x => col(x).isNull || col(x) === "").reduce(_ || _)

    df.filter(cond).show
    +----+----+
    |   a|   b|
    +----+----+
    |null|   2|
    |    |   5|
    |   b|null|
    +----+----+
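Since the question asks for a generalized function that takes a DataFrame and a list of columns, the same condition can be wrapped as below. This is only a minimal sketch: the name `rowsWithNullOrBlank` and its parameter names are made up for illustration, and "blank" is again assumed to mean an empty string (whitespace-only values are not matched).

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col

// Hypothetical helper (the name is not from the original answer).
// Keeps the rows where at least one of the given columns is null or "".
def rowsWithNullOrBlank(df: DataFrame, columns: Seq[String]): DataFrame = {
  // reduce fails on an empty collection, so reject that case up front
  require(columns.nonEmpty, "at least one column name is required")

  // Build "is null or empty string" per column, then OR the tests together
  val cond: Column = columns
    .map(c => col(c).isNull || col(c) === "")
    .reduce(_ || _)

  df.filter(cond)
}
```

Calling `rowsWithNullOrBlank(df, df.columns)` should be equivalent to the filter above, while `rowsWithNullOrBlank(df, Seq("a"))` restricts the check to column `a` only. For non-string columns the `isNull` test is presumably the one doing the work, since an empty string is not a valid value there.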
