scala - Spark DataFrame: select rows with at least one null or blank in any column of that row
From one DataFrame I want to create a new DataFrame containing the rows where at least one value in any of the columns is null or blank, using Spark 1.5 / Scala.
I am trying to write a generalized function to create this new DataFrame: I pass in the DataFrame and a list of columns, and it returns the matching records.
Thanks.
Sample data:
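```scala
// In spark-shell (Spark 1.5) sqlContext already exists; this import enables .toDF on a Seq
import sqlContext.implicits._

val df = Seq((null, Some(2)), (Some("a"), Some(4)), (Some(""), Some(5)), (Some("b"), null)).toDF("a", "b")
df.show
```

```
+----+----+
|   a|   b|
+----+----+
|null|   2|
|   a|   4|
|    |   5|
|   b|null|
+----+----+
```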
You can construct the condition like this, assuming "blank" means the empty string:
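```scala
import org.apache.spark.sql.functions.col

// One predicate per column (null OR empty string), OR-ed together across all columns
val cond = df.columns.map(x => col(x).isNull || col(x) === "").reduce(_ || _)
df.filter(cond).show
```

```
+----+----+
|   a|   b|
+----+----+
|null|   2|
|    |   5|
|   b|null|
+----+----+
```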
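The question asks for a generalized function that takes a DataFrame and a list of columns. Below is a minimal sketch that wraps the same OR-reduced condition in such a helper. The object and method names (`NullOrBlankFilter`, `rowsWithNullOrBlank`) are hypothetical, and two behaviours go beyond the answer above and are assumptions: "blank" also covers whitespace-only strings (via `trim`), and only `StringType` columns are tested for blankness, so numeric columns are not compared against `""` through Spark's implicit casting.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, trim}
import org.apache.spark.sql.types.StringType

object NullOrBlankFilter {
  // Hypothetical helper: keeps rows where at least one of `columns` is null,
  // or (string columns only) empty / whitespace-only after trimming.
  // Assumption: every name in `columns` exists in df's schema
  // (df.schema(name) throws IllegalArgumentException otherwise).
  def rowsWithNullOrBlank(df: DataFrame, columns: Seq[String]): DataFrame = {
    require(columns.nonEmpty, "pass at least one column name")
    val perColumn: Seq[Column] = columns.map { name =>
      val c = col(name)
      df.schema(name).dataType match {
        case StringType => c.isNull || trim(c) === ""
        case _          => c.isNull // non-string columns can only be null, not blank
      }
    }
    // OR the per-column predicates together: one matching column is enough
    df.filter(perColumn.reduce(_ || _))
  }
}
```

Usage, checking every column: `NullOrBlankFilter.rowsWithNullOrBlank(df, df.columns.toSeq).show`. Passing a subset such as `Seq("b")` restricts the check to those columns only.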