Scala - How to print the last n lines of a DStream in Spark Streaming?
In Spark Streaming, calling print() on a DStream displays only the first 10 lines:
val filedstream = ssc.textFileStream("hdfs://localhost:9000/abc.txt")
filedstream.print()
Is there a way to get the last n lines instead, considering that the text file is large and unsorted?
If you only need the last line, this simplifies to:
filedstream.foreachRDD { rdd => rdd.collect().lastOption.foreach(println) }
However, this has the problem of collecting all of the data to the driver.
Is the data sorted? If so, you can reverse the sort and take the first elements. Alternatively, a hacky implementation might use mapPartitionsWithIndex to return an empty iterator for every partition except the last. For the last partition, filter out every element except the last one in the iterator. That should leave a single element, which is the last element.
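A minimal sketch of the mapPartitionsWithIndex idea, extended from "last element" to "last n lines". It is a variation rather than the exact approach described above: instead of emptying every partition except the last, it keeps up to n trailing lines from each partition, so the result is still correct when the last partition holds fewer than n lines. The names LastLines, tailOfIterator and lastN are hypothetical, and the sketch assumes that partition index order reflects line order in the input.

```scala
import org.apache.spark.rdd.RDD

object LastLines {
  // Keep only the trailing n elements of an iterator, holding at most
  // n + 1 elements in memory at any time. Assumes n >= 1.
  def tailOfIterator(it: Iterator[String], n: Int): Vector[String] =
    it.foldLeft(Vector.empty[String]) { (buf, line) =>
      val next = buf :+ line
      if (next.size > n) next.tail else next
    }

  // Return the last n lines of the RDD, assuming that partition index
  // order matches line order. At most n * numPartitions lines reach
  // the driver, instead of the whole RDD.
  def lastN(rdd: RDD[String], n: Int): Seq[String] =
    if (n <= 0) {
      Seq.empty[String]
    } else {
      val tails = rdd
        .mapPartitionsWithIndex { (idx, it) =>
          // One (partition index, trailing lines) pair per partition.
          Iterator((idx, tailOfIterator(it, n)))
        }
        .collect()
        .toSeq
      // Restore partition order on the driver, then keep the global tail.
      tails.sortBy(_._1).flatMap(_._2).takeRight(n)
    }
}
```

With the stream from the question, this could be used as filedstream.foreachRDD { rdd => LastLines.lastN(rdd, 10).foreach(println) }.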
Or you can try:
filedstream.foreachRDD { rdd => rdd.top(10)(reverseOrdering).foreach(println) }
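The thread never shows how reverseOrdering is defined, so the sketch below assumes it is simply the natural String ordering reversed. The names TopSketch and smallestLines are hypothetical. Note that top selects by value, not by position, so it only corresponds to the "last lines" when the data is sorted, as discussed above.

```scala
import org.apache.spark.rdd.RDD

object TopSketch {
  // Assumed definition: the natural String ordering, reversed.
  val reverseOrdering: Ordering[String] = Ordering[String].reverse

  // RDD.top(n)(ord) returns the n largest elements under ord. With the
  // reversed ordering, those are the n lexicographically smallest
  // lines, returned in ascending natural order.
  def smallestLines(rdd: RDD[String], n: Int): Array[String] =
    rdd.top(n)(reverseOrdering)
}
```

Only n elements per partition are kept while computing the result, so this avoids collecting the whole RDD to the driver.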