scala - How to print last n lines of a DStream in Spark Streaming?


In Spark Streaming, DStream print() displays the first 10 lines:
val filedstream = ssc.textFileStream("hdfs://localhost:9000/abc.txt")
filedstream.print()
Is there a way to get the last n lines, considering that the text file is large and unsorted?

If you want this, it can be simplified to:

filedstream.foreachRDD { rdd => rdd.collect().lastOption.foreach(println) }

However, this has the problem of collecting all the data to the driver.

Is your data sorted? If so, you could reverse the sort and take the first elements. Alternatively, a hacky implementation might involve a mapPartitionsWithIndex that returns an empty iterator for all partitions except the last. For the last partition, you would filter out all elements except the last element in the iterator. This should leave one element, which is the last element.
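The mapPartitionsWithIndex idea can be sketched roughly as follows, generalized from the last element to the last n. This is only an illustration: keepLastN and lastPartitionTail are hypothetical helper names, not part of Spark, and the wiring into the stream is shown in comments because it needs Spark on the classpath. It also assumes the last partition holds at least n lines; if it holds fewer, fewer are returned.

```scala
import scala.collection.immutable.Queue

// Hypothetical helper: keep only the last n elements of an iterator,
// holding at most n + 1 elements in memory at any time.
def keepLastN[T](iter: Iterator[T], n: Int): Iterator[T] = {
  val kept = iter.foldLeft(Queue.empty[T]) { (q, x) =>
    val grown = q.enqueue(x)
    // Drop the oldest element once the queue exceeds n.
    if (grown.size > n) grown.dequeue._2 else grown
  }
  kept.iterator
}

// Hypothetical per-partition function: every partition except the last
// yields nothing; the last partition yields its final n elements.
def lastPartitionTail[T](lastIndex: Int, n: Int, index: Int, iter: Iterator[T]): Iterator[T] =
  if (index == lastIndex) keepLastN(iter, n) else Iterator.empty

// Assumed wiring with Spark (not executed here):
//   filedstream.foreachRDD { rdd =>
//     val lastIndex = rdd.getNumPartitions - 1
//     rdd.mapPartitionsWithIndex((i, it) => lastPartitionTail(lastIndex, 10, i, it))
//        .collect()
//        .foreach(println)
//   }
```

Only the tail of one partition reaches the driver, so the collect() here stays small regardless of the file size.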

Or you can try:

filedstream.foreachRDD { rdd => rdd.top(10)(reverseOrdering) }
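The answer does not define reverseOrdering, and an ordering on the line text would pick lines by value rather than by position. One possible variant, assuming the goal is the last lines by position, is to pair each line with its index via zipWithIndex and call top with an ordering on that index. The names byPosition and inFileOrder below are hypothetical, and the Spark calls are shown in comments since they need Spark on the classpath.

```scala
// Hypothetical ordering: compares (line, position) pairs by position only,
// ignoring the line text.
val byPosition: Ordering[(String, Long)] =
  Ordering.by[(String, Long), Long](_._2)

// Hypothetical helper: top returns the largest elements first, so sort the
// pairs back into ascending position and drop the positions.
def inFileOrder(topPairs: Seq[(String, Long)]): Seq[String] =
  topPairs.sortBy(_._2).map(_._1)

// Assumed wiring with Spark (not executed here):
//   filedstream.foreachRDD { rdd =>
//     val lastTen = rdd.zipWithIndex().top(10)(byPosition)
//     inFileOrder(lastTen.toSeq).foreach(println)
//   }
```

Note that zipWithIndex itself runs a Spark job when the RDD has more than one partition, so this trades an extra pass over the data for not collecting everything to the driver.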
