scala - How to print the last n lines of a DStream in Spark Streaming?


Spark Streaming's DStream print() displays the first 10 lines of each batch:
val filedstream = ssc.textFileStream("hdfs://localhost:9000/abc.txt")
filedstream.print()
Is there a way to get the last n lines, considering that the text file is large and unsorted?

If that is all you need, you could simplify it to:

// lastOption avoids an exception when a batch is empty
filedstream.foreachRDD { rdd => rdd.collect().lastOption.foreach(println) }

However, this has the problem of collecting all the data to the driver.

Is the data sorted? If so, reverse the sort and take the first element. Alternatively, a hacky implementation might use mapPartitionsWithIndex to return an empty iterator for every partition except the last. For the last partition, filter out every element except the last one in the iterator. That should leave one element, which is the last element.
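The mapPartitionsWithIndex idea can be sketched roughly as follows. This is only an illustration, not a tested production implementation: the names LastElements, lastN and lastOfRdd are made up for this example, and it assumes that the last partition really holds the tail of the file. If that partition contains fewer than n elements, fewer than n are returned.

```scala
import org.apache.spark.rdd.RDD
import scala.collection.mutable
import scala.reflect.ClassTag

object LastElements {
  // Hypothetical helper: keep only the final n elements of an iterator
  // by sliding a bounded queue over it, so the partition is never held in full.
  def lastN[T](it: Iterator[T], n: Int): Iterator[T] = {
    val buf = mutable.Queue.empty[T]
    it.foreach { x =>
      buf.enqueue(x)
      if (buf.size > n) buf.dequeue()
    }
    buf.iterator
  }

  // Hypothetical helper: every partition except the last returns an empty
  // iterator, so only up to n elements travel back to the driver.
  def lastOfRdd[T: ClassTag](rdd: RDD[T], n: Int): Array[T] = {
    val lastIdx = rdd.getNumPartitions - 1
    rdd.mapPartitionsWithIndex { (idx, it) =>
      if (idx == lastIdx) lastN(it, n) else Iterator.empty
    }.collect()
  }
}
```

With the stream from the question, it could be used as filedstream.foreachRDD { rdd => LastElements.lastOfRdd(rdd, 10).foreach(println) }.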

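Or you can try top with a reversed ordering. Here reverseOrdering is an Ordering that you have to supply yourself. Note that top selects elements by that ordering, not by their position in the file:

filedstream.foreachRDD { rdd => rdd.top(10)(reverseOrdering) }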
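Since top picks elements by ordering rather than by position, one way to make it return the trailing lines is to attach each line's position first. The sketch below does that with zipWithIndex, which is a variation on the suggestion above rather than the original answer's code. LastLines and lastLines are hypothetical names, and the sketch assumes that the RDD's element order matches the line order of the file. Keep in mind that zipWithIndex may trigger an extra Spark job when the RDD has more than one partition.

```scala
import org.apache.spark.rdd.RDD

object LastLines {
  // Hypothetical helper: pair each line with its position, keep the n
  // highest positions, then reverse because top returns descending order.
  def lastLines(rdd: RDD[String], n: Int): Array[String] =
    rdd.zipWithIndex()
      .top(n)(Ordering.by[(String, Long), Long](_._2))
      .reverse
      .map(_._1)
}
```

With the stream from the question, it could be used as filedstream.foreachRDD { rdd => LastLines.lastLines(rdd, 10).foreach(println) }.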
