hadoop - Pig script doesn't work with MapReduce


I'm trying to use Hadoop and Apache Pig. I have a .txt file with data and a .pig file with my script. This is the script:

student = LOAD '/home/srv-hadoop/data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
student_order = ORDER student BY firstname ASC;
DUMP student_order;

And this is the .txt file:

001,rajiv,reddy,21,9848022337,hyderabad
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi
004,preethi,agarwal,21,9848022330,pune
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar
006,archana,mishra,23,9848022335,chennai
007,komal,nayak,24,9848022334,trivendram
008,bharathi,nambiayar,24,9848022333,chennai

But when I execute: pig -x mapreduce data.pig, I get this output:

17/07/25 17:04:59 info pig.exectypeprovider: trying exectype : local
17/07/25 17:04:59 info pig.exectypeprovider: trying exectype : mapreduce
17/07/25 17:04:59 info pig.exectypeprovider: picked mapreduce exectype
2017-07-25 17:04:59,399 [main] info  org.apache.pig.main - apache pig version 0.17.0 (r1797386) compiled jun 02 2017, 15:41:58
2017-07-25 17:04:59,399 [main] info  org.apache.pig.main - logging error messages to: /home/srv-hadoop/pig_1500995099397.log
2017-07-25 17:04:59,749 [main] warn  org.apache.hadoop.util.nativecodeloader - unable load native-hadoop library platform... using builtin-java classes applicable
2017-07-25 17:04:59,930 [main] info  org.apache.pig.impl.util.utils - default bootup file /home/srv-hadoop/.pigbootup not found
2017-07-25 17:05:00,062 [main] info  org.apache.hadoop.conf.configuration.deprecation - mapred.job.tracker deprecated. instead, use mapreduce.jobtracker.address
2017-07-25 17:05:00,066 [main] info  org.apache.pig.backend.hadoop.executionengine.hexecutionengine - connecting hadoop file system at: hdfs://localhost:54310
2017-07-25 17:05:00,470 [main] info  org.apache.pig.backend.hadoop.executionengine.hexecutionengine - connecting map-reduce job tracker at: localhost:54311
2017-07-25 17:05:00,489 [main] info  org.apache.pig.pigserver - pig script id session: pig-data.pig-2bb2e75c-41a7-42bf-926f-05354b881211
2017-07-25 17:05:00,489 [main] warn  org.apache.pig.pigserver - ats disabled since yarn.timeline-service.enabled set false
2017-07-25 17:05:01,230 [main] info  org.apache.pig.tools.pigstats.scriptstate - pig features used in script: order_by
2017-07-25 17:05:01,279 [main] info  org.apache.pig.data.schematuplebackend - key [pig.schematuple] not set... not generate code.
2017-07-25 17:05:01,308 [main] info  org.apache.pig.newplan.logical.optimizer.logicalplanoptimizer - {rules_enabled=[addforeach, columnmapkeyprune, constantcalculator, groupbyconstparallelsetter, limitoptimizer, loadtypecastinserter, mergefilter, mergeforeach, nestedlimitoptimizer, partitionfilteroptimizer, predicatepushdownoptimizer, pushdownforeachflatten, pushupfilter, splitfilter, streamtypecastinserter]}
2017-07-25 17:05:01,362 [main] info  org.apache.pig.impl.util.spillablememorymanager - selected heap (ps old gen) of size 699400192 monitor. collectionusagethreshold = 489580128, usagethreshold = 489580128
2017-07-25 17:05:01,411 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mrcompiler - file concatenation threshold: 100 optimistic? false
2017-07-25 17:05:01,452 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.secondarykeyoptimizermr - using secondary key optimization mapreduce node scope-23
2017-07-25 17:05:01,462 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.multiqueryoptimizer - mr plan size before optimization: 3
2017-07-25 17:05:01,462 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.multiqueryoptimizer - mr plan size after optimization: 3
2017-07-25 17:05:01,515 [main] info  org.apache.hadoop.conf.configuration.deprecation - session.id deprecated. instead, use dfs.metrics.session-id
2017-07-25 17:05:01,516 [main] info  org.apache.hadoop.metrics.jvm.jvmmetrics - initializing jvm metrics processname=jobtracker, sessionid=
2017-07-25 17:05:01,548 [main] info  org.apache.pig.tools.pigstats.mapreduce.mrscriptstate - pig script settings added job
2017-07-25 17:05:01,552 [main] info  org.apache.hadoop.conf.configuration.deprecation - mapred.job.reduce.markreset.buffer.percent deprecated. instead, use mapreduce.reduce.markreset.buffer.percent
2017-07-25 17:05:01,552 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - mapred.job.reduce.markreset.buffer.percent not set, set default 0.3
2017-07-25 17:05:01,555 [main] info  org.apache.hadoop.conf.configuration.deprecation - mapred.output.compress deprecated. instead, use mapreduce.output.fileoutputformat.compress
2017-07-25 17:05:01,558 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - job cannot converted run in-process
2017-07-25 17:05:01,570 [main] info  org.apache.hadoop.conf.configuration.deprecation - mapred.submit.replication deprecated. instead, use mapreduce.client.submit.file.replication
2017-07-25 17:05:01,891 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - added jar file:/home/srv-hadoop/pig/pig-0.17.0-core-h2.jar distributedcache through /tmp/temp1676993497/tmp-1698368733/pig-0.17.0-core-h2.jar
2017-07-25 17:05:01,932 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - added jar file:/home/srv-hadoop/pig/lib/automaton-1.11-8.jar distributedcache through /tmp/temp1676993497/tmp885160047/automaton-1.11-8.jar
2017-07-25 17:05:01,975 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - added jar file:/home/srv-hadoop/pig/lib/antlr-runtime-3.4.jar distributedcache through /tmp/temp1676993497/tmp-1346471388/antlr-runtime-3.4.jar
2017-07-25 17:05:02,012 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - added jar file:/home/srv-hadoop/pig/lib/joda-time-2.9.3.jar distributedcache through /tmp/temp1676993497/tmp32088650/joda-time-2.9.3.jar
2017-07-25 17:05:02,023 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - setting single store job
2017-07-25 17:05:02,031 [main] info  org.apache.pig.data.schematuplefrontend - key [pig.schematuple] false, not generate code.
2017-07-25 17:05:02,031 [main] info  org.apache.pig.data.schematuplefrontend - starting process move generated code distributed cacche
2017-07-25 17:05:02,031 [main] info  org.apache.pig.data.schematuplefrontend - setting key [pig.schematuple.classes] classes deserialize []
2017-07-25 17:05:02,093 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - 1 map-reduce job(s) waiting submission.
2017-07-25 17:05:02,095 [main] info  org.apache.hadoop.conf.configuration.deprecation - mapred.job.tracker.http.address deprecated. instead, use mapreduce.jobtracker.http.address
2017-07-25 17:05:02,095 [main] info  org.apache.hadoop.conf.configuration.deprecation - mapred.job.tracker deprecated. instead, use mapreduce.jobtracker.address
2017-07-25 17:05:02,104 [jobcontrol] info  org.apache.hadoop.metrics.jvm.jvmmetrics - cannot initialize jvm metrics processname=jobtracker, sessionid= - initialized
2017-07-25 17:05:02,113 [jobcontrol] info  org.apache.hadoop.conf.configuration.deprecation - mapred.task.id deprecated. instead, use mapreduce.task.attempt.id
2017-07-25 17:05:02,178 [jobcontrol] warn  org.apache.hadoop.mapreduce.jobresourceuploader - no job jar file set.  user classes may not found. see job or job#setjar(string).
2017-07-25 17:05:02,207 [jobcontrol] info  org.apache.pig.builtin.pigstorage - using pigtextinputformat
2017-07-25 17:05:02,213 [jobcontrol] info  org.apache.hadoop.mapreduce.jobsubmitter - cleaning staging area file:/home/srv-hadoop/hadoop-2.6.2/tmp/mapred/staging/srv-hadoop1897657638/.staging/job_local1897657638_0001
2017-07-25 17:05:02,214 [jobcontrol] info  org.apache.hadoop.mapreduce.lib.jobcontrol.controlledjob - piglatin:data.pig got error while submitting
org.apache.pig.backend.executionengine.execexception: error 2118: input path not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.piginputformat.getsplits(piginputformat.java:294)
    at org.apache.hadoop.mapreduce.jobsubmitter.writenewsplits(jobsubmitter.java:302)
    at org.apache.hadoop.mapreduce.jobsubmitter.writesplits(jobsubmitter.java:319)
    at org.apache.hadoop.mapreduce.jobsubmitter.submitjobinternal(jobsubmitter.java:197)
    at org.apache.hadoop.mapreduce.job$10.run(job.java:1297)
    at org.apache.hadoop.mapreduce.job$10.run(job.java:1294)
    at java.security.accesscontroller.doprivileged(native method)
    at javax.security.auth.subject.doas(subject.java:422)
    at org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1656)
    at org.apache.hadoop.mapreduce.job.submit(job.java:1294)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.controlledjob.submit(controlledjob.java:335)
    at sun.reflect.nativemethodaccessorimpl.invoke0(native method)
    at sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl.java:62)
    at sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)
    at java.lang.reflect.method.invoke(method.java:498)
    at org.apache.pig.backend.hadoop.pigjobcontrol.submit(pigjobcontrol.java:128)
    at org.apache.pig.backend.hadoop.pigjobcontrol.run(pigjobcontrol.java:205)
    at java.lang.thread.run(thread.java:748)
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher$1.run(mapreducelauncher.java:301)
caused by: org.apache.hadoop.mapreduce.lib.input.invalidinputexception: input path not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.hadoop.mapreduce.lib.input.fileinputformat.singlethreadedliststatus(fileinputformat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.fileinputformat.liststatus(fileinputformat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigtextinputformat.liststatus(pigtextinputformat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.fileinputformat.getsplits(fileinputformat.java:385)
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.piginputformat.getsplits(piginputformat.java:280)
    ... 18 more
2017-07-25 17:05:02,597 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - hadoopjobid: job_local1897657638_0001
2017-07-25 17:05:02,597 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - processing aliases student
2017-07-25 17:05:02,597 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - detailed locations: m: student[1,10],student[-1,-1] c:  r:
2017-07-25 17:05:02,600 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - 0% complete
2017-07-25 17:05:07,608 [main] warn  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - ooops! job has failed! specify -stop_on_failure if want pig stop on failure.
2017-07-25 17:05:07,608 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - job job_local1897657638_0001 has failed! stop running dependent jobs
2017-07-25 17:05:07,609 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - 100% complete
2017-07-25 17:05:07,619 [main] info  org.apache.hadoop.metrics.jvm.jvmmetrics - cannot initialize jvm metrics processname=jobtracker, sessionid= - initialized
2017-07-25 17:05:07,620 [main] error org.apache.pig.tools.pigstats.pigstats - error 0: java.lang.illegalstateexception: job in state define instead of running
2017-07-25 17:05:07,620 [main] error org.apache.pig.tools.pigstats.mapreduce.mrpigstatsutil - 1 map reduce job(s) failed!
2017-07-25 17:05:07,622 [main] info  org.apache.pig.tools.pigstats.mapreduce.simplepigstats - script statistics:

hadoopversion   pigversion  userid  startedat   finishedat  features
2.6.2   0.17.0  srv-hadoop  2017-07-25 17:05:01 2017-07-25 17:05:07 order_by

failed!

failed jobs:
jobid   alias   feature message outputs
job_local1897657638_0001    student map_only    message: org.apache.pig.backend.executionengine.execexception: error 2118: input path not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.piginputformat.getsplits(piginputformat.java:294)
    at org.apache.hadoop.mapreduce.jobsubmitter.writenewsplits(jobsubmitter.java:302)
    at org.apache.hadoop.mapreduce.jobsubmitter.writesplits(jobsubmitter.java:319)
    at org.apache.hadoop.mapreduce.jobsubmitter.submitjobinternal(jobsubmitter.java:197)
    at org.apache.hadoop.mapreduce.job$10.run(job.java:1297)
    at org.apache.hadoop.mapreduce.job$10.run(job.java:1294)
    at java.security.accesscontroller.doprivileged(native method)
    at javax.security.auth.subject.doas(subject.java:422)
    at org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1656)
    at org.apache.hadoop.mapreduce.job.submit(job.java:1294)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.controlledjob.submit(controlledjob.java:335)
    at sun.reflect.nativemethodaccessorimpl.invoke0(native method)
    at sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl.java:62)
    at sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43)
    at java.lang.reflect.method.invoke(method.java:498)
    at org.apache.pig.backend.hadoop.pigjobcontrol.submit(pigjobcontrol.java:128)
    at org.apache.pig.backend.hadoop.pigjobcontrol.run(pigjobcontrol.java:205)
    at java.lang.thread.run(thread.java:748)
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher$1.run(mapreducelauncher.java:301)
caused by: org.apache.hadoop.mapreduce.lib.input.invalidinputexception: input path not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.hadoop.mapreduce.lib.input.fileinputformat.singlethreadedliststatus(fileinputformat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.fileinputformat.liststatus(fileinputformat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigtextinputformat.liststatus(pigtextinputformat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.fileinputformat.getsplits(fileinputformat.java:385)
    at org.apache.pig.backend.hadoop.executionengine.mapreducelayer.piginputformat.getsplits(piginputformat.java:280)
    ... 18 more

input(s):
failed read data "/home/srv-hadoop/data.txt"

output(s):

counters:
total records written : 0
total bytes written : 0
spillable memory manager spill count : 0
total bags proactively spilled: 0
total records proactively spilled: 0

job dag:
job_local1897657638_0001    ->  null, null    ->  null, null

2017-07-25 17:05:07,622 [main] info  org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - failed!
2017-07-25 17:05:07,624 [main] error org.apache.pig.tools.grunt.grunt - error 1066: unable open iterator alias student_order
details at logfile: /home/srv-hadoop/pig_1500995099397.log
2017-07-25 17:05:07,648 [main] info  org.apache.pig.main - pig script completed in 8 seconds , 442 milliseconds (8442 ms)

So I get:

input(s):
failed read data "/home/srv-hadoop/data.txt"

output(s):

counters:
total records written : 0
total bytes written : 0
spillable memory manager spill count : 0
total bags proactively spilled: 0
total records proactively spilled: 0

job dag:
job_local1897657638_0001    ->  null, null    ->  null, null

But if I execute: pig -x local data.pig, it works fine.

What have I missed?

Hey, it seems your 'data.txt' is on the local file system. When you run 'pig -x mapreduce', Pig expects the input to be in HDFS.

Since the '/home/srv-hadoop/data.txt' file is on the local file system, 'pig -x local' works.
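To see why the error mentions hdfs://localhost:54310/home/srv-hadoop/data.txt when the script only says /home/srv-hadoop/data.txt, here is a toy model of how a path without a scheme is resolved against the default file system. The qualify function is hypothetical and is not Hadoop's actual code; hdfs://localhost:54310 is the address from the log in the question, and file:// stands in for the local file system used by 'pig -x local'.

```shell
#!/bin/sh
# Toy model only (hypothetical helper, not Hadoop's implementation):
# a path with no scheme is read relative to the default file system.
qualify() {
    default_fs=$1
    path=$2
    case $path in
        [a-z]*:/*) echo "$path" ;;              # already has a scheme, leave it alone
        *)         echo "$default_fs$path" ;;   # prepend the default file system
    esac
}

# mapreduce mode: default file system is HDFS (address taken from the log)
qualify hdfs://localhost:54310 /home/srv-hadoop/data.txt
# prints hdfs://localhost:54310/home/srv-hadoop/data.txt

# local mode: default file system is the local disk
qualify file:// /home/srv-hadoop/data.txt
# prints file:///home/srv-hadoop/data.txt
```

The same string in the LOAD statement therefore points at two different places depending on the execution mode, and only the local one exists so far.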

Make the directory on the Hadoop file system:

  hadoop fs -mkdir -p /home/srv-hadoop/

Copy the data.txt file from the local file system to Hadoop:

  hadoop fs -put /home/srv-hadoop/data.txt /home/srv-hadoop/

Now run Pig in mapreduce mode. It should work fine.
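The steps in this answer, plus a quick check that the file really landed in HDFS, could be wrapped in a small script. This is only a sketch: run, stage_and_run and DRY_RUN are names made up for illustration, and it assumes hadoop and pig are on the PATH. With DRY_RUN=1 it only prints the commands, so you can review them before touching the cluster.

```shell
#!/bin/sh
# Sketch: copy a local file into HDFS under the same absolute path,
# verify it is there, then run a Pig script in mapreduce mode.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_and_run() {
    local_file=$1                      # e.g. /home/srv-hadoop/data.txt
    script=$2                          # e.g. data.pig
    hdfs_dir=$(dirname "$local_file")  # reuse the same directory name in HDFS

    run hadoop fs -mkdir -p "$hdfs_dir" &&
    run hadoop fs -put "$local_file" "$hdfs_dir/" &&   # fails if the file already exists in HDFS
    run hadoop fs -ls "$local_file" &&                 # confirm the copy before starting the job
    run pig -x mapreduce "$script"
}

# Preview the commands without running anything:
DRY_RUN=1
stage_and_run /home/srv-hadoop/data.txt data.pig
```

Set DRY_RUN=0 (or remove that line) to actually execute the commands. Keeping the HDFS path identical to the local one means the LOAD statement in data.pig does not have to change between local and mapreduce mode.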

