Hadoop - Pig script doesn't work with MapReduce
I'm trying to use Hadoop with Apache Pig. I have a .txt file with the data and a .pig file with the script:
student = LOAD '/home/srv-hadoop/data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
student_order = ORDER student BY firstname ASC;
DUMP student_order;
And this is the .txt file:
001,rajiv,reddy,21,9848022337,hyderabad
002,siddarth,battacharya,22,9848022338,kolkata
003,rajesh,khanna,22,9848022339,delhi
004,preethi,agarwal,21,9848022330,pune
005,trupthi,mohanthy,23,9848022336,bhuwaneshwar
006,archana,mishra,23,9848022335,chennai
007,komal,nayak,24,9848022334,trivendram
008,bharathi,nambiayar,24,9848022333,chennai
But when I execute: pig -x mapreduce data.pig, I get:
17/07/25 17:04:59 INFO  pig.ExecTypeProvider: Trying ExecType : LOCAL
17/07/25 17:04:59 INFO  pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
17/07/25 17:04:59 INFO  pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2017-07-25 17:04:59,399 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2017-07-25 17:04:59,399 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/srv-hadoop/pig_1500995099397.log
2017-07-25 17:04:59,749 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-07-25 17:04:59,930 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/srv-hadoop/.pigbootup not found
2017-07-25 17:05:00,062 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-07-25 17:05:00,066 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310
2017-07-25 17:05:00,470 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311
2017-07-25 17:05:00,489 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-data.pig-2bb2e75c-41a7-42bf-926f-05354b881211
2017-07-25 17:05:00,489 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled is set to false
2017-07-25 17:05:01,230 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: ORDER_BY
2017-07-25 17:05:01,279 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2017-07-25 17:05:01,308 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-07-25 17:05:01,362 [main] INFO  org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (PS Old Gen) of size 699400192 to monitor. collectionUsageThreshold = 489580128, usageThreshold = 489580128
2017-07-25 17:05:01,411 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-07-25 17:05:01,452 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizerMR - Using Secondary Key Optimization for MapReduce node scope-23
2017-07-25 17:05:01,462 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2017-07-25 17:05:01,462 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2017-07-25 17:05:01,515 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-07-25 17:05:01,516 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-07-25 17:05:01,548 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-07-25 17:05:01,552 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-07-25 17:05:01,552 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-07-25 17:05:01,555 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2017-07-25 17:05:01,558 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted to run in-process
2017-07-25 17:05:01,570 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
2017-07-25 17:05:01,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1676993497/tmp-1698368733/pig-0.17.0-core-h2.jar
2017-07-25 17:05:01,932 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1676993497/tmp885160047/automaton-1.11-8.jar
2017-07-25 17:05:01,975 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1676993497/tmp-1346471388/antlr-runtime-3.4.jar
2017-07-25 17:05:02,012 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/srv-hadoop/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1676993497/tmp32088650/joda-time-2.9.3.jar
2017-07-25 17:05:02,023 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-07-25 17:05:02,031 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2017-07-25 17:05:02,031 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2017-07-25 17:05:02,031 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2017-07-25 17:05:02,093 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-07-25 17:05:02,095 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2017-07-25 17:05:02,095 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-07-25 17:05:02,104 [JobControl] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-07-25 17:05:02,113 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2017-07-25 17:05:02,178 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-07-25 17:05:02,207 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-07-25 17:05:02,213 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/home/srv-hadoop/hadoop-2.6.2/tmp/mapred/staging/srv-hadoop1897657638/.staging/job_local1897657638_0001
2017-07-25 17:05:02,214 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:data.pig got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:302)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:319)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
    at java.lang.Thread.run(Thread.java:748)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
    ... 18 more
2017-07-25 17:05:02,597 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1897657638_0001
2017-07-25 17:05:02,597 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student
2017-07-25 17:05:02,597 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student[1,10],student[-1,-1] C:  R:
2017-07-25 17:05:02,600 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-07-25 17:05:07,608 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop on failure.
2017-07-25 17:05:07,608 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1897657638_0001 has failed! Stop running all dependent jobs
2017-07-25 17:05:07,609 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-07-25 17:05:07,619 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-07-25 17:05:07,620 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
2017-07-25 17:05:07,620 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-07-25 17:05:07,622 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId      StartedAt            FinishedAt           Features
2.6.2          0.17.0      srv-hadoop  2017-07-25 17:05:01  2017-07-25 17:05:07  ORDER_BY

Failed!

Failed Jobs:
JobId  Alias  Feature  Message  Outputs
job_local1897657638_0001  student  MAP_ONLY  Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:294)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:302)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:319)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop.PigJobControl.run(PigJobControl.java:205)
    at java.lang.Thread.run(Thread.java:748)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:301)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/srv-hadoop/data.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
    ... 18 more

Input(s):
Failed to read data from "/home/srv-hadoop/data.txt"

Output(s):

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local1897657638_0001 -> null,
null -> null,
null

2017-07-25 17:05:07,622 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-07-25 17:05:07,624 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias student_order. Details at logfile: /home/srv-hadoop/pig_1500995099397.log
2017-07-25 17:05:07,648 [main] INFO  org.apache.pig.Main - Pig script completed in 8 seconds and 442 milliseconds (8442 ms)
In short, I get:
Input(s):
Failed to read data from "/home/srv-hadoop/data.txt"

Output(s):

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local1897657638_0001 -> null,
null -> null,
null
But if I execute: pig -x local data.pig --> it works fine.
What did I miss?
Hey, it seems your 'data.txt' is on the local file system. When you run 'pig -x mapreduce', Pig expects the input to be in HDFS.

Since '/home/srv-hadoop/data.txt' is a file on the local file system, 'pig -x local' works.
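You can check this from the shell; the paths below mirror the ones in your question, and the commands assume the HDFS daemons from your log are running:

```shell
# Does the path exist in HDFS? This should fail with
# "No such file or directory" in your current setup.
hadoop fs -ls hdfs://localhost:54310/home/srv-hadoop/data.txt

# The file actually lives on the local disk:
ls -l /home/srv-hadoop/data.txt
```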
Make the directory on the Hadoop file system:
- hadoop fs -mkdir -p /home/srv-hadoop/
Copy your data.txt file from the local file system to Hadoop:
- hadoop fs -put /home/srv-hadoop/data.txt /home/srv-hadoop/
Now run Pig in mapreduce mode. It will work fine.
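Alternatively, once the file is in HDFS you can make the target file system explicit by fully qualifying the URI in the LOAD statement. A sketch, using the NameNode address shown in your log (hdfs://localhost:54310):

```pig
-- Fully qualified HDFS path; scheme and NameNode address
-- are taken from the log in the question.
student = LOAD 'hdfs://localhost:54310/home/srv-hadoop/data.txt'
          USING PigStorage(',')
          AS (id:int, firstname:chararray, lastname:chararray,
              phone:chararray, city:chararray);
```

This makes no functional difference in mapreduce mode (unqualified paths already resolve against HDFS, as the error message shows), but it documents the intent and avoids surprises when switching between local and mapreduce modes.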