hadoop - Oozie PySpark job


I have a simple workflow.

<workflow-app name="testsparkjob" xmlns="uri:oozie:workflow:0.5">
    <start to="testjob"/>
    <action name="testjob">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobtracker}</job-tracker>
            <name-node>${namenode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>local[*]</master>
            <name>spark example</name>
            <jar>mapping.py</jar>
            <spark-opts>--executor-memory 1g --num-executors 3 --executor-cores 1</spark-opts>
            <arg>argument1</arg>
            <arg>argument2</arg>
        </spark>
        <ok to="end"/>
        <error to="killaction"/>
    </action>
    <kill name="killaction">
        <message>"killed job due error"</message>
    </kill>
    <end name="end"/>
</workflow-app>

The Spark script itself does pretty much nothing:

import sys

if len(sys.argv) < 3:
    print('you must pass 2 parameters')
    # just testing; later discarded and sys.exit(1) used instead
    ext = 'testarga'
    int_ = 'testargb'   # renamed from `int` to avoid shadowing the builtin
    # sys.exit(1)
else:
    print('arguments accepted')
    ext = sys.argv[1]
    int_ = sys.argv[2]

The script is located on HDFS in the same folder as workflow.xml.

When I run the workflow I get the following error:

Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

I thought it was a permission issue, so I set the HDFS folder to chmod 777 and the local folder to chmod 777 as well. I am using Spark 1.6. When I run the script through spark-submit it works fine (even more complicated scripts that read/write to HDFS or Hive).

EDIT: I tried this:

<action name="forceloadfromlocal2hdfs"> <shell xmlns="uri:oozie:shell-action:0.3">   <job-tracker>${jobtracker}</job-tracker>   <name-node>${namenode}</name-node>   <configuration>     <property>       <name>mapred.job.queue.name</name>       <value>${queuename}</value>     </property>   </configuration>   <exec>driver-script.sh</exec> <!-- single -->   <argument>s</argument> <!-- py script -->   <argument>load_local_2_hdfs.py</argument> <!-- local file moved-->   <argument>localfilepath</argument> <!-- hdfs destination folder, aware of, script deleting existing folder! -->   <argument>hdfspath</argument>   <file>${workflowroot}driver-script.sh</file>   <file>${workflowroot}load_local_2_hdfs.py</file> </shell> <ok to="end"/> <error to="killaction"/> 

The workflow succeeded, but the file was not copied to HDFS. No errors. The script works on its own, though. More here.
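For reference, here is a minimal sketch of what a load_local_2_hdfs.py along these lines could look like (the real script is not shown in the question, so the body below is an assumption), shelling out to the hdfs CLI:

#!/usr/bin/env python
# Hypothetical sketch of load_local_2_hdfs.py -- the actual script is not part
# of the question. It wipes the HDFS destination folder (as the comment in the
# workflow warns) and then copies the local file up.
import subprocess
import sys

def load_local_to_hdfs(local_path, hdfs_dir):
    # remove the existing destination folder, ignoring "does not exist"
    subprocess.call(["hdfs", "dfs", "-rm", "-r", "-f", hdfs_dir])
    subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir])
    subprocess.check_call(["hdfs", "dfs", "-put", local_path, hdfs_dir])

if __name__ == "__main__":
    load_local_to_hdfs(sys.argv[1], sys.argv[2])

Keep in mind that a shell action runs on whichever cluster node Oozie's launcher lands on, so "local" here means local to that node, not local to the machine the workflow was submitted from.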

Unfortunately, the Oozie Spark action only supports Java artifacts, so you have to specify a main class (which is what that error message is clumsily trying to tell you). You have two options:

  1. rewrite your code in Java/Scala
  2. use a custom action or a script for this (I did not test it); a rough sketch follows below
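For option 2, a rough sketch of how the submission could be scripted and then launched from a shell action (the flags and master below are assumptions copied from the workflow above, and the wrapper itself is hypothetical):

#!/usr/bin/env python
# Hypothetical driver started by an Oozie shell action: it only builds the
# spark-submit command for the PySpark script and forwards any arguments.
import subprocess
import sys

cmd = [
    "spark-submit",
    "--master", "local[*]",      # assumption: mirrors <master> in the workflow
    "--executor-memory", "1g",
    "--num-executors", "3",
    "--executor-cores", "1",
    "mapping.py",
] + sys.argv[1:]                 # argument1, argument2 are passed through

sys.exit(subprocess.call(cmd))

The <file> elements of the shell action would then ship both this driver and mapping.py into the action's working directory, the same way driver-script.sh and load_local_2_hdfs.py are shipped above.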
