hadoop - Oozie pyspark job


I have a simple workflow:

<workflow-app name="testsparkjob" xmlns="uri:oozie:workflow:0.5">
    <start to="testjob"/>
    <action name="testjob">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobtracker}</job-tracker>
            <name-node>${namenode}</name-node>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>local[*]</master>
            <name>spark example</name>
            <jar>mapping.py</jar>
            <spark-opts>--executor-memory 1g --num-executors 3 --executor-cores 1</spark-opts>
            <arg>argument1</arg>
            <arg>argument2</arg>
        </spark>
        <ok to="end"/>
        <error to="killaction"/>
    </action>
    <kill name="killaction">
        <message>"killed job due error"</message>
    </kill>
    <end name="end"/>
</workflow-app>

The Spark script does pretty much nothing:

import sys

if len(sys.argv) < 3:   # argv[0] is the script itself, so two arguments means at least 3 entries
    print('you must pass 2 parameters ')
    # just testing, later discarded, sys.exit(1) used
    ext = 'testarga'
    int = 'testargb'
    # sys.exit(1)
else:
    print('arguments accepted')
    ext = sys.argv[1]
    int = sys.argv[2]
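For context, a minimal complete version of mapping.py might look like the sketch below. The SparkContext setup, the app name, and the renamed second variable are assumptions and not part of the original script; the original uses `int`, which shadows the Python built-in.

import sys
from pyspark import SparkConf, SparkContext

if __name__ == '__main__':
    # Two arguments are required; argv[0] is the script itself
    if len(sys.argv) < 3:
        print('you must pass 2 parameters')
        sys.exit(1)

    ext = sys.argv[1]
    int_arg = sys.argv[2]   # renamed here so it does not shadow the built-in int

    # App name is an assumption; the workflow above sets the master to local[*]
    conf = SparkConf().setAppName('spark example')
    sc = SparkContext(conf=conf)

    # ... actual job logic would go here ...
    print('arguments accepted: %s %s' % (ext, int_arg))

    sc.stop()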

The script is located on HDFS in the same folder as workflow.xml.

When I run the workflow, I get the following error:

Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [2]

I thought it was a permission issue, so I set the HDFS folder to chmod 777 and the local folder to chmod 777 as well. I am using Spark 1.6. When I run the script through spark-submit, everything is fine (even more complicated scripts that read from/write to HDFS or Hive).

EDIT: I tried this:

<action name="forceloadfromlocal2hdfs">
    <shell xmlns="uri:oozie:shell-action:0.3">
        <job-tracker>${jobtracker}</job-tracker>
        <name-node>${namenode}</name-node>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queuename}</value>
            </property>
        </configuration>
        <exec>driver-script.sh</exec>
        <!-- single -->
        <argument>s</argument>
        <!-- py script -->
        <argument>load_local_2_hdfs.py</argument>
        <!-- local file to be moved -->
        <argument>localfilepath</argument>
        <!-- hdfs destination folder; be aware the script deletes the existing folder! -->
        <argument>hdfspath</argument>
        <file>${workflowroot}driver-script.sh</file>
        <file>${workflowroot}load_local_2_hdfs.py</file>
    </shell>
    <ok to="end"/>
    <error to="killaction"/>
</action>

The workflow succeeded, but the file was not copied to HDFS. There were no errors, and the script itself works, though. More here.
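The contents of load_local_2_hdfs.py are not shown here; a hypothetical sketch of what such a script could do, matching the argument order and the warning in the shell action above (the existing destination folder gets deleted), is:

import subprocess
import sys

# Hypothetical sketch; argument order matches the shell action above:
# sys.argv[1] = local file path, sys.argv[2] = HDFS destination folder
local_path = sys.argv[1]
hdfs_dir = sys.argv[2]

# WARNING: as noted in the workflow, the existing destination folder is removed first
subprocess.call(['hdfs', 'dfs', '-rm', '-r', '-skipTrash', hdfs_dir])
subprocess.call(['hdfs', 'dfs', '-mkdir', '-p', hdfs_dir])

# Copy the local file to HDFS and propagate the exit code
rc = subprocess.call(['hdfs', 'dfs', '-put', local_path, hdfs_dir])
sys.exit(rc)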

Unfortunately, the Oozie Spark action only supports Java artifacts, so you have to specify a main class (which is what that error message is rather unhelpfully trying to explain). You have two options:

  1. rewrite your code in Java/Scala
  2. use a custom action or a script for this (I did not test it)
