Sunday, 9 December 2018

Submitting pyspark jobs on Yarn and accessing hive tables from spark

I was getting the error below when trying to access a Hive table through pyspark in a job submitted to Spark.

pyspark.sql.utils.AnalysisException: u'Table not found: `prady_retail_db_orc`.`orders`;'

To fix the issue, I had to access the table through a HiveContext, which looks tables up in the Hive metastore.

The command to submit the job is:

spark-submit --master yarn --conf "spark.ui.port=10111"

The above command submits the program to YARN and sets the Spark UI port to 10111.
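As a rough illustration of how the pieces of that command fit together, here is a small sketch that assembles the same argument list in Python. The helper name build_spark_submit and the script name export_orders.py are hypothetical and used only for illustration; the flags themselves are the ones from the command above.

```python
import shlex


def build_spark_submit(app_file, master="yarn", ui_port=10111):
    """Return the spark-submit argument list for running app_file on the given master."""
    return [
        "spark-submit",
        "--master", master,
        "--conf", "spark.ui.port={}".format(ui_port),
        app_file,  # spark-submit options must come before the application file
    ]


# "export_orders.py" is a hypothetical script name, not taken from the post.
cmd = build_spark_submit("export_orders.py")
print(" ".join(shlex.quote(part) for part in cmd))
```

Anything placed after the application file is passed to the script as its own arguments, which is why the --master and --conf options come first.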

The contents of the script are below. It reads a Hive table called orders and writes the contents of the table in Parquet format to an HDFS location.

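from pyspark import SparkContext

from pyspark.sql import HiveContext

# spark-submit does not predefine sc (only the pyspark shell does), so create it here
sc = SparkContext()

# HiveContext can resolve tables registered in the Hive metastore
sqlContext = HiveContext(sc)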

dailyrevDF=sqlContext.sql("select * from prady_retail_db_orc.orders")

dailyrevDF.write.mode("overwrite").format("parquet").option("compression", "none").mod
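On Spark 2.x and later, the same job can be written with a SparkSession that has Hive support enabled instead of a HiveContext. The following is a minimal sketch under that assumption, not the script from this post: the function names, the application name "hive-to-parquet", and the output path passed in by the caller are all hypothetical.

```python
def export_table_to_parquet(spark, table_name, output_path):
    """Read a Hive table through `spark` and write it as uncompressed Parquet."""
    df = spark.sql("select * from {}".format(table_name))
    (df.write
       .mode("overwrite")                 # replace any existing output
       .format("parquet")
       .option("compression", "none")     # same setting as the original script
       .save(output_path))                # output_path is supplied by the caller
    return df


def main(table_name, output_path):
    # Imported here so the helper above can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    # enableHiveSupport() lets the session resolve tables in the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-to-parquet")  # hypothetical application name
             .enableHiveSupport()
             .getOrCreate())
    try:
        export_table_to_parquet(spark, table_name, output_path)
    finally:
        spark.stop()
```

To run it with spark-submit, the script would call main("prady_retail_db_orc.orders", output_path) with an HDFS path of your choosing.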
