Submitting PySpark jobs on YARN and accessing Hive tables from Spark
I was getting the error below while trying to access a Hive table from PySpark after submitting the job to Spark:
pyspark.sql.utils.AnalysisException: u'Table not found: `prady_retail_db_orc`.`orders`;'
I then had to access the table through a HiveContext to fix the issue.
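For contrast, here is a minimal sketch of the kind of plain-SQLContext access that raises this error in Spark 1.x; a plain SQLContext is not connected to the Hive metastore, so it cannot resolve Hive table names. The table name is taken from the error above; the variable names are illustrative.

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)  # plain SQLContext: no Hive metastore access

# Raises pyspark.sql.utils.AnalysisException: u'Table not found: ...'
df = sqlContext.sql("select * from prady_retail_db_orc.orders")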
The command to submit the job is:
spark-submit --master yarn --conf "spark.ui.port=10111" test2.py
The above command submits the test2.py program to YARN, with the Spark UI served on port 10111.
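If the job needs explicit resources, they can be requested on the same command line. This variant is a sketch with illustrative values (the executor counts and memory sizes are not from the original job):

spark-submit --master yarn \
  --num-executors 2 \
  --executor-memory 2G \
  --conf "spark.ui.port=10111" \
  test2.py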
The contents of test2.py are below. It accesses a Hive table called orders and writes the contents of the table in Parquet format to an HDFS location.
----------------test2.py contents----------------
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

# Read the Hive table into a DataFrame
dailyrevDF = sqlContext.sql("select * from prady_retail_db_orc.orders")

# Write the DataFrame to HDFS as uncompressed Parquet, overwriting any existing output
dailyrevDF.write.mode("overwrite").format("parquet").option("compression", "none").save("/user/prady/data/test7")