DWBI-TECH BLOGS (Pradeep Kannadiga): November 2018

Thursday, 15 November 2018

The options available to improve Hive query performance are:

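1) Store tables in the ORC file format. ORC is a compressed, columnar format and performs better than plain text files.

2) Use the Tez execution engine instead of MapReduce.

3) Use cost-based optimization (CBO), which uses table and column statistics to choose better query plans and join orders.

4) Use vectorized query execution, which processes rows in batches of 1,024 instead of one row at a time.

5) Write sensible queries that avoid unnecessary joins.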
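Options 1 to 4 above map onto a few session settings and table definitions. The sketch below is a minimal illustration that only builds the HiveQL statements as strings. The property names are standard Hive configuration keys. The table `sales_orc` and its columns are hypothetical placeholders. Whether each setting helps depends on the Hive version and the cluster, so treat the values as a starting point and not as a prescribed configuration.

```python
# Minimal sketch: builds HiveQL statements for options 1-4 (option 5 is about query design).
# Property names are standard Hive configuration keys; table/column names are hypothetical.

HIVE_TUNING_SETTINGS = {
    "hive.execution.engine": "tez",                      # 2) Tez instead of MapReduce
    "hive.cbo.enable": "true",                           # 3) cost-based optimization
    "hive.compute.query.using.stats": "true",            # answer simple aggregates from stored stats
    "hive.stats.fetch.column.stats": "true",             # let the optimizer use column statistics
    "hive.vectorized.execution.enabled": "true",         # 4) vectorized execution (map side)
    "hive.vectorized.execution.reduce.enabled": "true",  # 4) vectorized execution (reduce side, Tez)
}


def session_settings(settings: dict[str, str]) -> list[str]:
    """Render each property as a HiveQL SET statement, in insertion order."""
    return [f"SET {key}={value};" for key, value in settings.items()]


def orc_table_ddl(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement that stores the table as ORC (option 1)."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE {table} ({cols}) STORED AS ORC;"


def analyze_statements(table: str) -> list[str]:
    """Gather table and column statistics so the CBO has data to work with."""
    return [
        f"ANALYZE TABLE {table} COMPUTE STATISTICS;",
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS;",
    ]


if __name__ == "__main__":
    script = session_settings(HIVE_TUNING_SETTINGS)
    script.append(orc_table_ddl("sales_orc", {"order_id": "BIGINT", "amount": "DECIMAL(10,2)"}))
    script.extend(analyze_statements("sales_orc"))
    print("\n".join(script))
```

The printed statements could be run in a Beeline session before the workload queries. The statistics step matters because the CBO can only choose good plans when table and column statistics exist.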

The steps to read files from an Amazon S3 bucket with Apache NiFi are:

1) First, create a bucket on Amazon S3, then create an access key ID and secret access key for an IAM user in AWS.

2) Grant that IAM user the permissions it needs so that anyone using its access key and secret key can access the bucket.

3) Use an S3 client tool to check that the files are accessible.

4) Create the dataflow in NiFi using the ListS3, FetchS3Object and PutS3Object processors, as shown in the diagrams below.

5) The settings of ListS3 are listed below. S3-kannadiga is the bucket name, in the US East region. The access key and secret key are entered in this processor.

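6) The settings of FetchS3Object are given below. Its Bucket property is set to ${s3.bucket}, which reads the bucket name from the s3.bucket attribute that the ListS3 processor writes to each flowfile.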

7) The settings of PutFile are given below.
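Step 2 does not say which permissions are required. The sketch below builds an identity-based IAM policy to attach to the IAM user whose keys are entered in ListS3. It assumes that ListS3 needs `s3:ListBucket` on the bucket and that FetchS3Object needs `s3:GetObject` on its objects. It adds `s3:PutObject` only if the flow also writes back to the same bucket with PutS3Object. The bucket name `my-example-bucket` is a hypothetical placeholder, and the exact permissions used in this post are not known.

```python
import json

# Hypothetical placeholder: replace with the real bucket name.
BUCKET = "my-example-bucket"


def s3_access_policy(bucket: str, allow_write: bool = False) -> dict:
    """Build an IAM policy document scoped to a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    statements = [
        {
            # ListS3 lists the object keys: a bucket-level permission on the bucket ARN.
            "Sid": "AllowListBucket",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": bucket_arn,
        },
        {
            # FetchS3Object downloads each listed object: an object-level permission.
            "Sid": "AllowGetObject",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"{bucket_arn}/*",
        },
    ]
    if allow_write:
        # Only needed if the flow writes objects back with PutS3Object.
        statements.append(
            {
                "Sid": "AllowPutObject",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(s3_access_policy(BUCKET), indent=2))
```

The printed JSON could be attached to the IAM user as an inline or managed policy. Scoping `Resource` to the one bucket means the keys entered in ListS3 cannot be used to reach other buckets in the account.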