Easy steps to set encodings for table compression in Amazon redshift.
If you are loading data from s3 to redshift then you can use the copy command with COMPUPDATE ON option to automatically set the compression on the target based on the data that is getting loaded. Redshift will analyze some 200K and set the compression/encoding that is best for the data that is getting loaded.
If you are loading the data from other tables in redshift and need to explicity set the encodings then following the steps outlined below might help:
a) Load some sample set of records to the target table on redshift that you want to set the encoding/compressions on. Use Analyze compression <tablename> command to check the encodings that is suitable for this target table.
For example: We are loading data to the table employee below.
redshift> Analyze compression employee
table_name column Encoding
-----------------------------------
'Employee','employee_key','raw'
'Employee','employee_number','lzo'
'Employee','employee_type','runlength'
'Employee','employee_status','lzo'
'Employee','employee_source_code','lzo'
'Employee','employee_tier','lzo'
'Employee','enterprise_tier','lzo'
'Employee','serving_flag','runlength'
'Employee','employee_flag','lzo'
'Employee','account_update_date','delta32k'
'Employee','tenure_months','bytedict'
'Employee','creation_date','delta32k'
'Employee','category','bytedict'
'Employee','extended_flag','runlength'
'Employee','business_date','runlength'
b) New recreate the table employee on redshift with those encodings:
CREATE TABLE EMPLOYEE
(
employee_key INTEGER NOT NULL encode raw,
employee_num CHAR(11) NOT NULL encode lzo,
employee_type CHAR(1) NOT NULL encode runlength,
employee_status CHAR(1) NOT NULL encode lzo,
employee_tier CHAR(1) NOT NULL encode lzo,
serving_flag char(1) NOT NULL encode runlength,
employee_flag char(1) NOT NULL encode lzo,
account_update_date DATE NOT NULL encode delta32k,
tenure_months INEGER NOT NULL encode bytedict,
creation_date DATE NOT NULL encode delta32k,
category char(5) NOT NULL encode bytedict,
extended_flag CHAR(1) NOT NULL encode runlength,
business_date DATE NOT NULL encode delta32k
)
distkey (employee_key);
c) Now reload the data into the table and you should see the encodings applied for this table.
redshift> Analyze compression employee
No comments:
Post a Comment