Friday, 16 January 2015

Easy steps to set encodings for table compression in Amazon redshift.

Easy steps to set encodings for table compression in Amazon redshift.



If you are loading data from s3 to redshift then you can use the copy command with COMPUPDATE ON option to automatically set the compression on the target based on the data that is getting loaded. Redshift will analyze some 200K and set the compression/encoding that is best for the data that is getting loaded.

If you are loading the data from other tables in redshift and need to explicity set the encodings then following the steps outlined below might help:

a) Load some sample set of records to the target table on redshift that you want to set the encoding/compressions on. Use Analyze compression <tablename> command to check the encodings that is suitable for this target table.

For example: We are loading data to the table employee below.

redshift> Analyze compression employee

table_name      column  Encoding
-----------------------------------
'Employee','employee_key','raw'
'Employee','employee_number','lzo'
'Employee','employee_type','runlength'
'Employee','employee_status','lzo'
'Employee','employee_source_code','lzo'
'Employee','employee_tier','lzo'
'Employee','enterprise_tier','lzo'
'Employee','serving_flag','runlength'
'Employee','employee_flag','lzo'
'Employee','account_update_date','delta32k'
'Employee','tenure_months','bytedict'
'Employee','creation_date','delta32k'
'Employee','category','bytedict'
'Employee','extended_flag','runlength'
'Employee','business_date','runlength'


b) New recreate the table employee on redshift with those encodings:



CREATE TABLE  EMPLOYEE
(
   employee_key                  INTEGER                       NOT NULL encode raw,
    employee_num         CHAR(11)                       NOT NULL encode lzo,
    employee_type   CHAR(1)                       NOT NULL encode runlength,
    employee_status    CHAR(1)                       NOT NULL encode lzo,
    employee_tier    CHAR(1)                       NOT NULL encode lzo,
     serving_flag           char(1)                          NOT NULL encode runlength,
    employee_flag            char(1)                         NOT NULL encode lzo,
    account_update_date           DATE                       NOT NULL  encode delta32k,
    tenure_months     INEGER                       NOT NULL encode bytedict,
    creation_date DATE NOT NULL  encode delta32k,
    category char(5) NOT NULL encode bytedict,
    extended_flag CHAR(1)                       NOT NULL encode runlength,
    business_date DATE                       NOT NULL  encode delta32k
  )
distkey (employee_key);


c) Now reload the data into the table and you should see the encodings applied for this table.


redshift> Analyze compression employee