Tuesday, 11 November 2014

Amazon Redshift - Top concerns and issues


The Amazon Redshift cloud data warehouse is going to change the data warehousing landscape sooner or later. With more and more companies wanting to expand their data warehousing capabilities and wanting high-powered systems that can churn through huge loads, Amazon Redshift is a very attractive proposition. It offers a petabyte-scale data warehouse that is very cheap, billed on demand on an hourly basis, needs little or no administration or maintenance, and delivers very high performance. All we have to do is find a secure, reliable way to upload data to the Amazon Redshift cloud, which is not hard by the way, and then go crazy doing any kind of data processing. It is a great solution for sure, but before you jump on the bandwagon, there are a few concerns to note. The top Amazon Redshift concerns are listed below:

1) How much data do you have to upload and download, and at what frequency? - If you are uploading gigabytes of data to Amazon S3 and then loading it into Amazon Redshift, the upload takes time and consumes a lot of bandwidth. That is not a problem if you do it once a week or once a month. If you do it daily or near-continuously, it will add latency and network cost.
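
To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The function name `upload_hours` and the `efficiency` factor (the share of the nominal link speed you actually get) are my own assumptions for illustration, not anything Redshift or S3 provides.

```python
def upload_hours(size_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Rough number of hours needed to push size_gb over an uplink_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    really available for the transfer (hypothetical default of 0.8).
    """
    if size_gb < 0 or uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = megabits / (uplink_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 50 GB per load over a 100 Mbit/s uplink
    print(f"{upload_hours(50, 100):.2f} hours per load")
```

With these assumed numbers a 50 GB load takes roughly 1.4 hours, which is fine once a week but starts to hurt if it has to happen every day.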

2) Integration with ETL/BI tools - If you are reading or writing large volumes of data for ETL/BI processing directly over ODBC/JDBC, this will most likely be challenging, since the network is going to slow it down and there are not many ODBC drivers (at least I haven't come across any so far) that can do bulk loads directly into Redshift. ODBC drivers do not compress data, and if you are reading millions of records the driver is likely to run out of buffer or the connection will fail at some point. The recommended solution for huge volumes is to move data as flat files through S3 into and out of Amazon Redshift.
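
The flat-file route can be sketched roughly as follows: write the rows to a gzip-compressed CSV file, upload it to S3, then run a COPY statement from any SQL client. This is only a minimal sketch. The helper names, the table name, the bucket path and the IAM role ARN are all placeholders I made up, and the upload step itself is left out (the AWS CLI or an SDK would handle it).

```python
import csv
import gzip


def write_gzip_csv(rows, path):
    """Write rows to a gzip-compressed, comma-delimited UTF-8 file."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerows(rows)


def build_copy_sql(table, s3_prefix, iam_role):
    """Build a Redshift COPY statement for gzip-compressed CSV files.

    table, s3_prefix and iam_role are placeholders supplied by the caller;
    nothing here validates them against a real cluster or bucket.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV GZIP;"
    )


if __name__ == "__main__":
    write_gzip_csv([(1, "widget"), (2, "gadget, large")], "sales_part_0001.csv.gz")
    # Upload sales_part_0001.csv.gz to the S3 prefix here (step omitted).
    print(build_copy_sql(
        "sales",                                          # hypothetical table
        "s3://my-bucket/sales/",                          # hypothetical bucket
        "arn:aws:iam::123456789012:role/redshift-load",   # hypothetical role
    ))
```

Compressing before upload also helps with the bandwidth concern in point 1, since text data usually shrinks considerably under gzip.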

3) Security clearance - Some companies, such as insurers and banks, have policies against uploading data to the cloud. If you can get all the security clearance, it is possible to upload data to the cloud in a secure, encrypted way and keep it encrypted on the cluster in such a way that even Amazon cannot see it. In any case, there are some hurdles to cross before a company gets all the internal clearance to upload data to the Amazon Redshift cloud.

4) Special characters - Redshift stores data in the UTF-8 character set. It can copy data from files in UTF-8 or UTF-16 format, but it converts everything to UTF-8. In practice, some characters, such as French characters with accents, do not get loaded properly. The solution Redshift offers (the ACCEPTINVCHARS option of COPY) is to replace each of them with a single UTF-8 character while copying. If anyone has figured out a way to load French characters into Amazon Redshift, please message me. At this moment this is a big concern for me.
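
One thing that may be worth trying, since accented French characters are themselves valid UTF-8: the sketch below assumes the source file is actually in Latin-1 (ISO-8859-1) rather than UTF-8, which I have not confirmed for my own files, and rewrites it as UTF-8 before the upload. The function name and file paths are made up for illustration. The target column would also need to be VARCHAR, because Redshift CHAR columns hold only single-byte characters, and VARCHAR lengths are counted in bytes, so accented text needs some extra room.

```python
def reencode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Rewrite a text file as UTF-8, assuming it is currently in src_encoding.

    The default of latin-1 is an assumption about the source system;
    pass the real encoding if it is known to be something else.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # stream line by line so large files fit in memory
            dst.write(line)


if __name__ == "__main__":
    # Create a small Latin-1 sample, then convert it.
    with open("clients_latin1.csv", "wb") as fh:
        fh.write("1,Café Genève\n".encode("latin-1"))
    reencode_to_utf8("clients_latin1.csv", "clients_utf8.csv")
```

If the converted file loads cleanly, the problem was the encoding of the export rather than Redshift itself.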
