Monday, 14 July 2014

The power of the IBM Netezza appliance

Netezza is a powerful data warehouse appliance that can handle terabytes of data and is widely used in many medium and large organizations. Some interesting facts about Netezza are below:

Good stuff:
1) Netezza is indeed very powerful for OLAP and ETL purposes. The configuration I'm used to can handle queries on tables with billions of records. Netezza uses an asymmetric massively parallel processing (AMPP) architecture, with multiple processing units that run parts of a query in parallel to make it faster. Of course, proper joins and distribution keys have to be set when dealing with billions of records.

2) No tedious performance tuning is required, as there is in Oracle or SQL Server; i.e. tasks such as adding indexes or creating partitions are not needed. Netezza allows primary keys to be declared but does not enforce the constraint, leaving that to the tools that load data into Netezza. Only the distribution keys have to be set properly on the tables.

3) This reduces the administration effort; Netezza is promoted as a near no-DBA appliance. No software needs to be installed, since it comes preinstalled on the Netezza appliance.

4) Migrating data from one environment to another is very easy using tools such as nz_migrate, nzsql and nzload.

5) nz_migrate is a handy tool for migrating a batch of tables to another environment in one shot, creating the tables in the new environment along the way. nz_migrate and nz_unload are multi-threaded and can load and unload data using many threads.

6) Bulk loads are very fast, and a dedicated PowerConnect driver for Netezza powers these operations.

7) The NzAdmin tool provides a good interface for administering the appliance and for monitoring resource availability and active and inactive queries.
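Items 1 and 2 stress that distribution keys are the main thing to get right. A minimal sketch of why, under loud assumptions: it models an appliance that assigns each row to one of a fixed number of data slices by hashing the distribution key, then compares a high-cardinality key with a low-cardinality one. The slice count, the MD5-based hash and the column names are all hypothetical; this is not Netezza's actual hashing scheme.

```python
# Toy model of hash distribution across data slices.
# Assumptions: 8 slices and an MD5-based hash; Netezza's real scheme differs.
import hashlib
from collections import Counter

NUM_SLICES = 8  # hypothetical slice count

def slice_for(value, num_slices=NUM_SLICES):
    """Map a distribution-key value to a slice number via a stable hash."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_slices

def distribute(rows, key_column, num_slices=NUM_SLICES):
    """Count how many rows land on each slice for the given key column."""
    counts = Counter(slice_for(row[key_column], num_slices) for row in rows)
    return [counts.get(i, 0) for i in range(num_slices)]

def skew(counts):
    """Busiest slice divided by the average slice; 1.0 is perfectly even."""
    avg = sum(counts) / len(counts)
    return max(counts) / avg if avg else 0.0

# 10,000 hypothetical customers: unique IDs, but only two country values.
rows = [{"customer_id": i, "country": "US" if i % 10 else "UK"} for i in range(10_000)]

by_id = distribute(rows, "customer_id")
by_country = distribute(rows, "country")
print("customer_id slices:", by_id, "skew:", round(skew(by_id), 2))
print("country slices:    ", by_country, "skew:", round(skew(by_country), 2))
```

A key with only a few distinct values piles most rows onto one or two slices, so the busiest slice dominates query time; a high-cardinality key like customer_id spreads the work roughly evenly.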
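Because, as item 2 notes, a declared primary key is not enforced, duplicate detection falls to whatever loads the data. The sketch below shows one hypothetical way a loading script might check a batch before loading it; the function name, columns and reject policy are illustrative assumptions, not part of any Netezza tool.

```python
# Hypothetical pre-load check: the database records the primary key but does
# not enforce it, so the loader rejects batches containing duplicate keys.
from collections import Counter
from typing import Iterable, Sequence

def find_duplicate_keys(rows: Iterable[dict], key_columns: Sequence[str]):
    """Return the composite key values that appear more than once, sorted."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

batch = [
    {"order_id": 1, "line": 1, "qty": 5},
    {"order_id": 1, "line": 2, "qty": 3},
    {"order_id": 2, "line": 1, "qty": 7},
    {"order_id": 1, "line": 2, "qty": 4},  # duplicates key (1, 2)
]

dupes = find_duplicate_keys(batch, ["order_id", "line"])
if dupes:
    print("Rejecting batch; duplicate keys:", dupes)
```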
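Item 5 describes nz_migrate and nz_unload moving many tables using many threads. The idea can be sketched with a thread pool, assuming a hypothetical migrate_table function that stands in for the real per-table work (create the target table, unload, transfer, load). This illustrates the concurrency pattern only; it is not how nz_migrate is implemented.

```python
# Sketch: migrate a list of tables with a bounded pool of worker threads.
# migrate_table is a hypothetical placeholder, not a Netezza API.
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import time

_lock = threading.Lock()
active = 0  # tables currently in flight
peak = 0    # highest number of tables in flight at once

def migrate_table(name: str) -> str:
    global active, peak
    with _lock:
        active += 1
        peak = max(peak, active)
    try:
        time.sleep(0.05)  # placeholder for unload/transfer/load time
        return name
    finally:
        with _lock:
            active -= 1

def migrate_all(tables, threads=4):
    done = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(migrate_table, t): t for t in tables}
        for fut in as_completed(futures):
            done.append(fut.result())  # re-raises any per-table failure
    return done

tables = [f"SALES_{i:02d}" for i in range(10)]  # hypothetical table names
migrated = migrate_all(tables, threads=4)
print(f"migrated {len(migrated)} tables, peak concurrency {peak}")
```

Capping max_workers bounds how many tables are in flight at once, so the thread count is the main knob for trading speed against load on the source and target systems.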


Some not-so-great stuff:

1) In earlier versions, adding a column meant dropping the table and recreating it.

2) The same applied to views: if a column's data type changed in the base table, the view had to be dropped and recreated.

3) It does not support subqueries in the SELECT list, though this is not a big problem when working through ETL tools. I'm not sure whether this is supported in the latest versions.

4) How scalable is Netezza? What happens when the volume of data, or the number of queries that need to run, grows beyond what the existing Netezza appliance can support? That might mean buying more hardware to upgrade the appliance. Compared to solutions such as Hadoop or Amazon Redshift, Netezza might not be very scalable.