Thursday, 6 August 2015

What is Informatica big data edition (BDE) ?

What is Informatica big data edition BDE ?

Informatica big data edition BDE is a product from Informatica Corp that can be used like an ETL tool for working in hadoop enviroment along with traditional RDBMS tools.  Now there are lot of ETL products in the market that makes it easier to integrate with hadoop. To name a few talend, pentaho, etc. Informatica is one of the leading ETL tool vendor and Informatica power center tool is very famous as an ETL tool and has been for many years. Traditionally this tool was used to extract transform and load data to traditional databases such as oracle, sql server, netezza to name a few. With advent of hadoop for storing peta byte volumes of data, building ETL tools that can work with hadoop became more important. It requires a lot of handcoding and knowledge to work directly with hadoop and build map reduce jobs. Hadoop tools such as hive made it easier to write SQL queries on top of Hive database and convert it to map reduce jobs. Hence, lot of companies started using Hive as a data warehouse tool and storing data in hadoop just like traditional databases and writing queries on Hive. How do we now extract , transform, and load the data in Hadoop? Thats where Informatica BDE comes into picture. It is a tool that you can use for ETL or ELT on hadoop infrastructure.  Informatica BDE can run in two modes. They are native mode and hive mode. 

In the native mode, it runs as a normal powercenter but in hive mode you can push down the whole mapping logic to hive and make it run on the hadoop cluster there by using the parallelism provided by hadoop. However there are some limitation when running in hive mode but that is more because of the limitations from hive itself. For example, hive does not allow updates in older versions.  

With Informatica BDE you can do the following at a very high level :

a) Just like any ETL tool you can do extract , transform, load between tranditional rdbms or hive/hdfs source and targets.
b) Push the whole ETL logic to hadoop cluster and make use of the map reduce framework. Basically it makes building hadoop jobs easier.
c) Makes it easy to create connection to all the different sources and integrate data from those sources. 
d) It makes it easier to ingest complex files such as JSON, XML, Cobol, AVRO, Parquest , etc.

Informatica BDE uses the Informatica developer interface to build the mappings , deploy and create applications. Anyone who has used IDQ before might be familiar with the Informatica developer interface. Informatica BDE can be found in Informatica  9.6 versions onwards.

For more detailed about hive and native mode check:

For detailed information on limitation of hive mode in Informatica BDE check: