Monday, 14 July 2014

Open Source Data Warehousing and BI world.

If you need to design a data warehousing architecture entirely from open source systems, the alternatives below are a good place to start:

Open source databases:

MySQL: probably the best-known open source database, often preferred for OLTP applications and as the backend for many web applications. It runs on multiple operating systems.

PostgreSQL: apparently more advanced than MySQL, and it can provide better performance for large, complex queries.

Hadoop/Hive: Hadoop and big data have recently become the buzzwords of the data warehousing world. Apache Hadoop can be a good alternative for storing huge amounts of data and running batch processing over them. It is very scalable, and you can scale it out with commodity hardware. Within the Hadoop ecosystem, Hive can be used for data warehousing with SQL-like queries, Pig for scripting data flows and some ETL processing, and HBase for storing and manipulating data as a column-oriented store.
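To make the batch processing model a little more concrete, here is a minimal sketch of the map, shuffle, and reduce steps that Hadoop's MapReduce jobs are built around. It is plain Python running in a single process, not Hadoop code; the function names and the sample web log records are made up for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for page, bytes_sent in records:
        yield page, bytes_sent


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical sample input: (page, bytes sent) pairs from a web log.
log_records = [("/home", 120), ("/about", 80), ("/home", 200)]
totals = reduce_phase(shuffle(map_phase(log_records)))
print(totals)
```

In a real cluster the map and reduce steps run in parallel on many machines, which is where the scalability on commodity hardware comes from.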

Open source ETL tools:

Pentaho Kettle: has a bigger user community than other open source tools and a good GUI for ETL and data quality.
Talend and CloverETL both offer Eclipse-based GUI designers; Talend generates Java code for its ETL jobs.
Pentaho and Talend can both be integrated with big data platforms such as Hadoop.
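For readers new to ETL, the sketch below shows the extract, transform, and load steps that these tools let you design graphically. It is only an illustration in plain Python: the in-memory SQLite database stands in for a real warehouse such as MySQL or PostgreSQL, and the table name, column names, and sample rows are all assumptions.

```python
import sqlite3


def run_etl(source_rows):
    """Transform raw order rows and load them into a daily sales table."""
    # Transform: skip rows with no amount, normalise the country code,
    # and aggregate revenue per (day, country).
    totals = {}
    for order_date, country, amount in source_rows:
        if amount is None:
            continue
        key = (order_date, country.strip().upper())
        totals[key] = totals.get(key, 0.0) + float(amount)

    # Load: write the aggregated rows into the (stand-in) warehouse.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_daily_sales "
        "(order_date TEXT, country TEXT, revenue REAL)"
    )
    warehouse.executemany(
        "INSERT INTO fact_daily_sales VALUES (?, ?, ?)",
        [(day, country, revenue)
         for (day, country), revenue in sorted(totals.items())],
    )
    warehouse.commit()
    return warehouse


# Extract: hypothetical rows as they might arrive from an OLTP system.
raw_orders = [
    ("2014-07-14", "us ", 10.0),
    ("2014-07-14", "US", 5.5),
    ("2014-07-14", "de", None),
    ("2014-07-15", "DE", 7.0),
]

warehouse = run_etl(raw_orders)
rows = warehouse.execute(
    "SELECT order_date, country, revenue FROM fact_daily_sales "
    "ORDER BY order_date, country"
).fetchall()
print(rows)
```

A dedicated ETL tool adds what this sketch leaves out: scheduling, connectors to many sources, error handling, and data quality checks.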

Open source reporting tools:

Jaspersoft, BIRT, and the Pentaho BI suite are some of the open source reporting tools available.