
Friday, 8 July 2016

Performance tuning of Informatica Big Data Edition Mapping

Performance tuning of Informatica Big Data Edition Mapping



Below is a list of performance tuning steps that can be applied in Informatica Big Data Edition (BDE):


1) Use a Lookup transformation only when the lookup table is small. Lookup data is copied to each node, so lookups against large tables are slow.


2) Use Joiner transformations instead of lookups for large data sets.


3) Join large data sets before small data sets, and reduce the number of times the large data sets are joined in Informatica BDE.


4) Since Hadoop does not allow updates, you have to rebuild the target table whenever a record in it is updated. Instead of rebuilding the whole table, consider rebuilding only the impacted partitions (see the sketch after this list).


5) Hive is slower with non-string data types because it needs to create temp tables to convert to and from the string data type. Use non-string data types only when required.


6) Use a data type precision close to the actual data. Using a higher precision slows down the performance of Informatica BDE.


7) Map only the ports that are required in the mapping transformations or loaded to the target. Fewer ports mean better performance and less data read.
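As a rough illustration of point 4, the Hive sketch below refreshes only one partition instead of the whole table. The table, columns, and partition values are made up, so adapt them to how your target table is actually partitioned.

-- hypothetical example: refresh only the impacted partition instead of the full table
insert overwrite table sales_target partition (load_date = '2016-07-08')
select order_id, customer_id, amount
from sales_updates
where load_date = '2016-07-08';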









Thursday, 7 July 2016

Useful Queries for Troubleshooting Amazon Redshift

USEFUL QUERIES FOR TROUBLESHOOTING IN AMAZON REDSHIFT 

Here are some of my queries for troubleshooting in Amazon Redshift. I have collected these from different sources.

TO CHECK LIST OF RUNNING QUERIES AND USERNAMES:

select a.userid, cast(u.usename as varchar(100)), a.query, a.label, a.pid, a.starttime, b.duration,
b.duration/1000000 as duration_sec, b.query as querytext
from stv_inflight a, stv_recents b, pg_user u
where a.pid = b.pid and a.userid = u.usesysid




select pid, trim(user_name), starttime, substring(query,1,20) from stv_recents where status='Running'

TO CANCEL A RUNNING QUERY:

cancel <pid>


You can get the pid from one of the queries above used to check running queries.
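If the cancel does not take effect, you can terminate the whole session instead. The pid below is just a made-up example; use the pid returned by the queries above.

select pg_terminate_backend(18764);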


TO LOOK FOR ALERTS FOR A QUERY (replace 1011 below with the query id you are investigating):

select * from STL_ALERT_EVENT_LOG
where query = 1011
order by event_time desc
limit 100;


TO CHECK TABLE SIZE:

select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema,
trim(a.name) as Table, b.mbytes, a.rows
from ( select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name ) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes
from stv_blocklist group by tbl) b on a.id=b.tbl
order by b.mbytes desc, a.db_id, a.name;


TO CHECK FOR TABLE COMPRESSION:

analyze <tablename>;
analyze compression <tablename>;



TO ANALYZE ENCODING:

select "column", type, encoding
from pg_table_def where tablename = 'biglist';
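If analyze compression suggests better encodings, one common way to apply them is to rebuild the table with a deep copy. The sketch below is only an illustration; the columns and encodings are made up, so use the ones recommended for your table.

-- hypothetical deep copy to apply new column encodings
create table biglist_new (
id bigint encode delta,
name varchar(100) encode lzo
);
insert into biglist_new select id, name from biglist;
alter table biglist rename to biglist_old;
alter table biglist_new rename to biglist;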



TO CHECK LIST OF FILES COPIED:

select * from stl_load_errors

select * from stl_load_commits


select query, trim(filename) as file, curtime as updated, *
from stl_load_commits
where query = pg_last_copy_id();


TO CHECK LOAD ERRORS:

select d.query, substring(d.filename,14,20),
d.line_number as line,
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id();


TO CHECK FOR DISKSPACE USED IN REDSHIFT:

select owner as node, diskno, used, capacity
from stv_partitions
order by 1, 2, 3, 4;
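A rough overall disk usage percentage can be derived from the same table. Treat this as an approximation, since mirrored partitions are included in both the used and capacity totals.

select sum(used)::float / sum(capacity) * 100 as pct_disk_used
from stv_partitions;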
TO CHECK THE LAST FEW QUERIES RUN:

select query, trim(querytxt) as sqlquery
from stl_query
order by query desc limit 5;


SOME IMPORTANT AWS COMMANDS:

To resize the Redshift cluster (node type and number of nodes are always required):

aws redshift modify-cluster --cluster-identifier <cluster name> --node-type dw2.8xlarge --number-of-nodes 3

To get a file list on S3:

aws s3 ls $BUCKET/  > ./filecount.out

To get the status and other information of the cluster in text format:

aws redshift describe-clusters --output text   


Friday, 2 October 2015

How to use SQLPlus to connect to Oracle and use it in Unix scripts? Connect strings

How to use SQLPlus to connect to Oracle and use it in Unix scripts? SQLPlus connect strings and commands:

sqlplus is a command line tool that comes with any Oracle client and can be used to connect to an Oracle database. It can also be used in scripts to send commands to the Oracle database.

The common sqlplus connection string is:

sqlplus username/password@servername:port/databasename 

Go to the directory where sqlplus is located and enter the above command with the proper username and password supplied as shown in the connection string.

To find the list of tables in the database, use the below command in sqlplus:

select table_name from user_tables;
 
The rest of the Oracle DML and DDL statements work in sqlplus as well.
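For example, statements like the ones below (the table names are made up) can be run directly at the sqlplus prompt:

select count(*) from mytable;
create table mytable_bkp as select * from mytable;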

If you want to use sqlplus in a Unix script, then the below sample script should help:

#!/usr/bin/ksh
# run the SQL via a here-document; exit sqlplus with code 2 on any SQL error
sqlplus username/password@servername:port/databasename << EOF
WHENEVER SQLERROR EXIT 2;
insert into mytable select * from mytemptable;
EOF
RTN=$?
if [ $RTN -ne 0 ] ; then
echo failed
fi




Informatica Java transformation to parse comma separated columns and generate new rows - normalizer

Informatica Java transformation to parse comma separated columns and generate new rows - normalizer limitations

The Informatica Normalizer transformation is normally used to convert a column containing multiple values into separate rows, i.e. a row like ID, 1,2,3,4 can be converted to something like below.
ID , 1
ID, 2
ID, 3
ID, 4

The problem with the Normalizer, however, is that it has to know the number of occurrences. If the number of occurrences is very high, it is hard to create that many output ports.

The easy way in these cases is to use a Java transformation to parse the column containing multiple values and generate a separate row for each of them.

The Java code is shown below:

// split the comma separated input value and emit one output row per element
String str_var = INPUT_PORT_NAME;
String[] arr;
String delimiter = ",";
arr = str_var.split(delimiter);
for (int i = 0; i < arr.length; i++) {
    INPUT_PORT_NAME = arr[i];  // overwrite the port with a single value
    generateRow();             // emit one row for this value
}


Wednesday, 7 January 2015

How to set up and sync Fitbit Flex Activity Tracker and Review

How to set up and sync Fitbit Flex Activity Tracker? Review at end of this article.




1) Unbox the fitbit package. You should have the below parts in it.


2) Charge the Flex chip (the small chip on the left side of the pic above) by inserting it into the cable that has a USB plug at the end (the one at the top of the pic above) and connecting the cable to a USB port on your computer. You can also connect it to a USB charger that plugs into a power outlet. Once fully charged, all four light indicators on the chip turn on.


Creating a fitbit account and installing the software

3) Now go to http://www.fitbit.com/setup and download the software for the Flex model and install it on your computer. If you have a model other than Flex, then download the software for that model.

4)  Open your Fitbit Connect application. You can set up your device and create a fitbit account by clicking on the Set up a New Fitbit Device link.



5) Click on the New to Fitbit link and then set up your account. You will use this account to login to fitbit.com and see all your activities.



6) After you have set up your account, you need to track your activity with the fitbit. Insert the fitbit chip in your fitbit wristband. The device will start tracking your movement in terms of steps taken, basically every time you move your hand, walk, run, etc.


Syncing your fitbit

7) Now to see the activities tracked by your fitbit, you need to sync your fitbit device with your fitbit account. To do this, insert the fitbit dongle into a USB port on your computer.


8) Start the Fitbit Connect application and click on the sync button. You need to have your fitbit wristband along with the chip close to the dongle for it to detect and read your activities from the tracker.


9) After sync is complete, click on the Go to fitbit.com link or open fitbit.com and login using the account created in one of the steps above. This should take you to your account dashboard, which shows all the activities tracked in the day by hour.



 10) In your dashboard, you can see the number of steps and calories burnt in the day by hour.


11) That should be it. Sync your activities every now and then, and charge your fitbit tracker chip once every 2-3 days.

12) If you have an Android phone or iPhone, you can install the fitbit app, and that app should take care of syncing your activities without having to use the dongle.

Review:
a) Easy to set up, sync and check the daily stats. A nice and informative dashboard that gives counts of steps, calories burnt and a graph that plots number of steps by time.
b) LED indicators to indicate the amount of battery power left.
c) A bit hard to wear because of the clip that is used to lock the strap. You have to press it hard for it to lock. I would have liked it to be more like a normal watch band. Maybe it is designed this way so that it hugs the wrist tightly.
d) Comes with two wristbands of different sizes. 
e) Very light weight. Looks stylish.
f) The dongle helps to sync using a normal PC. Apps are available for Android and iPhone.

Tuesday, 6 January 2015

Infacmd command to enable, disable, and recycle Informatica services

Infacmd command to enable, disable, and recycle Informatica services

To disable an Informatica service, use the below command (the variables starting with $ should be defined in your environment or in your script):

$INFA_HOME/server/bin/infacmd.sh isp disableService -dn $DOMAIN_NAME -un $DOMAIN_USER -pd $DOMAIN_PASSWORD -sn $Service -mo stop

To get the status of an Informatica service, use the below command:

$INFA_HOME/server/bin/infacmd.sh isp getServiceStatus -dn $DOMAIN_NAME -un $DOMAIN_USER -pd $DOMAIN_PASSWORD -sn $Service


To enable an Informatica service, use the below command:
 
$INFA_HOME/server/bin/infacmd.sh isp enableService -dn $DOMAIN_NAME -un $DOMAIN_USER -pd $DOMAIN_PASSWORD -sn $Service


To recycle an Informatica service using infacmd (command line), disable and then enable the service using the commands above.








Monday, 8 September 2014

How to generate SSH keys and use it for Sftp / SSH/ SCP?

How to generate SSH keys and use it for Sftp / SSH/ SCP?

SSH keys are commonly used during sftp and scp to authorize access to a host machine. Instead of a password, SSH keys are used to identify and authorize a machine to login to a target machine. It is safer and harder to break in.

In brief, the steps to set up SSH authentication are as follows:

Step a: On the source/client machine, use the command below to generate the SSH keys:

ssh-keygen -t rsa -f <nameofthekeyfile>

Example: ssh-keygen -t rsa -f mykeys-for-targetmachine

The command will prompt for a passphrase; you can enter a value or leave it empty as you deem fit for your purpose. The above command will create a private and a public key with the key name mykeys-for-targetmachine that you provided in the command. The public key file will have a .pub extension.

Step b: Now open the public key file and copy the contents of the key file.

Example:
cat mykeys-for-targetmachine.pub

The output will look something like this:
ssh-rsa AAAAB3NzaC1yc2ASAGSAHGSJAHSGJASGAJSG38fxq8VHDwNRP/asJHGJHGJGaasasaJGJJGJHGJHJx305gH3XKZA3asdasdaLLHJLLJLJLJLJLLKJLJLasdsadsadjlkjlajdsadl= myuser@targetmachine.com

Step c: Login to the target machine that you want to access and open the authorized_keys file. This file should be in the .ssh directory under your home directory (~/.ssh/authorized_keys). Paste the contents of the public key file that you copied in step b into this file.

Step d: Back on the source/client machine, change the permission of the key file to make it secure.
For the above example, it would be:

chmod 600 mykeys-for-targetmachine


Step e: Now try to SSH to the target machine using your new keys to test the connection. The command will be something like the one below (replace <myuser> with the user name required on the target machine and <targetmachine> with the name of the target machine you want to login to, i.e. the machine whose authorized_keys file you just updated in step c):

ssh -i mykeys-for-targetmachine <myuser>@<targetmachine>

It will first prompt you to add the target machine to the list of known hosts; answer yes and you should now be logged into the machine. If it prompts for a password, then something has not been done right. Check the above steps and make the required corrections.

How to sftp and scp using ssh keys? Below are the commands to use:

sftp -i mykeys-for-targetmachine <myuser>@<targetmachine>
scp -i mykeys-for-targetmachine <localfile> <myuser>@<targetmachine>:<remotepath>



More information can be found in the below link:
https://help.ubuntu.com/community/SSH/OpenSSH/Keys

Wednesday, 16 July 2014

A Typical ETL (Extract, Transform, Load) Architecture for Data Integration

A Typical ETL (Extract, Transform, Load) Architecture for Data Integration


It is a pretty common interview question to ask about ETL architecture. Let this not be confused with the various data warehouse architectures, which are a bigger universe; we will discuss data warehouse architecture in our next post. The diagram below is a simple illustration of the ETL architecture. The ETL tool used here can be Informatica or any other tool such as SSIS or DataStage.

The diagram below shows that ETL, as a data integration tool, can pull data from multiple sources. The sources can be flat files, VSAM, web services, XML, or relational databases such as Oracle, DB2, SQL Server, etc.
 
Landing Area: In a normal ETL architecture, the data from these sources is first loaded into a landing area. This landing area will be one or more tables in some database that hold the source data as-is. Not many transformations are applied to the data while it is loaded into the landing area. If this involves daily loads, batch keys can be used to identify the load for a particular day. Once the data is in the landing area, the various transformations and business logic can be applied and the data moved to the staging area.
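A minimal SQL sketch of a landing load is shown below. The schema, table, and column names are hypothetical; the point is only that the source data is copied as-is and tagged with a batch key.

-- hypothetical landing load: copy source rows as-is and tag them with a batch key
insert into landing.orders_landing (order_id, customer_id, amount, batch_key)
select order_id, customer_id, amount, 20140716
from source_db.orders;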

Staging Area: The staging area contains the data to which the application logic is applied. Data from a single table or multiple tables in the landing area can be combined to create the staging tables. The data is transformed, aggregated, filtered, and all the business logic applied before loading it into the staging area. For example, if there are daily loads and the data from the latest batch updates the historical data, then the staging area is supposed to hold the latest version. If data from multiple columns in a landing table is used to derive a column for the business, then this transformation is done in the mappings that load data into the staging area.

There can be instances where landing and staging are merged into one table if separating them is not required. Having a separate landing area helps to recreate the staging area at any time from the data in the landing area. That means the landing area has all the historical data, from the first load up to the latest batch of data.

Data Warehouse Area: Call it the data warehouse or data mart area; it is the place that business users and the reporting layer have access to. The data in the data mart is the final version of the data. Usually data is passed from staging to the data mart with minimal transformation. It is a good practice to make the data in the data mart recoverable at any time from the data in staging. Landing and staging are intermediate areas before the data lands in the data marts.

This is quite simple, isn't it? If you have any questions, drop a comment below.


See also: http://dwbitechguru.blogspot.ca/2014/07/business-intelligence-simple-definition.html

Tuesday, 15 July 2014

SAP Business Object (BO) Reports Testing

The testing of Business Objects reports, or in general any report, is discussed below:

The following testing is required for Business Objects reports:

1) Data validation: this is to test that the report displays the data that is required of it, and that the data is formatted properly with cross tabs, blocks, sections, filters, etc.

2) The format of the report is correct, i.e. the font size, page size, and page orientation.

3) Check the print settings. Print the report to check that it prints properly. Check that the report can be saved in Excel/HTML/PDF format.

4) Regression test by comparing to the older version of the report.

How to test the reports:
1) Scope definition: The testing scope varies depending on the changes involved, i.e. whether it is a new report or an update to an existing report.

2) Regression testing: If it is an update to an existing report, then refresh the older version of the report to confirm that the changes have not impacted the existing report. Microsoft Excel or Beyond Compare are some of the tools that can be used to compare two reports and check for any discrepancies.

3) Data validation: SQLs and eyeball testing can be used to validate the report data. Take a sample of data and test it against operational/data warehouse data to check that the data is accurate.

4) Universe testing: If the testing scope involves testing the universe, then all the objects and classes that are required as per the BRD can be checked to confirm they exist. Also, any loops and universe validation issues can be checked and resolved.

5) Implementation testing: After implementation, the report can be refreshed in the UAT/prod environment to check that it is still working and displaying data. Check that the report is getting refreshed as per schedule and that it is getting published/emailed as per requirements.

See Also: http://dwbitechguru.blogspot.ca/2014/07/etl-testing-challenges.html

Monday, 14 July 2014

Informatica - ETL testing -Challenges


ETL testing is a bit complicated since the complexity of testing cannot be hidden behind buttons or GUIs. The tester will have to write SQLs on a variety of databases, and sometimes write their own mappings to test the ETL code, since ETL involves integration of data from heterogeneous data sources and transformations that cannot be easily coded using SQLs.

Things that might have to be tested during ETL testing:

1) The data from multiple sources is properly integrated, transformed, aggregated, sorted, etc., and loaded into the target database.
2) The data is properly landed, staged and loaded into mart tables.
3) Proper batch keys are inserted for delta loads and delta loads are working properly.
4) The data is inserted/updated in the target table.
5) Historical loads and daily loads are working as expected.
6) Testing of events, command tasks, schedules, and notifications.
7) Validation of error cases and rejected data.
8) Performance testing
9) Impact testing: Testing interfaces to upstream and downstream processes.
10) Regression testing: test that the existing data/tables/processes are not broken.

Complication in ETL testing:

a) Availability of all test data and creation of test data for all the test cases.
b) Understanding the complexities of the ETL tool and data warehousing.
c) Working with data spread across multiple databases.

Just out of interest, if you want to check out the different roles in data warehousing, please read:
http://dwbitechguru.blogspot.ca/2015/09/career-and-hot-jobs-in-data-warehousing.html

How ETL testing can be done:

a) If you are testing Informatica code and the testing involves only a few impacted sessions, a separate workflow can be created just for the purpose of testing. This avoids searching for the right session to run to test the code.

b) Perform a run of the ETL jobs for historical data and for daily loads separately and test both loads.

c) Test the load of data into the landing area. If this is working properly, then it is easy to write SQLs to validate data against the landing area, since data from multiple sources is brought into one landing area (see the sample queries after this list).

d) If jobs involve files, make sure the files are deleted/archived and folders ready for next load.

e) Look at the ETL design document to understand how ETL is designed.

f) Make sure the ETL code can handle the data volume and meet all the performance parameters.

g) There might be some ETL testing tools that could make testing easier. Informatica has a set of ETL testing tools of its own.
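As an example of the SQL-based validation mentioned in point c), the sketch below compares a source table against the landing area. The table and column names are hypothetical; use MINUS on Oracle/Netezza and EXCEPT on databases that do not support MINUS.

-- hypothetical row count comparison between source and landing for one batch
select s.source_cnt, l.landing_cnt
from (select count(*) as source_cnt from src.orders where order_date = '2014-07-14') s,
     (select count(*) as landing_cnt from landing.orders_landing where batch_key = 20140714) l;

-- hypothetical check: keys present in the source but missing in landing
select order_id from src.orders where order_date = '2014-07-14'
minus
select order_id from landing.orders_landing where batch_key = 20140714;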

See also:
http://dwbitechguru.blogspot.ca/2014/07/business-object-reports-testing.html

The power of IBM Netezza appliance

Netezza is a powerful data warehouse appliance that can handle very large volumes of data and is used widely in many medium and large scale organizations. Some very interesting facts about Netezza are below:

Good stuff:
1) Netezza is indeed very powerful for OLAP and ETL purposes. The configuration I'm used to can handle queries on tables that have billions of records. Netezza uses asymmetric massively parallel processing and has multiple processing units to make query processing faster and run in parallel. Of course, proper joins and distribution keys have to be set when dealing with billions of records.


2) No tedious performance tuning is required like in Oracle or SQL Server, i.e. adding indexes, creating partitions, etc. is not required. It allows adding primary keys but does not enforce the constraint, leaving that to the tools that load data into Netezza. Only the distribution keys have to be set properly on the tables (see the sketch after this list).

3) The above reduces the administration effort. Netezza is promoted more as a no-DBA appliance. No software needs to be installed; everything comes installed with the Netezza appliance.

4) Migration of data from one environment to another is very easy using tools such as nz_migrate, nz_sql, and nzload.

5) nz_migrate is a handy tool to migrate a bunch of tables in one shot to another environment, along with creating the tables in the new environment. nz_migrate and nz_unload are multi-threaded and can load/unload data using many threads.

6) Bulk loads are very fast. Netezza has its own Powerconnect driver to power these operations.

7) The NzAdmin tool provides a good interface to administer the appliance as well as to see resource availability and active/inactive queries.
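A minimal sketch of setting a distribution key at table creation time is shown below; the table and columns are hypothetical. Pick a high-cardinality column that is also used in joins, or distribute on random if there is no good candidate.

-- hypothetical Netezza table with an explicit distribution key
create table sales_fact
(
sale_id bigint,
customer_id integer,
sale_amount numeric(12,2)
)
distribute on (customer_id);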

Check also: Extract, Load, Migrate in Netezza:
http://dwbitechguru.blogspot.ca/2014/11/extract-load-migrate-filesdata-from.html

Some not so great stuff:

1) In earlier versions, if we had to add a column, we had to drop the table and recreate it.

2) Same thing with views: we had to drop a view if a column data type changed in the base table.

3) It does not support subqueries in the select statement, though this is not a big problem in ETL tools. Not sure if this is supported in the latest versions.

4) How scalable is Netezza? What do you do when the volume of data or the number of queries grows to a point where the existing Netezza appliance cannot support it? This might mean buying more hardware to upgrade Netezza. Compared to solutions like Hadoop or Amazon Redshift, Netezza might not be very scalable.

Informatica ETL tool capabilities

If you are designing a solution and need to know when you can use Informatica, or for that matter any ETL tool, the below list can be useful:

a) Data integration: Informatica is a data integration tool, i.e. it can pull data from multiple heterogeneous sources and extract, transform, and load the data to another set of targets.
The source/target databases can be MySQL, Oracle, SQL Server, Netezza, DB2, flat files, etc.

The source can also be a web service that provides data in XML format. Informatica can also generate files in XML format.

The source can also be VSAM files generated by mainframes.

Data can also be read from the cloud, such as Amazon Redshift. Big data integration options are also available.

b) Informatica is pretty good at reading from flat files and loading the data into a database or converting it to flat files or XML feeds.

c) Informatica can be used to apply a lot of transformation logic. This transformation logic could be something simple like changing a data type or doing some arithmetic on the data, aggregating, sorting, ranking, or normalizing, or something complex such as XML parsing.

d) Informatica workflows can also be exposed as web services, i.e. you can create web services from Informatica.

e) Informatica can be used to execute commands on the host machine for any purpose, such as sftping a file, moving a file, etc. It can wait on events such as touch files or the completion of jobs.

f) Informatica comes with a basic scheduler that can be used to schedule jobs. It also provides various notification options (email, etc.) for failures, success, and job completion.

g) Informatica can be used to sequence jobs and apply conditions between them, i.e. it can be used to create workflows where each job gets executed one after another in a defined sequence.

h) Informatica can be used to create operational data warehouses, data marts, star schemas, slowly changing dimensions (type 1/type 2/type 3), etc.

Other tools from Informatica such as IDQ and MDM can be used for address validation and master data management respectively. Informatica cloud allows pushing jobs to the cloud.

Informatica might not do well when do..while or for..loop kind of logic is required to handle data. However, using Java transformations or other workarounds, this kind of logic can be implemented in Informatica.

See also: http://dwbitechguru.blogspot.ca/2014/07/performance-tuning-in-informatica.html

Open Source Data Warehousing and BI world.

If we have to design a data warehousing architecture with all open source systems, then checking out the below alternatives could be a good start:

Open source databases:

MySQL: probably the most widely known open source database, preferred for OLTP applications and as the backend for many web servers. It can run on multiple operating systems.

PostgreSQL: Apparently more advanced than MySQL and can provide better performance for huge queries.

Hadoop/Hive: Recently Hadoop/big data has become the buzzword in the data warehousing world. Hadoop from Apache can be a good alternative to store huge amounts of data and run batch processing on huge volumes of data. Hadoop is very scalable and you can scale it with commodity hardware. Hive can be used for data warehousing, Pig for ETL-style data processing, and similarly HBase for storing and manipulating data.

Open source ETL tools:

Pentaho Kettle: has a bigger user community than other open source tools and a good GUI for ETL and data quality.
Talend and CloverETL are apparently both Eclipse based tools and generate Java code for ETL. Both have GUI support.
Pentaho and Talend can be integrated with big data.

Open Source Reporting tools:

Jaspersoft, BIRT, and the Pentaho BI suite are some of the open source reporting tools.


Sunday, 13 July 2014

Performance Tuning in Informatica

Performance Tuning in Informatica:


The following can be done to improve the performance of Informatica mappings and sessions:

Lookups:

  a) Reduce the number of Lookup transformations.
  b) In the Lookup transformation, override the SQL to select only the columns that are required (see the sketch after this list).
  c) The lookup conditions should also help retrieve fewer rows.
  d) Caching the lookup also improves performance. Use a static cache or a persistent cache.
  e) Index the lookup tables if indexing is supported. Some data warehouse appliances like Netezza do not need indexing.
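A sketch of a lookup SQL override covering points b) and c) is shown below; the table, columns, and filter are hypothetical.

-- hypothetical lookup override: return only the needed columns and rows
select cust_id, cust_name
from customer_dim
where active_flag = 'Y'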

Joiners:

 a) Joiners take a lot of memory, so it is good to reduce the number of Joiner transformations in the mapping.
 b) Use a join in the SQL in the Source Qualifier, and use a Joiner transformation only when reading tables from two different databases that have to be joined.

Source Qualifiers and pre/post SQL :

 a) It is a good idea to put as much logic as possible in the Source Qualifier SQL override if you are using a data warehouse appliance such as Netezza. These appliances are very powerful and support parallelism at the database level, which improves the performance of the reader.
 b) You can put complex SQL logic in the pre/post SQL if you are using appliances such as Netezza. The more you reduce the data traffic flowing to the Informatica server, the better the performance of the Informatica tasks.

Aggregator transformation:

 a) Use sorted input for the Aggregator transformation.
 b) As much as possible, avoid the Aggregator transformation and try to do the aggregation in the SQL in the Source Qualifier, as in the sketch below.
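A minimal sketch of pushing the aggregation into the Source Qualifier SQL override is shown below; the table and columns are hypothetical.

-- hypothetical source qualifier override: aggregate in the database instead of in an Aggregator transformation
select customer_id, sum(sale_amount) as total_sale_amount
from sales
group by customer_id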

Pushdown optimization:

If you have license for pushdown optimization then use pushdown optimization to improve performance.

If you do not have the license, the best way to achieve pushdown optimization is to move the logic to the database using pre-SQL or post-SQL in the sessions.

For example, if you are inserting a large amount of data into a table, then some logic like the statement below in the pre-SQL will improve the performance. Put this code in the pre-SQL of the session source.

Insert into targettable (select col1, col2, sum(col3) from srctable1, srctable2 where srctable1.col1=srctable2.col2 group by col1, col2)

At the session level, the following settings can improve the performance:


1)  Increase the number of partitions if possible to partition and read the data.
2)  Increase the cache memory.
3) Do not use verbose logging. The number of logged records should be small. The number of bad records should also be small; a large number of rejected records can impact performance too.
4) Reduce the number of source and target connections if possible. If reading multiple tables from the same database, use one connection and join the tables in the source qualifier.
5) Try to use bulk writers if possible. Bulk writers provide higher write speeds because they do not log every write operation.


Other parameters that influence Informatica performance are:


1) Network bandwidth and the traffic on your network.
2) Informatica server capacity and other jobs running on the server.
3) The driver you are using. This might be a factor if reading from a cloud source such as Amazon Redshift.
4) Running multiple instances of the same workflow.

Thursday, 10 July 2014

Difference between Waterfall vs Agile Methodology

Difference between Waterfall vs Agile Methodology


Waterfall is a more sequential SDLC methodology where the requirement gathering, coding, testing, UAT, and implementation phases happen one after another. Each phase has to complete before the next phase starts.

Agile methodology is focused on delivering smaller incremental releases of working software; it is more responsive to changes and allows users to use the software earlier than in the waterfall model.



Agile development methodologies:


Each methodology interprets the agile principles slightly differently.

List of methodologies:
   a) Extreme Programming (XP)
   b) Scrum
   c) Lean and Kanban

   d) Feature Driven Development
   e) Adaptive System Development
   f) Agile Unified Process and Essential Unified Process
   g) Crystal and Dynamic Systems Development Method

Factors influencing success of agile:

a) Team collaboration and willingness to adapt to change.
b) Push by the management to train and help the employees adapt. Having a trained agile consultant could help.
c) Bringing the business users on board with this methodology and involving them early on.
d) Type of project and criticality also play a role.
e) Smaller and co-located teams are better for agile methodologies since it is easier to coordinate.

See also:
Scrum :

http://dwbitechguru.blogspot.ca/2014/11/agile-development-methodologies-scrum.html 


http://dwbitechguru.blogspot.ca/2014/07/data-warehouse-and-business.html