Sunday, 30 November 2014

How to truncate or delete records in a apache HIVE table and remove files from HDFS?

How to truncate or delete records in a apache HIVE table and remove files from HDFS?

As you know in apache Hive, the data is stored as files on HDFS and the hive tables are just meta data that is used to read data from those files in a table format.

So when you want to truncate a table in Hive, you are basically removing all the files that are on HDFS for that table. 

Try the below command:

Hive>  use mydatabase;
Hive> truncate table mytable;

In the above command mydatabase and mytable are the example database and table name. Replace it with your database and table name.

If in case the above commands do not work, then you can go and delete the files that these tables are referring to from hdfs.

FYI: If you want to update records then check out this LINK (click HERE)

To find the location of the files, go to your name node UI and check for the path where the files for the table are located using the file browser.
The name node UI link is usually in the format  http://<namenode_host>:50070/ .In a multinode cluster your namenode might be different from your localhost so find out the name node link from your admin.

In my example, the files for my table are stored in /user/hive/warehouse/mydatabase/mytable. 

Now use the below hadoop command to delete all the files belonging to that table from hdfs. This should basically truncate the table in Hive.

hadoop fs -rm /user/hive/warehouse/mydatabase/mytable/*

If for instance you want to delete only few files from dhfs then you can use the same command where instead of the wildcard you will use the file name. This is equivalent to deleting records in Hive.  For example:

hadoop fs -rm /user/hive/warehouse/mydatabase/mytable/file1.txt

Hope this helps. If any questions, contact me.

Check also the below link to load files to Hive tables: