How to Calculate Modification Times of Hive Tables

If you use external tables in Hive, or feed data to Hive tables by some method other than Hive’s LOAD DATA, you will want to know how recent your data is.

Here’s a nifty little Ruby snippet that gets you that using the webhdfs gem:

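irb> require 'date'

irb> require 'webhdfs'

irb> client = WebHDFS::Client.new('hadoop-nn', 50070)

irb> fl = client.list('/user/hive/warehouse/database.db/tablename/')

irb> DateTime.strptime(fl.collect {|x| x['modificationTime']}.max.to_s, '%Q')

WebHDFS reports modificationTime in milliseconds since the Unix epoch, which is what the %Q directive parses.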
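If you want this outside of irb, the same idea can be wrapped in a small helper. The following is only a sketch: the method name latest_modification_time and the sample listing are made up for illustration, and the hashes merely imitate the shape of what client.list returns (one hash per file with a 'modificationTime' key in epoch milliseconds).

```ruby
require 'time'

# Hypothetical helper (not part of the webhdfs gem): takes a listing shaped
# like the output of client.list and returns the newest modification time
# as a UTC Time, or nil when the listing is empty.
def latest_modification_time(file_statuses)
  newest_ms = file_statuses.map { |status| status['modificationTime'] }.compact.max
  return nil if newest_ms.nil?

  # modificationTime is in milliseconds since the Unix epoch.
  Time.at(newest_ms / 1000.0).utc
end

# Made-up sample data standing in for client.list('/user/hive/warehouse/...').
sample_listing = [
  { 'pathSuffix' => 'part-00000', 'type' => 'FILE', 'modificationTime' => 1_400_000_000_000 },
  { 'pathSuffix' => 'part-00001', 'type' => 'FILE', 'modificationTime' => 1_400_000_600_000 }
]

newest = latest_modification_time(sample_listing)
puts newest.iso8601
```

With a real cluster you would pass in fl from the irb session above instead of sample_listing. Returning nil for an empty directory is a design choice of this sketch; calling max on an empty list and parsing the result, as the one-liner does, would raise instead.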
 