How to calculate Modification Times of Hive Tables

If you use external tables in Hive, or use methods other than Hive's LOAD DATA to feed data into Hive tables, you will want to know how recent your data is.

Here's a nifty little Ruby snippet that gets you that using the webhdfs gem. It lists the files under the table's warehouse directory and takes the newest modification time:

irb> require 'date'

irb> require 'webhdfs'

irb> client = WebHDFS::Client.new('hadoop-nn', 50070)

irb> fl = client.list('/user/hive/warehouse/database.db/tablename/')

irb> DateTime.strptime(fl.collect {|x| x['modificationTime']}.max.to_s, '%Q')  # modificationTime is in epoch milliseconds, which '%Q' parses
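
If you want to reuse this outside irb, here is a minimal sketch of the same idea as a pair of helpers. The names latest_modification_time and stale? are my own (hypothetical, not part of the webhdfs gem), and the sample hashes are made-up stand-ins for what client.list returns, so it runs without a cluster. It assumes each entry carries a 'modificationTime' key in epoch milliseconds, as in the session above.

```ruby
require 'time'

# Hypothetical helper: newest modification time among file status hashes
# (the shape returned by client.list). Returns a UTC Time, or nil when the
# list is empty or has no modificationTime values.
def latest_modification_time(statuses)
  newest_ms = statuses.map { |s| s['modificationTime'] }.compact.max
  return nil if newest_ms.nil?

  # Split epoch milliseconds into whole seconds plus microseconds.
  Time.at(newest_ms / 1000, (newest_ms % 1000) * 1000).utc
end

# Hypothetical freshness check: true when the table has no files, or when
# its newest file is older than max_age_seconds relative to `now`.
def stale?(statuses, max_age_seconds, now = Time.now)
  latest = latest_modification_time(statuses)
  latest.nil? || (now - latest) > max_age_seconds
end

# Made-up sample data standing in for client.list(...) output.
sample = [
  { 'pathSuffix' => 'part-00000', 'modificationTime' => 1_400_000_000_000 },
  { 'pathSuffix' => 'part-00001', 'modificationTime' => 1_400_000_360_500 }
]

puts latest_modification_time(sample).iso8601(3)
puts stale?(sample, 24 * 60 * 60)
```

With a real cluster you would pass fl from the session above instead of sample. Note that this only looks at the files directly under the table directory; for a partitioned table you would need to list each partition directory as well.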
