How to calculate Modification Times of Hive Tables

If you use external tables in hive or use methods other than Hive’s LOAD DATA to feed data to hive tables, you should be interested in how recent is your data.

Here’s a nifty little ruby snippet that allows you to get that using webhdfs

irb> require 'webhdfs'

irb> client = WebHDFS::Client.new('hadoop-nn', 50070)

irb> fl = client.list('/user/hive/warehouse/database.db/tablename/')

irb> DateTime.strptime(fl.collect {|x| x['modificationTime']}.max.to_s, '%M')
 
0
Kudos
 
0
Kudos

Now read this

Boilerplate Maven Pom for generating jars with dependencies

I find myself searching for this over and over again. Maven can be a pain in the butt and Gradle is supposed to be a huge improvement. But every now and then you have to work with Maven, here’s a useful gist which contains boilerplate to... Continue →