How to Calculate Modification Times of Hive Tables

If you use external tables in Hive, or feed data to Hive tables by some method other than Hive’s LOAD DATA, you will want to know how recent your data is.

Here’s a nifty little Ruby snippet that gets you that using WebHDFS. It lists the files in the table’s warehouse directory and takes the newest modification time among them:

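irb> require 'webhdfs'

irb> require 'date'

irb> client = WebHDFS::Client.new('hadoop-nn', 50070)

irb> fl = client.list('/user/hive/warehouse/database.db/tablename/')

# modificationTime is in milliseconds since the Unix epoch, so parse it with %Q (not %M, which means minutes)
irb> DateTime.strptime(fl.collect {|x| x['modificationTime']}.max.to_s, '%Q')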
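If you want to reuse this outside of irb, the same idea can be wrapped in a small helper. This is only a sketch: the method name `latest_modification_time` and the sample hashes are made up for illustration, and it assumes the listing is an array of hashes with a 'modificationTime' key in milliseconds since the epoch, which is what the session above reads from `client.list`. It uses only the standard library, so you can try it without a cluster.

```ruby
require 'time'

# Hypothetical helper: returns the newest modification time (as a UTC Time)
# among a list of WebHDFS file statuses, or nil if the listing is empty.
# Assumes each status is a hash with 'modificationTime' in milliseconds
# since the Unix epoch.
def latest_modification_time(statuses)
  newest_ms = statuses.map { |s| s['modificationTime'] }.compact.max
  return nil if newest_ms.nil?

  # Convert milliseconds to seconds before building the Time object.
  Time.at(newest_ms / 1000.0).utc
end

# Made-up sample data standing in for the result of client.list(...).
sample = [
  { 'pathSuffix' => 'part-00000', 'modificationTime' => 1_400_000_000_000 },
  { 'pathSuffix' => 'part-00001', 'modificationTime' => 1_400_000_360_000 }
]

latest = latest_modification_time(sample)
puts latest.iso8601
```

With a real client you would pass `client.list('/user/hive/warehouse/database.db/tablename/')` instead of `sample`. Returning nil for an empty directory avoids parsing an empty string, which the one-liner would choke on.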
 