How to Calculate Modification Times of Hive Tables

If you use external tables in Hive, or feed data to Hive tables by some method other than Hive’s LOAD DATA, you will probably want to know how recent your data is.

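Here’s a nifty little Ruby snippet that gets you that using WebHDFS. It lists the files in the table’s warehouse directory and takes the newest modificationTime, which WebHDFS reports in milliseconds since the Unix epoch: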

irb> require 'date'

irb> require 'webhdfs'

irb> client = WebHDFS::Client.new('hadoop-nn', 50070)

irb> fl = client.list('/user/hive/warehouse/database.db/tablename/')

irb> DateTime.strptime(fl.collect {|x| x['modificationTime']}.max.to_s, '%Q')
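
If you want this outside of irb, the same idea can be wrapped in a small helper. This is only a sketch: the name latest_modification and the sample data are made up for illustration, and it assumes each entry is a hash with a 'modificationTime' key in epoch milliseconds, like the ones in the listing above.

```ruby
require 'time'

# Hypothetical helper (not part of the webhdfs gem): returns the newest
# modification time among WebHDFS file-status hashes as a UTC Time,
# or nil when the list is empty.
def latest_modification(statuses)
  newest_ms = statuses.map { |s| s['modificationTime'] }.compact.max
  return nil if newest_ms.nil?

  # modificationTime is in milliseconds since the epoch; Time.at wants seconds.
  Time.at(newest_ms / 1000.0).utc
end

# Made-up sample data standing in for the result of client.list(...).
sample = [
  { 'pathSuffix' => 'part-00000', 'modificationTime' => 1_400_000_000_000 },
  { 'pathSuffix' => 'part-00001', 'modificationTime' => 1_400_000_060_000 }
]

puts latest_modification(sample).iso8601
```

With the client from above you would call latest_modification(client.list('/user/hive/warehouse/database.db/tablename/')). Keep in mind that the listing is not recursive, so for a partitioned table this only looks at the top-level entries of the table directory.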
 