Removing Database Level Locks in HIVE

Recently we started noticing “CREATE TABLE” taking incredibly long amounts of time to execute and after which they’d fail. A more detailed look in to the issue revealed that we had upgraded HIVE and the new version, which now allowed ACID, would lock the database by default even if ACID support was turned off. So basically, during a SELECT or INSERT was running in HIVE, HIVE created a zookeeper SHARED lock on the entire database that contained those tables.

I did some digging through the code and found this:
https://github.com/apache/hive/blob/68bc618bf0b1fd3839c3c52c2103b58719b3cb81/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#L166 Notice the lock on the whole database.

To solve this problem, this link recommends to turn off locking altogether:
http://mail-archives.apache.org/mod_mbox/hive-user/201408.mbox/%3C0eba01cfc035$3501e4f0$9f05aed0$@com%3E That was not an option for us as we were doing a full replace of the table and we want to make sure no one is reading from the table when we are replacing its contents.

Another solution, although not elegant is to unlock the zookeeper lock on the schema manually every so often. Here’s a script if you decide to go that route. If you have a better way of solving this issue please tweet to me @yashranadive

require 'zookeeper'
require 'trollop'

# Removes zookeeper SHARED locks from a database
# usage ruby remove_zookeeper_locks.rb <name_of_hive_database>
# e.g. ruby remove_zookeeper_locks.rb default


opts = Trollop::options do
  opt :hive_schema, "Hive Schema to unlock", :type => :string, :required => true  # string --hive-schema <s>
  opt :zookeeper_server, "Zookeeper server", :type => :string, :required => false  # string --zookeeper-server <s>
end

logger = Logger.new(STDOUT)
zk = Zookeeper.new(opts[:zookeeper_server] || "default_zookeeper_server:2181")
hive_schema = opts[:hive_schema]
path = "/hive_zookeeper_namespace_hive/#{hive_schema}"

nodes = zk.get_children(:path => path)[:children] || []
logger.info( "ZK Nodes for #{path}: #{nodes}" )
locks = nodes.select { |i| i[/LOCK-SHARED/] }

if (locks.nil? || locks.empty?)
  logger.info( "No SHARED Locks found on #{hive_schema}" )
else
  locks.map do |lock_name|
    zk.delete(:path => "/hive_zookeeper_namespace_hive/#{hive_schema}/#{lock_name}")
    logger.info( "Removed lock /hive_zookeeper_namespace_hive/#{hive_schema}/#{lock_name}" )
  end
end
 
54
Kudos
 
54
Kudos

Now read this

Streaming data to Hadoop using Unix Pipes? Use Pipefail

If you pipe the output of a statement to hadoop streaming you must know about the unix pipefail option. To demonstrate what it does, try this out in your commandline: $> true | false $> echo $? 1 $> false | true $> echo $? 0... Continue →