Removing Database Level Locks in HIVE

Recently we started noticing “CREATE TABLE” taking incredibly long amounts of time to execute and after which they’d fail. A more detailed look in to the issue revealed that we had upgraded HIVE and the new version, which now allowed ACID, would lock the database by default even if ACID support was turned off. So basically, during a SELECT or INSERT was running in HIVE, HIVE created a zookeeper SHARED lock on the entire database that contained those tables.

I did some digging through the code and found this: Notice the lock on the whole database.

To solve this problem, this link recommends to turn off locking altogether:$3501e4f0$9f05aed0$@com%3E That was not an option for us as we were doing a full replace of the table and we want to make sure no one is reading from the table when we are replacing its contents.

Another solution, although not elegant is to unlock the zookeeper lock on the schema manually every so often. Here’s a script if you decide to go that route. If you have a better way of solving this issue please tweet to me @yashranadive

require 'zookeeper'
require 'trollop'

# Removes zookeeper SHARED locks from a database
# usage ruby remove_zookeeper_locks.rb <name_of_hive_database>
# e.g. ruby remove_zookeeper_locks.rb default

opts = Trollop::options do
  opt :hive_schema, "Hive Schema to unlock", :type => :string, :required => true  # string --hive-schema <s>
  opt :zookeeper_server, "Zookeeper server", :type => :string, :required => false  # string --zookeeper-server <s>

logger =
zk =[:zookeeper_server] || "default_zookeeper_server:2181")
hive_schema = opts[:hive_schema]
path = "/hive_zookeeper_namespace_hive/#{hive_schema}"

nodes = zk.get_children(:path => path)[:children] || [] "ZK Nodes for #{path}: #{nodes}" )
locks = { |i| i[/LOCK-SHARED/] }

if (locks.nil? || locks.empty?) "No SHARED Locks found on #{hive_schema}" )
else do |lock_name|
    zk.delete(:path => "/hive_zookeeper_namespace_hive/#{hive_schema}/#{lock_name}") "Removed lock /hive_zookeeper_namespace_hive/#{hive_schema}/#{lock_name}" )

Now read this

Two Very Useful Hive Lock Settings

hive.lock.numretries hive.lock.sleep.between.retries The first property indicates the number of times the client will attempt to get a lock before it gives up. The second property indicates that time interval between retries for getting... Continue →