msck repair table for custom partition names

msck repair table is used to add partitions that exist in HDFS but are not yet registered in the Hive metastore.

However, it expects the partition column name to be included in each folder name:
year=2015
|
|_month=3
  |
  |_day=5

Notice that each folder name is prefixed with the partition column name (year=, month=, day=). This is necessary. msck repair table won't work if you have data in the following directory structure:
2015
|
|_3
  |
  |_5

This is kind of a pain. The workaround is to add each partition yourself, using alter table add partition with an explicit location.

ALTER TABLE test ADD PARTITION (year=2015,month=03,day=05) location 'hdfs:///cool/folder/with/data';
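
If you have many of these folders, writing the statements by hand gets old fast. Here is a rough Python sketch that generates one statement per folder. Everything about it is an assumption for illustration: the helper name add_partition_statements is made up, the folders are assumed to sit under a common base location in year/month/day order, and the partition values are assumed to be numeric (string values would need quoting). It only builds the text of the statements; you still have to run them through Hive yourself.

```python
from typing import Iterable, List, Sequence


def add_partition_statements(
    table: str,
    base_location: str,
    partition_columns: Sequence[str],
    relative_dirs: Iterable[str],
) -> List[str]:
    """Build one ALTER TABLE ... ADD PARTITION statement per data folder.

    Hypothetical helper: each entry in relative_dirs is a path such as
    "2015/3/5" whose levels line up, in order, with partition_columns.
    """
    statements = []
    for rel in relative_dirs:
        rel = rel.strip("/")
        values = rel.split("/")
        if len(values) != len(partition_columns):
            raise ValueError(
                f"{rel!r} does not have {len(partition_columns)} path levels"
            )
        # Values are emitted unquoted, which assumes numeric partition columns.
        spec = ", ".join(
            f"{col}={val}" for col, val in zip(partition_columns, values)
        )
        location = f"{base_location.rstrip('/')}/{rel}"
        # IF NOT EXISTS makes the statement safe to re-run.
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}';"
        )
    return statements


if __name__ == "__main__":
    for stmt in add_partition_statements(
        "test",
        "hdfs:///cool/folder/with/data",  # assumed base folder
        ["year", "month", "day"],
        ["2015/3/5", "2015/3/6"],  # assumed folder listing
    ):
        print(stmt)
```

You would feed it the folder listing from wherever you get it (for example, the output of a recursive HDFS listing) and pipe the printed statements into a script file for Hive.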

 