Streaming data to Hadoop using Unix Pipes? Use Pipefail

If you pipe the output of a command into Hadoop (for example with hadoop dfs -put -), you need to know about the shell's pipefail option. To see what it does, try this on your command line:

$> true | false
$> echo $?
1
$> false | true
$> echo $?
0

ZOMG WTF, why is that 0? The first command failed, so the exit status of the whole pipeline should be 1, no? No: by default, the return status of a pipeline is the return status of its last command, and failures earlier in the pipeline are ignored. So if you have something like this:

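$> mysql -u user -ppassword -e "Select * from sometable" | hadoop dfs -put - /somefile/on/the/cluster

As long as the hadoop command succeeds, the exit code will be 0 even if the mysql command fails, so a failed or truncated export looks just like a successful upload. With bash's pipefail option set, the pipeline instead returns the status of the last (rightmost) command that exited non-zero, and returns 0 only if every command in the pipeline succeeded:

$> set -o pipefail; mysql -u user -ppassword -e "Select * from sometable" | hadoop dfs -put - /somefile/on/the/cluster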
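If you want to convince yourself before touching a real cluster, here is a minimal sketch you can run in bash. The functions export_rows and upload_rows are hypothetical stand-ins for the mysql and hadoop stages (they are not part of either tool), and the `|| var=$?` pattern is just one way to capture a status without tripping `set -e`.

```shell
#!/usr/bin/env bash
# Sketch only: true/false and two made-up functions stand in for real commands.

# Default behaviour: the pipeline reports the status of its last command.
default_status=0
false | true || default_status=$?        # stays 0, the failure is hidden

# With pipefail, the pipeline reports the rightmost non-zero status.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?       # now 1

rightmost_status=0
sh -c 'exit 3' | true || rightmost_status=$?   # 3, not a hard-coded 1

# Hypothetical stand-ins so this runs without MySQL or Hadoop.
export_rows() { echo "partial row"; return 1; }  # pretend mysql died mid-export
upload_rows() { cat > /dev/null; }               # pretend the upload succeeded

if export_rows | upload_rows; then
    upload_result="ok"
else
    upload_result="failed"               # reached only because pipefail is on
fi

echo "default=$default_status pipefail=$pipefail_status rightmost=$rightmost_status upload=$upload_result"
```

Note that set -o pipefail stays in effect for the rest of the shell session or script, so in a script it usually goes once at the top rather than in front of each pipeline.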
 