Create a file of size x bytes

One of the common requirements I run across when moving data around is finding out whether I'm doing it the fastest way possible. A good baseline is to measure how long it takes to copy a large file from one server to another.

If you're building a (big ;)) data pipeline that transports data from one server to another, its throughput had better be close to that raw copy speed. dd is a Unix utility that lets you create a file of a particular size for these tests:

dd if=/dev/zero of=/path/to/desired/big/file count=1024

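Note that count is a number of blocks, not bytes. With dd's default block size of 512 bytes, this creates a file of 1,024 × 512 = 524,288 bytes (512 KiB). To get a file of exactly x bytes, add bs=1 and set count=x. For large test files a bigger block size is much faster, for example bs=1048576 count=1024 for a 1 GiB file.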
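Here's a small sketch of how block size and count combine, in case you want to check the resulting size yourself. The make_file helper and the scratch path under /tmp are made up for illustration; only dd, wc, tr and rm are real commands.

```shell
#!/bin/sh
# Hypothetical helper: create a zero-filled file of exactly N bytes.
# $1 = target path, $2 = size in bytes
make_file() {
    # bs=1 makes each block one byte, so count equals the byte size.
    # This is slow for big files; use a larger bs for those.
    dd if=/dev/zero of="$1" bs=1 count="$2" 2>/dev/null
}

target="${TMPDIR:-/tmp}/dd_demo_file"   # assumed scratch location
make_file "$target" 1024

# wc -c reports the size in bytes; tr strips padding some platforms add.
size=$(wc -c < "$target" | tr -d ' ')
echo "exact: $size bytes"

# Without bs, dd falls back to 512-byte blocks, so count=4 gives 2048 bytes.
dd if=/dev/zero of="$target" count=4 2>/dev/null
default_size=$(wc -c < "$target" | tr -d ' ')
echo "default block size: $default_size bytes"

rm -f "$target"
```

For a real throughput test you'd want a file in the gigabyte range, so pick a large bs and a smaller count rather than bs=1.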

 
