Create a file of size x bytes

One of the common requirements I run across in moving data around is finding if I’m doing it the fastest way possible. A good indicator of speed is to find out how long it takes for a large file to get copied from one server to another.

If you’re building a (big ;)) data pipeline that transports data from one server to another you better be close to the above speed. dd is a unix util that allows you to create a file of a particular size for these purposes

dd if=/dev/zero of=/path/to/desired/big/file count=1024

This will create a file 1,024 bytes in size.

 
8
Kudos
 
8
Kudos

Now read this

First Experiences with Scalding

Recently, I’ve been evaluating using Scalding to replace some parts of our ETL. Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs. Scalding is built on top of Cascading, a Java library that abstracts away... Continue →