How to determine character encoding of files downloaded by gsutil

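gsutil is Google’s command-line tool for Cloud Storage; it is what you use to download reports, reviews, etc. from the Developer Console. To find out the character encoding of a file, list it with gsutil ls -L and read the charset from the Content-Type line (utf-16le in the example below):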

$ gsutil ls -L gs://link/to/your/document.csv
    Creation time:      Mon, 04 Aug 2014 09:38:01 GMT
    Content-Encoding:       gzip
    Content-Length:     739977
    Content-Type:       text/csv; charset=utf-16le
    Hash (crc32c):      AAAAAA
    Hash (md5):     AAAAAAAAAAAAAAA
    ETag:           AAAAAAAAAAA
    Generation:     1234567081803000
    Metageneration:     1
    ACL:            ACCESS DENIED. Note: you need OWNER permission
                on the object to read its ACL.
TOTAL: 1 objects, 739977 bytes (722.63 KB)
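
If you want to pick up the charset in a script instead of reading it by eye, the listing above can be parsed. This is only a minimal sketch: the function names charset_from_listing and decode_report are made up for illustration, the sample text is a shortened copy of the listing above, and it assumes the bytes you pass in are already decompressed.

```python
import re

# Shortened copy of the metadata printed by `gsutil ls -L` above.
SAMPLE_LISTING = """\
    Content-Encoding:       gzip
    Content-Length:     739977
    Content-Type:       text/csv; charset=utf-16le
"""


def charset_from_listing(listing, default="utf-8"):
    """Return the charset named on the Content-Type line, or `default`.

    Hypothetical helper: it only looks for a `charset=` parameter on a
    line starting with `Content-Type:`.
    """
    match = re.search(
        r'^\s*Content-Type:.*?charset="?([\w.:-]+)',
        listing,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else default


def decode_report(raw_bytes, listing):
    """Decode downloaded bytes using the charset found in the listing.

    Assumes `raw_bytes` is the uncompressed file content.
    """
    return raw_bytes.decode(charset_from_listing(listing))


if __name__ == "__main__":
    print(charset_from_listing(SAMPLE_LISTING))
```

In practice you would capture the real output of gsutil ls -L (for example with subprocess) and pass it in place of SAMPLE_LISTING, then hand the downloaded file's bytes to decode_report.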
