How to determine character encoding of files downloaded by gsutil

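gsutil is Google’s command-line tool for downloading reports, reviews, and similar files from the Developer Console. To find out a file's character encoding, list its metadata with gsutil ls -L and read the charset value on the Content-Type line (utf-16le in the example below).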

$ gsutil ls -L gs://link/to/your/document.csv
    Creation time:      Mon, 04 Aug 2014 09:38:01 GMT
    Content-Encoding:       gzip
    Content-Length:     739977
    Content-Type:       text/csv; charset=utf-16le
    Hash (crc32c):      AAAAAA
    Hash (md5):     AAAAAAAAAAAAAAA
    ETag:           AAAAAAAAAAA
    Generation:     1234567081803000
    Metageneration:     1
    ACL:            ACCESS DENIED. Note: you need OWNER permission
                on the object to read its ACL.
TOTAL: 1 objects, 739977 bytes (722.63 KB)
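
Once the listing has told you the charset, you can use it to decode the downloaded file. Below is a minimal Python sketch, not anything gsutil provides: the helper names charset_from_listing and decode_report are made up for illustration. It assumes you have the listing output as a string and the downloaded file's contents as bytes.

```python
import gzip
import re


def charset_from_listing(listing, default=None):
    """Return the charset named on the Content-Type line of `gsutil ls -L` output.

    Hypothetical helper: it only parses text shaped like the listing above.
    """
    match = re.search(
        r"^[ \t]*Content-Type:[ \t]*.*?charset=([\w.:-]+)",
        listing,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else default


def decode_report(raw, charset):
    """Decode downloaded bytes, gunzipping first if they are still compressed."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    text = raw.decode(charset)
    return text.lstrip("\ufeff")  # drop a leading byte order mark, if present


if __name__ == "__main__":
    listing = (
        "    Content-Encoding:       gzip\n"
        "    Content-Type:       text/csv; charset=utf-16le\n"
    )
    print(charset_from_listing(listing))
```

The gzip check is there because the listing reports Content-Encoding: gzip, but the file on disk may already have been decompressed during download. Checking the first two bytes handles both cases.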
 
