How to determine character encoding of files downloaded by gsutil

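gsutil is Google’s command-line tool for Cloud Storage, used to download reports, reviews, etc. from the Developer Console. To find a file’s character encoding, list its metadata with ls -L and read the charset from the Content-Type line (utf-16le in the output below).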

$ gsutil ls -L gs://link/to/your/document.csv
    Creation time:      Mon, 04 Aug 2014 09:38:01 GMT
    Content-Encoding:       gzip
    Content-Length:     739977
    Content-Type:       text/csv; charset=utf-16le
    Hash (crc32c):      AAAAAA
    Hash (md5):     AAAAAAAAAAAAAAA
    ETag:           AAAAAAAAAAA
    Generation:     1234567081803000
    Metageneration:     1
    ACL:            ACCESS DENIED. Note: you need OWNER permission
                on the object to read its ACL.
TOTAL: 1 objects, 739977 bytes (722.63 KB)
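
Once the charset is known, the downloaded file can be decoded with it. The following is a minimal Python sketch, not part of gsutil: the function names charset_from_listing and decode_report are hypothetical, and it assumes the local copy may or may not still be gzip-compressed (the listing above reports Content-Encoding: gzip), so it checks for the gzip magic bytes before decoding.

```python
import gzip
import re
from typing import Optional


def charset_from_listing(listing: str) -> Optional[str]:
    """Return the charset named on the Content-Type line of `gsutil ls -L`
    output, lowercased, or None if no charset is given.

    Hypothetical helper: it only parses text you pass in; it does not run gsutil.
    """
    match = re.search(
        r"^[ \t]*Content-Type:[ \t]*.*?charset=([\w.:-]+)",
        listing,
        re.MULTILINE | re.IGNORECASE,
    )
    return match.group(1).lower() if match else None


def decode_report(raw: bytes, charset: str) -> str:
    """Decode the bytes of a downloaded report into text.

    Assumption: the bytes may still be gzip-compressed, so we look for the
    gzip magic number (0x1f 0x8b) and decompress only if it is present.
    """
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    text = raw.decode(charset)
    # The utf-16le codec keeps a leading byte order mark, so drop it if present.
    return text.lstrip("\ufeff")


if __name__ == "__main__":
    sample_listing = (
        "    Content-Encoding:       gzip\n"
        "    Content-Type:       text/csv; charset=utf-16le\n"
    )
    charset = charset_from_listing(sample_listing)
    # Stand-in for the real file contents: a tiny gzip-compressed UTF-16LE CSV.
    sample_bytes = gzip.compress("\ufeffa,b\n1,2\n".encode("utf-16le"))
    print(charset)
    print(decode_report(sample_bytes, charset))
```

In practice you would read the bytes from the downloaded file (for example with open(path, "rb").read()) and pass the charset reported by gsutil ls -L.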
 