Hive doesn’t like the carriage return character

Have you ever run in to a situation where you count the number of rows for a table in a database, then dump it to CSV and then load it to HIVE only to find that number has changed? Well, you probably have carriage returns in your fields. HIVE reads a carriage return similar to a new line which means end of row. Here’s a link I found that describes it:

http://grokbase.com/t/hive/user/111v7jva3f/newlines-in-data

You have to manually clean the \r from the file. One option is to use the unix command transliterate:

cat yourfile | tr -d "\r" > newfile
 
38
Kudos
 
38
Kudos

Now read this

Create Views over JSON Data in Hive

The beauty of storing raw JSON in HIVE is that you can potentially create multiple tables on the same data using Hive Views. Hive allows you to query JSON data using couple of different ways (json_tuple and get_json_object). The... Continue →