Hive doesn’t like the carriage return character

Have you ever counted the rows of a table in a database, dumped it to CSV, loaded it into Hive, and then found that the row count had changed? If so, your fields probably contain carriage returns. Hive’s default text format treats a carriage return (\r) as the end of a line, just like a newline (\n), so a \r inside a field ends the row early and whatever follows it becomes a new row. Here’s a link I found that describes it:

http://grokbase.com/t/hive/user/111v7jva3f/newlines-in-data
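Before loading an export, you can check it for carriage returns. The sketch below is only an illustration: the file name sample.csv and its contents are made up, and the tr '\r' '\n' line is a rough simulation of a reader that breaks lines on \r, not Hive itself. If your file uses Windows-style \r\n line endings, grep will count every line, so look at where the \r characters actually sit.

```shell
#!/bin/sh
# Hypothetical sample file: a header plus two records, where the second
# record has a carriage return (\r) embedded inside the comment field.
printf 'id,comment\n1,hello\n2,first part\rsecond part\n' > sample.csv

# wc -l counts only \n characters: 3 lines (header + 2 records).
wc -l < sample.csv

# Rough simulation of a reader that also breaks lines on \r:
# turning each \r into \n gives 4 lines, i.e. one "extra" row.
tr '\r' '\n' < sample.csv | wc -l

# Diagnostic for a real export: count the lines that contain a \r.
# Any non-zero number is worth investigating before loading into Hive.
grep -c "$(printf '\r')" sample.csv
```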

You have to remove the \r characters from the file yourself before loading it. One option is the Unix tr (translate) command, using -d to delete every \r:

tr -d '\r' < yourfile > newfile
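Here is a minimal sketch of the cleanup plus a quick sanity check. It assumes the placeholder names yourfile and newfile, and the sample content written at the top is made up purely for illustration.

```shell
#!/bin/sh
# Hypothetical raw export: the second record has an embedded \r,
# and its line also ends in a Windows-style \r\n.
printf 'id,comment\n1,hello\n2,first part\rsecond part\r\n' > yourfile

# Delete every carriage return, as in the tr command above.
tr -d '\r' < yourfile > newfile

# Verify that no carriage returns remain (tr -cd keeps only the \r
# characters, and wc -c counts them; tr -d ' ' strips wc's padding).
remaining=$(tr -cd '\r' < newfile | wc -c | tr -d ' ')
echo "carriage returns left: $remaining"

# The newline count is unchanged, and with no stray \r left it now
# matches the number of rows Hive should see.
wc -l < yourfile
wc -l < newfile
```

Deleting the \r glues the text on either side of it together ("first partsecond part" here). If you'd rather keep the words apart, tr '\r' ' ' swaps each \r for a space instead, at the cost of a trailing space on any line that ended in \r\n.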