Hive doesn’t like the carriage return character

Have you ever run into a situation where you count the rows of a table in a database, dump it to CSV, load it into Hive, and find that the count has changed? You probably have carriage returns in your fields. Hive treats a carriage return much like a newline, which means end of row, so a single record containing one gets split into two. Here’s a link I found that describes it:

http://grokbase.com/t/hive/user/111v7jva3f/newlines-in-data

You have to manually clean the \r from the file before loading it. One option is the Unix tr (translate) command with -d, which deletes the listed characters:

cat yourfile | tr -d "\r" > newfile
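
Before stripping anything, it can help to confirm that carriage returns are really what is throwing the count off. Below is a minimal sketch, assuming a POSIX shell with tr and wc available. The helper names count_cr and strip_cr are made up for illustration and are not part of Hive or any standard tool.

```shell
# count_cr FILE: print how many carriage-return bytes FILE contains.
# tr -cd '\r' deletes every byte except \r, and wc -c counts what is left.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# strip_cr SRC DST: write a copy of SRC to DST with every \r removed.
# This also turns Windows-style \r\n line endings into plain \n.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, running count_cr yourfile, then strip_cr yourfile newfile, then count_cr newfile should report zero on the last call. Note that deleting the \r joins the text on either side of it, so if you want to keep the words apart, replacing it with a space (tr '\r' ' ') may suit your data better.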
 
