Create Views over JSON Data in Hive

The beauty of storing raw JSON in Hive is that you can expose the same data as multiple tables by defining Hive views over it. Hive gives you a couple of ways to query JSON data: json_tuple and get_json_object. get_json_object takes a JSON string and a JSONPath expression and returns the value found at that path. Here’s an example:

event_type       event_data
user_registered  {"ip_address": "127.128.123.128"}
user_deleted     {"ip_address": "127.128.123.128"}

hive> CREATE VIEW my_view(type, value)
AS
SELECT event_type, get_json_object(tbl.event_data, '$.ip_address')
FROM json_table tbl
WHERE event_type IN ('user_registered', 'user_deleted');

hive> SELECT * FROM my_view;
type             value
user_registered  127.128.123.128
user_deleted     127.128.123.128
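
The other function mentioned earlier, json_tuple, takes plain key names rather than a JSONPath and is used with LATERAL VIEW. Here's a rough sketch of what a second view over the same data might look like. The view name my_tuple_view and the alias j are made up for illustration, and it assumes the source table is named json_table with the event_type and event_data columns shown in the sample rows.

```sql
-- Sketch only: my_tuple_view and the alias j are illustrative names.
-- Assumes a source table json_table(event_type, event_data) where
-- event_data holds a JSON string.
CREATE VIEW my_tuple_view(type, value)
AS
SELECT tbl.event_type, j.ip_address
FROM json_table tbl
-- json_tuple takes top-level key names, not JSONPath expressions
LATERAL VIEW json_tuple(tbl.event_data, 'ip_address') j AS ip_address;
```

If you need more fields from the same JSON string, you can list more keys in the json_tuple call and more names after AS, instead of calling get_json_object once per field.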
 