Hadoop, Hadoop Hive

Lipwig for Hive Is The Greatest!

Making_Money_LipwigOk, this is the coolest thing this Hive user has seen all day.

As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects such variations as code changes, the use of analyze, turning on/off the cost-based optimizer (CBO), and so on. It’s an essential tool for optimizing Hive.

The output of EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.

By the way, I’m doing this on the Hortonworks Sandbox for HDP 2.3, but you can do exactly the same thing for any Linux installation of Hive/Hadoop. The Sandbox is a free VM image of a single-node installation the Hortonworks Hadoop distribution. Mine is running on VirtualBox, but they have the same thing for VMware.

  1. First, you’ll use standard Hive to generate EXPLAIN output in JSON format.
  2. Then you’ll use a Python script called lipwig.py that is most courteously supplied to us by the t3rmin4t0r. His program reads the JSON and emits something called a dot file.
  3. The dot file is a generic, humanly readable and portable description of a graphical object. Such a file is the input to a program called dot, which knows how to output graphics for the described image in a form that any computer can understand. Read more about dot here.
  4. In this case, dot will output a file in SVG format. SVG stands for Scalable Vector Graphics, and most machines will know what to do with it.

I describe below the sequence of steps used to generate the graphical representation of EXPLAIN’s output shown above. Before you try it out on your own code, you’ll need to get lipwig.py from github.com.  I assume you know what git is, but if you don’t, check this out first.

Use cd to go to a convenient workspace directory and pull down the lipwig project using the command below. This will “clone,” i.e., get a complete copy of the source code tree for a program and put it in its own directory within the directory you are working in. (In this case it’s just one file plus a readme.txt file.)

git clone https://github.com/t3rmin4t0r/lipwig.git

You’ll also need the dot program, which is part of the graphviz package from AT&T.  You can install this on Linux by running the following.

sudo yum install graphviz

Now you should be ready to rock and roll.

First tack the keywords “EXPLAIN FORMATTED” in front of the query you want to analyze. My source is in a file called myprog.sql and it looks like this:

     br.browser, ct.connection_type, co.country, date_time, evar3, evar28, evar29, first_hit_time_gmt, geo_dma,
     javascript, last_hit_time_gmt, post_purchaseid, last_purchase_time_gmt, new_visit, post_browser_height,
     post_browser_width, post_cookies, post_currency, post_cust_hit_time_gmt, post_evar3, post_evar28,
     post_evar29, post_event_list, post_persistent_cookie, post_product_list, post_visid_high, post_visid_low,
     post_zip, prev_page, visit_num, visit_page_num, visit_start_time_gmt
     hit_data hd join browser br join connection_type ct join country co
     hd.browser=br.id and hd.connection_type=ct.id and hd.country=co.id
     hd.post_product_list like "%hash::448163969%"
     hd.post_product_list like "%hash::1256898673%"
      hd.post_product_list not like "%hash::125680000%" 

Run it as follows to put the output of EXPLAIN FORMATTED into a file.  The ‘>’ sign puts the output into a file instead of dumping it on the screen.

hive -f myprog.sql > explain.json

The contents of explain.json is the raw JSON-formatted EXPLAIN output. The JSON is human-readable, but only for specially gifted humans, unless you have set up your browser to display it.  If you’re using Chrome, try this page for a plugin. Something similar is available for almost any browser.

Use lipwig.py to create a dot file from the JSON.  If you’re running Hive, then you already have Python, so all you need to do is tell it where lipwig.py is. For me, the installation directory for lipwig.py is one directory up, hence the double dots, but yours is wherever you installed it. The lipwig.py program dumps the graphical specificiation into a file called explain.dot.

python ../lipwig/lipwig.py explain.json > explain.dot

Now, all you need to do is create the .svg file. My dot file was called explain.dot, so I tell dot to create a type .svg file called explain.svg, using explain.dot as input.

dot -Tsvg -o explain.svg  explain.dot

That’s it.  If you double click on explain.svg you’ll see your graphical output.  Otherwise, you could use something like GhostScript (which a freeware PostScript utility) to display the output directly out of dot as follows.

dot -Tsvg explain.dot | gs –

SVG files work fine for most purposes, but as it happens, the WordPress editor doesn’t support that format. If you also have some other reason to want an alternate format, -Tjpg and  -Tpng, etc. will give you alternate formats.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s