Ok, this is the coolest thing this Hive user has seen all day.
As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects such variations as code changes, the use of analyze, turning on/off the cost-based optimizer (CBO), and so on. It’s an essential tool for optimizing Hive.
The output of EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.
By the way, I’m doing this on the Hortonworks Sandbox for HDP 2.3, but you can do exactly the same thing for any Linux installation of Hive/Hadoop. The Sandbox is a free VM image of a single-node installation the Hortonworks Hadoop distribution. Mine is running on VirtualBox, but they have the same thing for VMware.
- First, you’ll use standard Hive to generate EXPLAIN output in JSON format.
- Then you’ll use a Python script called lipwig.py that is most courteously supplied to us by the t3rmin4t0r. His program reads the JSON and emits something called a dot file.
- The dot file is a generic, humanly readable and portable description of a graphical object. Such a file is the input to a program called dot, which knows how to output graphics for the described image in a form that any computer can understand. Read more about dot here.
- In this case, dot will output a file in SVG format. SVG stands for Scalable Vector Graphics, and most machines will know what to do with it.
I describe below the sequence of steps used to generate the graphical representation of EXPLAIN’s output shown above. Before you try it out on your own code, you’ll need to get lipwig.py from github.com. I assume you know what git is, but if you don’t, check this out first.
Use cd to go to a convenient workspace directory and pull down the lipwig project using the command below. This will “clone,” i.e., get a complete copy of the source code tree for a program and put it in its own directory within the directory you are working in. (In this case it’s just one file plus a readme.txt file.)
git clone https://github.com/t3rmin4t0r/lipwig.git
You’ll also need the dot program, which is part of the graphviz package from AT&T. You can install this on Linux by running the following.
sudo yum install graphviz
Now you should be ready to rock and roll.
First tack the keywords “EXPLAIN FORMATTED” in front of the query you want to analyze. My source is in a file called myprog.sql and it looks like this:
Run it as follows to put the output of EXPLAIN FORMATTED into a file. The ‘>’ sign puts the output into a file instead of dumping it on the screen.
hive -f myprog.sql > explain.json
The contents of explain.json is the raw JSON-formatted EXPLAIN output. The JSON is human-readable, but only for specially gifted humans, unless you have set up your browser to display it. If you’re using Chrome, try this page for a plugin. Something similar is available for almost any browser.
Use lipwig.py to create a dot file from the JSON. If you’re running Hive, then you already have Python, so all you need to do is tell it where lipwig.py is. For me, the installation directory for lipwig.py is one directory up, hence the double dots, but yours is wherever you installed it. The lipwig.py program dumps the graphical specificiation into a file called explain.dot.
python ../lipwig/lipwig.py explain.json > explain.dot
Now, all you need to do is create the .svg file. My dot file was called explain.dot, so I tell dot to create a type .svg file called explain.svg, using explain.dot as input.
dot -Tsvg -o explain.svg explain.dot
That’s it. If you double click on explain.svg you’ll see your graphical output. Otherwise, you could use something like GhostScript (which a freeware PostScript utility) to display the output directly out of dot as follows.
dot -Tsvg explain.dot | gs –
SVG files work fine for most purposes, but as it happens, the WordPress editor doesn’t support that format. If you also have some other reason to want an alternate format, -Tjpg and -Tpng, etc. will give you alternate formats.