Developers might not want to read all the background on Unicode included in this earlier blog entry. Here is a quick distillation of how Unicode and the UTF encodings are relevant to a Hadoop user—just the facts and the warnings.
Ok, this is the coolest thing this Hive user has seen all day.
As you probably know, if you prepend the word EXPLAIN to your SQL query and then run it, Hive prints out a text description of the query plan. This lets you explore the effects of such variations as code changes, the use of ANALYZE, turning the cost-based optimizer (CBO) on or off, and so on. It’s an essential tool for optimizing Hive.
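As a rough illustration of that workflow, here is a minimal Python sketch that builds two small HiveQL scripts for the same query, one with the CBO enabled and one with it disabled, so the two plans can be compared. The helper names, the table, and the columns are hypothetical; `hive.cbo.enable` is the Hive setting assumed here to control the optimizer.

```python
# Sketch: build HiveQL scripts that print the plan for one query with the
# cost-based optimizer (CBO) switched on and off.


def explain(query: str) -> str:
    """Return the query with EXPLAIN prepended, terminated by a semicolon."""
    body = query.strip().rstrip(";").strip()
    return f"EXPLAIN {body};"


def explain_with_cbo(query: str, cbo_enabled: bool) -> str:
    """Return a two-statement script: set hive.cbo.enable, then EXPLAIN."""
    flag = "true" if cbo_enabled else "false"
    return f"SET hive.cbo.enable={flag};\n{explain(query)}"


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    query = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"
    for enabled in (True, False):
        print(explain_with_cbo(query, enabled))
        print()
```

Each generated script could be saved to a file and run with, for example, `hive -f`, and the two plan listings compared side by side.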
The output of EXPLAIN is far from pretty, but fortunately, a simple pipeline of Linux commands can give you a slick graphical rendition like the one below.
This is part two of an extended article. See part one here.
A full listing of Hive best practices and optimization would fill a book. All we’ll do here is skim over the topics that best indicate the spirit of Hive, and how it is used most successfully. There’s plenty of detail available in the documentation and on the Web at large. Hopefully, these quick run-downs will provide enough background and keywords for a rewarding Google search.
SQL is the lingua franca of data big and small, but SQL is a language, not a platform. It serves as the conceptual framework for data tasks on many platforms, ranging from blog content management with MySQL, to high-frequency online transaction processing (OLTP) systems, to heavy-duty batch processing on Hadoop and other big-data platforms.