Hadoop, YARN

The YARN Revolution

YARN—the data operating system for Hadoop. Bored yet? They should call it YAWN, right?

Not really. YARN is turning out to be the biggest thing to hit big data since Hadoop itself, even though it runs deep down in the plumbing and even some Hadoop users aren’t 100% clear on exactly what it does. In some ways, the technical improvements it enables aren’t even the most important part. YARN is changing the very economics of Hadoop.

Hadoop Hive

Shifting to Hive Part II: Best Practices and Optimizations

This is part two of an extended article. See part one here.

A full listing of Hive best practices and optimizations would fill a book. All we’ll do here is skim over the topics that best convey the spirit of Hive and how it is used most successfully. There’s plenty of detail available in the documentation and on the Web at large. Hopefully, these quick rundowns will provide enough background and keywords for a rewarding Google search.

Hadoop Hive

Shifting to Hive Part I: Origins

SQL is the lingua franca of data big and small, but SQL is a language, not a platform. It serves as the conceptual framework for data tasks on many platforms, ranging from blog content management with MySQL, to high-frequency online transaction processing (OLTP) systems, to heavy-duty batch processing on Hadoop and other big-data platforms.

I hope this page will help people who are experienced with conventional RDBMSs and OLTP systems make the jump to working with big data using Apache Hive, the most important of the SQL big-data platforms.
