Uncategorized

Big Jobs Little Jobs

You’ve probably heard the well-known Hadoop paradox that even on the biggest clusters, most jobs are small, and the monster jobs that Hadoop is designed for are actually the exception.

Sumatran-elephant

This is true, but it’s not the whole story. It isn’t easy to find detailed numbers on how clusters are used in the wild, but I recently came across some decent data on a 2011 production analytics cluster at Microsoft. Technology years are like dog years but the processing load it describes remains representative of the general state of things today, and back-of-the-envelope analysis of the data presented in the article yields some interesting insights.

Continue reading

Standard