hadoopoopadoop

Big Data with Hortonworks Hadoop

Category Archives: Hadoop hardware

Hadoop, Hadoop hardware, Uncategorized, YARN

Your Cluster Is An Appliance

February 26, 2016 | Peter Coates | Hadoop, hardware, Yarn

Hadoop and Ambari usually run on Linux, but please don’t fall into thinking of your cluster as a collection of Linux boxes; for stability and efficiency, you need to treat it as an appliance dedicated to Hadoop. Here’s why.


Hadoop, Hadoop hardware, Uncategorized

A Question of Balance

January 6, 2016 | Peter Coates | disk, HDFS

When you add nodes to a cluster, they start out empty. They work, but the data for them to work on isn’t co-located, so it’s not very efficient. Therefore, you want to tell HDFS to rebalance.

After adding new racks to our 70-node cluster, we noticed that rebalancing the nodes was taking several hours per terabyte. You can copy a terabyte of data across a 10GbE network in under half an hour with scp, so why should HDFS take several hours?
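As a rough sanity check on those numbers, here is a minimal back-of-the-envelope sketch. It assumes a decimal terabyte (10^12 bytes) and a 10 Gb/s link; the 50% efficiency figure and the three-hour rebalance time are illustrative assumptions, not measurements from the cluster described above.

```python
def transfer_minutes(size_bytes: float, link_bits_per_sec: float,
                     efficiency: float = 1.0) -> float:
    """Minutes needed to move size_bytes over a link at the given efficiency."""
    bytes_per_sec = link_bits_per_sec / 8 * efficiency
    return size_bytes / bytes_per_sec / 60


def implied_mb_per_sec(size_bytes: float, hours: float) -> float:
    """Average throughput in MB/s implied by moving size_bytes in `hours`."""
    return size_bytes / (hours * 3600) / 1e6


TERABYTE = 1e12   # assumption: decimal terabyte
TEN_GBE = 10e9    # 10 Gb/s line rate

ideal = transfer_minutes(TERABYTE, TEN_GBE)            # full line rate
half_rate = transfer_minutes(TERABYTE, TEN_GBE, 0.5)   # assumed 50% efficiency
slow = implied_mb_per_sec(TERABYTE, 3)                 # assumed 3 hours per TB

print(f"1 TB at line rate:      {ideal:.1f} min")
print(f"1 TB at 50% efficiency: {half_rate:.1f} min")
print(f"3 h per TB implies:     {slow:.1f} MB/s")
```

Even at half the nominal line rate the copy stays under half an hour, so a rebalance measured in hours per terabyte is running far below what the network alone could deliver.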


Hadoop hardware

Understanding Hadoop Hardware Requirements

September 22, 2015 | Peter Coates | disk, Hadoop hardware requirements, Yarn

I want my big-data applications to run as fast as possible. So why do the engineers who designed Hadoop specify “commodity hardware” for Hadoop clusters? Why go out of your way to tell people to run on mediocre machines?

