What is the Hadoop HDFS Balancer?
The HDFS balancer re-balances data across the DataNodes, moving blocks from over-utilized to under-utilized nodes.
In other words, it balances the load between DataNodes.
Why do we need the Hadoop HDFS Balancer?
HDFS data might not always be placed uniformly across the DataNodes. One common reason is the addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes that will receive these blocks. Some of the considerations are:
- The policy of keeping one of the replicas of a block on the same node as the node that is writing the block.
- The need to spread the different replicas of a block across racks, so that the cluster can survive the loss of an entire rack.
- Keeping one of the replicas on the same rack as the node writing the file, so that cross-rack network I/O is reduced.
- Spreading HDFS data uniformly across the DataNodes in the cluster. The HDFS block-allocation strategy tries hard to spread new blocks evenly among all the DataNodes.
The rationale behind this behavior is to prevent recently added nodes from becoming a bottleneck, which would happen if all the new blocks were allocated on, and read from, those nodes.
To overcome this kind of imbalance, we use the Hadoop HDFS Balancer, and it needs to be run on a regular basis.
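For example, a periodic run could be scheduled with cron. The entry below is only a sketch: the schedule, threshold, and log path are assumptions to adapt to your cluster.

# Hypothetical cron entry: run the balancer nightly at 1 AM with the
# default 10% threshold; the log path is an assumption.
0 1 * * * hdfs balancer -threshold 10 >> /var/log/hadoop/hdfs-balancer.log 2>&1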
HDFS Balancer
Help entry from the command line:
$ hdfs balancer -h
Usage: java Balancer
    [-policy <policy>]        the balancing policy: datanode or blockpool (default datanode)
    [-threshold <threshold>]  Percentage of disk capacity (default 10)
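As a quick illustration of the two options above (the values shown are examples, not recommendations):

# Default run: datanode policy, 10% threshold.
$ hdfs balancer
# Explicit policy and threshold; the blockpool policy only makes sense
# on federated clusters with several block pools.
$ hdfs balancer -policy blockpool -threshold 10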
The threshold parameter is a float number between 0 and 100 (12.5, for instance). Starting from the average cluster utilization (say about 50%), the balancer process will try to converge every DataNode's usage into the range [average − threshold, average + threshold]. With the default threshold of 10%:
– Upper bound (average + threshold): 60%
– Lower bound (average − threshold): 40%
You can easily see that the smaller the threshold, the more balanced your DataNodes will be. With a very small threshold, however, the cluster may never reach the balanced state if other clients concurrently write and delete data in the cluster.
The balancer first picks DataNodes whose usage is above the upper bound (the sources, classified as over-utilized) and tries to find blocks on them that can be copied to nodes whose usage is below the lower bound (the destinations, under-utilized). A second selection round moves blocks from over-utilized nodes to nodes with below-average utilization, and a third pass moves blocks from nodes with above-average utilization to under-utilized nodes.
Example: We have 3 nodes
datanode 1 has 50%
datanode 2 has 30%
datanode 3 has 20%
Average = (50 + 30 + 20) / 3 = 33.3%
Upper bound (33.3 + 10) = 43.3%
Lower bound (33.3 − 10) = 23.3%
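The same arithmetic, plus the classification into the utilization bands described above, can be sketched with a small awk script (purely illustrative; it just replays the numbers of this example):

$ printf "datanode1 50\ndatanode2 30\ndatanode3 20\n" | awk '
    { name[NR] = $1; use[NR] = $2; sum += $2 }   # collect per-node usage
    END {
      avg = sum / NR; t = 10                     # default threshold: 10%
      printf "average = %.1f%%, band = [%.1f%%, %.1f%%]\n", avg, avg - t, avg + t
      for (i = 1; i <= NR; i++) {
        if      (use[i] > avg + t)  band = "over-utilized (source)"
        else if (use[i] > avg)      band = "above average"
        else if (use[i] >= avg - t) band = "below average"
        else                        band = "under-utilized (destination)"
        printf "%s at %d%% -> %s\n", name[i], use[i], band
      }
    }'
average = 33.3%, band = [23.3%, 43.3%]
datanode1 at 50% -> over-utilized (source)
datanode2 at 30% -> below average
datanode3 at 20% -> under-utilized (destination)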
[hadoop@ip- ~]$ hdfs dfsadmin -report
Configured Capacity: 5402133774336 (4.91 TB)
Present Capacity: 5369892622174 (4.88 TB)
DFS Remaining: 4016619384832 (3.65 TB)
DFS Used: 1353273237342 (1.23 TB)
DFS Used%: 33.20%
Under replicated blocks: 121
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name:
Decommission Status : Normal
Configured Capacity: 1800711258112 (1.64 TB)
DFS Used: 498341764558 (464.12 GB)
Non DFS Used: 11374621234 (10.59 GB)
DFS Remaining: 1290994872320 (1.17 TB)
DFS Used%: 50.67%
DFS Remaining%: 49.33%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Wed Jan 06 12:03:30 UTC 2016

Decommission Status : Normal
Configured Capacity: 1800711258112 (1.64 TB)
DFS Used: 508083595621 (473.19 GB)
Non DFS Used: 10511382171 (9.79 GB)
DFS Remaining: 1282116280320 (1.17 TB)
DFS Used%: 30.22%
DFS Remaining%: 69.78%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Wed Jan 06 12:03:31 UTC 2016

Name:
Hostname:
Decommission Status : Normal
Configured Capacity: 1800711258112 (1.64 TB)
DFS Used: 346847877163 (323.03 GB)
Non DFS Used: 10355148757 (9.64 GB)
DFS Remaining: 1443508232192 (1.31 TB)
DFS Used%: 20.26%
DFS Remaining%: 79.76%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Wed Jan 06 12:03:30 UTC 2016
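To pull just the utilization figures out of a report like the one above, a simple grep is enough (the first DFS Used% line is the cluster-wide figure; the following ones are per DataNode):

$ hdfs dfsadmin -report | grep "DFS Used%"
DFS Used%: 33.20%
DFS Used%: 50.67%
DFS Used%: 30.22%
DFS Used%: 20.26%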
Iterative steps in the balancer
The balancer process is iterative. Because HDFS has a lot of moving state, at each iteration the balancer tries to move 10 GB × the number of selected sources, and each iteration lasts no more than 20 minutes.
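The speed of these block moves is capped on each DataNode by the dfs.datanode.balance.bandwidthPerSec property, which can also be raised at runtime; the 10 MB/s value below is only an example, not a recommendation:

# Allow the balancer to use up to 10 MB/s per datanode
# (the value is in bytes per second; reset on datanode restart).
$ hdfs dfsadmin -setBalancerBandwidth 10485760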
The progress of the runs is reported to stdout:
$ hdfs balancer -threshold 25
Time Stamp Iteration# Bytes Already Moved Left To Move Bytes Being Moved
Jan 01, 2016 9:29:16 AM 0 0 KB 5.82 TB 50 GB
Jan 01, 2016 9:46:40 AM 1 49.88 GB 5.76 TB 50 GB
Jan 02, 2016 10:04:27 AM 2 99.67 GB 5.72 TB 50 GB
Jan 02, 2016 10:22:36 AM 3 149.62 GB 5.69 TB 50 GB
…
Jan 03, 2016 3:24:54 AM 11 2.07 TB 4.08 TB 30 GB
Jan 04, 2016 3:42:32 AM 12 2.1 TB 4.05 TB 30 GB
Jan 05, 2016 4:00:19 AM 13 2.13 TB 4.02 TB 30 GB
Jan 06, 2016 4:18:15 AM 14 2.16 TB 3.95 TB 30 GB