Thursday, September 25, 2014

Hadoop Administration Part 7 : Hadoop Performance Tuning

Performance Tuning:
By default, the MapR cluster is tuned to perform well for most workloads. However, in certain circumstances, you might want to manually tune the MapR Filesystem to provide higher performance.

This section includes the following topics:
1) Configure NFS Write Performance
2) Chunk Size
3) Increase Caching

Configure NFS Write Performance:
The kernel tunable sunrpc.tcp_slot_table_entries sets the number of simultaneous Remote Procedure Call (RPC) requests. Its default value is 16. Increasing it to 128 may improve write speeds. Use the command sysctl -w sunrpc.tcp_slot_table_entries=128 to set the value, and add an entry to your sysctl.conf file to make the setting persist across reboots. Note that NFS write performance varies between Linux distributions, so this suggested change may have no effect, or even a negative effect, on your particular cluster.
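A sketch of the commands involved (run as root; the /etc/sysctl.conf path is the common default but may vary by distribution):

```shell
# Apply the new RPC slot table size immediately (requires root)
sysctl -w sunrpc.tcp_slot_table_entries=128

# Persist the setting across reboots
echo "sunrpc.tcp_slot_table_entries = 128" >> /etc/sysctl.conf

# Reload sysctl.conf and verify the running value
sysctl -p
sysctl sunrpc.tcp_slot_table_entries
```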

Chunk Size:
Chunk size affects parallel processing and random disk I/O during MapReduce jobs. A higher chunk size means less parallel processing because there are fewer map inputs, and therefore fewer mappers. A lower chunk size improves parallelism, but results in higher random disk I/O during shuffle because there are more map outputs. Set the io.sort.mb parameter to a value between 120% and 150% of the chunk size.
Here are general guidelines for chunk size:
For most purposes, set the chunk size to the default 256 MB and set the value of the io.sort.mb parameter to the default 380 MB.
On very small clusters or nodes with limited RAM, set the chunk size to 128 MB and set the value of the io.sort.mb parameter to 190 MB.

If application-level compression is in use, the io.sort.mb parameter should be at least 380 MB.
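As a quick sanity check on the 120%-150% guideline, the io.sort.mb range can be computed from the chunk size. The numbers below are illustrative, using the default 256 MB chunk:

```shell
# Compute the suggested io.sort.mb range (120%-150% of chunk size) for a 256 MB chunk
chunk_mb=256
io_sort_min=$(( chunk_mb * 120 / 100 ))
io_sort_max=$(( chunk_mb * 150 / 100 ))
echo "io.sort.mb should be between ${io_sort_min} and ${io_sort_max} MB"
```

Note that the default io.sort.mb of 380 MB falls inside this range for the default 256 MB chunk size.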

Setting Chunk Size:
You can set the chunk size for a given directory in two ways:
Change the ChunkSize attribute in the .dfs_attributes file at the top level of the directory
Use the command hadoop mfs -setchunksize <size> <directory>

For example, if the volume test is NFS-mounted at /mapr/my.cluster.com/projects/test, you can set the chunk size to 268,435,456 bytes by editing the file /mapr/my.cluster.com/projects/test/.dfs_attributes and setting ChunkSize=268435456. To accomplish the same thing from the hadoop shell, use the following command:
hadoop mfs -setchunksize 268435456 /mapr/my.cluster.com/projects/test
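The byte value above is simply 256 MB expressed in bytes. A quick sketch to derive it (the hadoop mfs invocation is shown as a comment since it requires a live cluster, and the path is the example path from above):

```shell
# 256 MB in bytes, as written into .dfs_attributes or passed to hadoop mfs
chunk_bytes=$(( 256 * 1024 * 1024 ))
echo "$chunk_bytes"
# On a live cluster you would then run (example path):
#   hadoop mfs -setchunksize $chunk_bytes /mapr/my.cluster.com/projects/test
```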

Increase Caching:
Giving more memory to MapR-FS improves performance through greater data caching; this is especially important if your main constraint is disk I/O. For the parameters you can configure to give Warden more memory, see Memory Allocation for Nodes.

Memory for MapR-FS
By default, Warden allocates 35% of available memory to MapR-FS to allow adequate memory for MapR-DB. If you do not intend to use MapR-DB, you can use the -noDB option in configure.sh to allocate 20% of available memory to MapR-FS instead.
