Performance Tuning:
By default, the MapR cluster is tuned to perform well for most workloads. However, in certain circumstances, you might want to manually tune the MapR Filesystem to provide higher performance.
This section includes the following topics:
1) Configure NFS Write Performance
2) Chunk Size
3) Increase Caching
Configure NFS Write Performance:
The kernel tunable sunrpc.tcp_slot_table_entries controls the number of simultaneous Remote Procedure Call (RPC) requests. Its default value is 16. Increasing this value to 128 may improve write speeds. Use the command sysctl -w sunrpc.tcp_slot_table_entries=128 to set the value, and add an entry to your sysctl.conf file to make the setting persist across reboots. NFS write performance varies between Linux distributions, so this suggested change may have no effect, or even a negative effect, on your particular cluster.
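As a minimal sketch on a typical Linux NFS client node (run as root; exact file locations and behavior can vary by distribution):
# Set the value for the running kernel
sysctl -w sunrpc.tcp_slot_table_entries=128
# Make the setting persist across reboots
echo "sunrpc.tcp_slot_table_entries = 128" >> /etc/sysctl.conf
# Re-apply settings from sysctl.conf without rebooting
sysctl -p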
Chunk Size:
Chunk size affects parallel processing and random disk I/O during MapReduce jobs. A higher chunk size means less parallel processing because there are fewer map inputs, and therefore fewer mappers. A lower chunk size improves parallelism, but results in higher random disk I/O during the shuffle because there are more map outputs. Set the io.sort.mb parameter to a value between 120% and 150% of the chunk size.
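To illustrate the arithmetic: with the default 256 MB chunk size, 120% to 150% works out to roughly 307 MB to 384 MB, which is why the default io.sort.mb of 380 MB pairs with the default chunk size in the guidelines below.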
Here are general guidelines for chunk size:
1) For most purposes, set the chunk size to the default 256 MB and set the value of the io.sort.mb parameter to the default 380 MB.
2) On very small clusters or nodes without much RAM, set the chunk size to 128 MB and set the value of the io.sort.mb parameter to 190 MB.
3) If application-level compression is in use, the io.sort.mb parameter should be at least 380 MB.
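As a sketch, assuming the default 256 MB chunk size, the matching io.sort.mb value can be set cluster-wide in mapred-site.xml on each node (io.sort.mb is the standard MapReduce v1 sort-buffer parameter; the value shown is the default recommended above):
<property>
  <name>io.sort.mb</name>
  <value>380</value>
</property>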
Setting Chunk Size:
You can set the chunk size for a given directory in two ways:
1) Change the ChunkSize attribute in the .dfs_attributes file at the top level of the directory.
2) Use the command hadoop mfs -setchunksize <size> <directory>.
For example, if the volume test is NFS-mounted at /mapr/my.cluster.com/projects/test, you can set the chunk size to 268,435,456 bytes by editing the file /mapr/my.cluster.com/projects/test/.dfs_attributes and setting ChunkSize=268435456. To accomplish the same thing from the hadoop shell, use the following command:
hadoop mfs -setchunksize 268435456 /mapr/my.cluster.com/projects/test
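To check that the new setting took effect, you can list the directory with hadoop mfs, whose output includes MapR-specific attributes such as the chunk size (the path below assumes the same example volume):
hadoop mfs -ls /mapr/my.cluster.com/projects/test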
Increase Caching:
If you can give more memory to MapR-FS, performance improves due to greater data caching. This is especially important if your main constraint is disk I/O. For the parameters that you can configure to give Warden more memory, see Memory Allocation for Nodes.
Memory for MapR-FS:
By default, Warden allocates 35% of available memory to MapR-FS to allow adequate memory for MapR-DB. If you do not intend to use MapR-DB, you can set the -noDB option in configure.sh to specify that 20% of the available memory should be allocated to MapR-FS.
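As a minimal sketch, the -noDB flag is passed when running configure.sh on a node; the CLDB and ZooKeeper hosts below are placeholders for your own cluster:
/opt/mapr/server/configure.sh -C cldb-host -Z zk-host -noDB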