Thursday, September 25, 2014

Hadoop Administration Part 6 : Topology & Data Protection

Topology:
When creating a volume, you can set its topology to specify which racks or nodes the volume will occupy. Hadoop uses topology scripts to determine the rack location of each node, and it uses that information to place replicas of block data on separate racks.
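As a rough illustration of what a topology script does, here is a minimal Python sketch. It assumes the usual Hadoop convention that the script receives one or more host names or IP addresses as arguments and prints one rack path per argument. The addresses, the rack names, and the `RACK_BY_HOST` table are made up for the example and are not taken from a real cluster.

```python
#!/usr/bin/env python3
"""Minimal rack-topology script sketch (hosts and racks are hypothetical)."""
import sys

# Hypothetical mapping from host name or IP address to rack path.
RACK_BY_HOST = {
    "10.0.1.11": "/rack-1",
    "10.0.1.12": "/rack-1",
    "10.0.2.21": "/rack-2",
}

# Fallback rack for hosts that are not listed above.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per command-line argument, one per line.
    print("\n".join(resolve(sys.argv[1:])))
```

A real script would normally read the mapping from a file or an inventory system rather than hard-coding it.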

Setting Up Node Topology

Topology paths can be as simple or as complex as needed to match your cluster layout. In a simple cluster, each topology path might consist of the rack only (for example, /rack-1). In a deployment spanning multiple large datacenters, each topology path can be much longer (for example, /europe/uk/london/datacenter2/room4/row22/rack5/). Establish a /data topology path to serve as the default topology path for the volumes in the cluster. Also establish a /decommissioned topology path that is not assigned to any volumes.

When you need to migrate data volumes off a particular node, move that node from the /data path to the /decommissioned path. Since no data volumes are assigned to that topology path, standard data replication will migrate the data off that node to other nodes that are still in the /data topology path.

For large clusters, you can specify complex topologies in a text file or by using a script. Each line in the text file or script output specifies a single node and the full topology path for that node.
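The effect of moving a node between topology paths can be sketched in a few lines of Python. This is only a toy model and not how MapR implements placement: it assumes a node may hold data for a volume when the node's topology path equals the volume's path or lies beneath it. The node names and the helper functions `under` and `eligible_nodes` are hypothetical.

```python
def under(path, prefix):
    """True if topology `path` equals `prefix` or lies beneath it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def eligible_nodes(node_topology, volume_topology):
    """Sorted names of nodes whose path falls under the volume's path."""
    return sorted(
        node for node, path in node_topology.items()
        if under(path, volume_topology)
    )


# Hypothetical three-node cluster, all nodes under /data.
node_topology = {
    "node-a": "/data/rack-1",
    "node-b": "/data/rack-2",
    "node-c": "/data/rack-2",
}

before = eligible_nodes(node_topology, "/data")

# Decommission node-b by moving it out of the /data topology.
node_topology["node-b"] = "/decommissioned"
after = eligible_nodes(node_topology, "/data")

print(before)  # node-a, node-b and node-c can hold /data volumes
print(after)   # only node-a and node-c remain eligible
```

Once node-b is no longer eligible, any replicas it holds have to be re-created on the remaining nodes, which is the migration described above.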

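The text file or script must be available on the local filesystem of every CLDB node and must be specified in the CLDB configuration:

To set topology with a script, set net.topology.script.file.name in /opt/mapr/conf/cldb.conf to the script file name.
To set topology with a text file, set net.topology.table.file.name in /opt/mapr/conf/cldb.conf to the text file name.

If you specify both a script and a text file, the MapR system uses the topology specified by the script.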
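For a large cluster, the topology text file is usually easier to generate than to write by hand. The Python sketch below renders such a file from a dictionary. The line layout used here (host name or IP address, a space, then the topology path) is an assumption made for the example, so check it against the MapR documentation for your version. The function name `render_topology_table` and the sample addresses are hypothetical.

```python
def render_topology_table(node_paths):
    """Render one line per node: '<host> <topology path>' (assumed layout)."""
    lines = []
    for host, path in sorted(node_paths.items()):
        # Topology paths are absolute, so reject anything without a leading '/'.
        if not path.startswith("/"):
            raise ValueError(f"topology path must start with '/': {path!r}")
        lines.append(f"{host} {path}")
    return "\n".join(lines) + "\n"


# Hypothetical two-node example.
table = render_topology_table({
    "10.0.2.21": "/data/rack-2",
    "10.0.1.11": "/data/rack-1",
})
print(table, end="")
```

The resulting text would then be saved to the file named by net.topology.table.file.name and copied to every CLDB node.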

Setting up Topology in MapR:

Data Protection:
You can use MapR to protect your data from hardware failures, accidental overwrites, and natural disasters. MapR organizes data into volumes so that you can apply different data protection strategies to different types of data. The following scenarios describe a few common problems and the MapR feature that protects your data in each case:

Scenario 1: Hardware Failure
Solution: Topology and Replication Factor

Scenario 2: Accidental Overwrite
Solution: Snapshots

Scenario 3: Disaster Recovery
Solution: Mirroring to Another Cluster
