Big Data/Hadoop: Hadoop Administration Part 5 : Snapshot and Schedules

Thursday, September 25, 2014

Hadoop Administration Part 5 : Snapshot and Schedules

Snapshot:

Snapshots are useful for backup and restore, and for importing or exporting data between environments.

A snapshot is a read-only image of a volume at a specific point in time. On clusters with an M5 or higher license, you can create a snapshot manually or automate the process with a schedule. Snapshots are useful any time you need to be able to roll back to a known good data set at a specific point in time.

For example, before performing a risky operation on a volume, you can create a snapshot to enable rollback capability for the entire volume. A snapshot takes almost no time to create and initially uses no disk space, because it stores only the incremental changes needed to roll the volume back to its state at the time the snapshot was created. The storage used by a volume's snapshots does not count against the volume's quota. When you view the list of volumes on your cluster in the MapR Control System, the value of the Snap Size column is the disk space used by all of the snapshots for that volume.

Creating a Volume Snapshot

You can create a snapshot manually or use a schedule to automate snapshot creation. Each snapshot has an expiration date that determines how long the snapshot will be retained:

1) When you create the snapshot manually, specify an expiration date.

2) When you schedule snapshots, the expiration date is determined by the Retain parameter of the schedule.

For more information about scheduling snapshots, see Scheduling a Snapshot.
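
As a rough sketch of the manual path, a snapshot can also be created from the command line. The helper below assumes the maprcli volume snapshot create command with its -volume and -snapshotname options. The function names, the volume name user.divakar, and the naming scheme are hypothetical. It only prints the command (a dry run) so it can be reviewed first, and it leaves out the expiration setting, which should be checked against the maprcli reference for your release.

```shell
#!/usr/bin/env bash
# Build (but do not run) a maprcli command that snapshots one volume.
# Assumption: the volume name and the "<volume>.<timestamp>" naming
# scheme are illustrative only, not a MapR convention.
snapshot_cmd() {
  local volume="$1"
  local stamp="$2"   # e.g. 2014-04-04.09-57-49
  printf 'maprcli volume snapshot create -volume %s -snapshotname %s.%s\n' \
    "$volume" "$volume" "$stamp"
}

# Current time in the same format as the snapshot names shown in this post.
now_stamp() {
  date +%Y-%m-%d.%H-%M-%S
}
```

Calling snapshot_cmd user.divakar "$(now_stamp)" prints the command; pipe the output to sh on a cluster node once it looks right.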

Viewing the Contents of a Snapshot

At the top level of each volume is a directory called .snapshot containing all the snapshots for the volume. You can view the directory with hadoop fs commands or by mounting the cluster with NFS. To prevent recursion problems, ls and hadoop fs -ls do not show the .snapshot directory when the top-level volume directory contents are listed. You must navigate explicitly to the .snapshot directory to view and list the snapshots for the volume.

Example:
bash$/opt/mapr/bin# hadoop fs -ls /user/divakar/.snapshot
Found 1 items
drwxrwxrwx - root root 1 2011-06-01 09:57 /user/divakar/.snapshot/2014-04-04.09-57-49
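
If the listing needs to feed a script, the snapshot names can be pulled out of that output. This is a minimal sketch, assuming the listing format shown above (directory entries start with "d" and the path is the last field). The function names are hypothetical, and picking the latest snapshot by sorting only works when the names are timestamps like the one in the example.

```shell
#!/usr/bin/env bash
# Read `hadoop fs -ls <volume>/.snapshot` output on stdin and print
# one snapshot name per line.
snapshot_names() {
  # Skip the "Found N items" header: keep only directory entries,
  # then take the last path component of the last field.
  awk '/^d/ { n = split($NF, parts, "/"); print parts[n] }'
}

# Print the name that sorts last. This is the newest snapshot only if
# the names are timestamps such as 2014-04-04.09-57-49.
latest_snapshot() {
  snapshot_names | sort | tail -n 1
}
```

For example: hadoop fs -ls /user/divakar/.snapshot | latest_snapshot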

Copy from Snapshot to Volumes

A snapshot is a directory, so copy it recursively:

cp -rv /user/divakar/.snapshot/2014-04-04.09-57-49 /user/data/
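
The copy can be wrapped in a small guard so that only paths inside a .snapshot directory are restored. This is a sketch under the assumption that the cluster is reachable as an ordinary file system path (for example through an NFS mount); the function name and behavior are hypothetical, not part of MapR.

```shell
#!/usr/bin/env bash
# Copy a snapshot (or a file inside one) into a destination directory.
# Refuses any source path that is not under a .snapshot directory.
restore_from_snapshot() {
  local src="$1"
  local dest="$2"
  case "$src" in
    */.snapshot/*) ;;   # looks like a snapshot path, continue
    *) echo "not a snapshot path: $src" >&2; return 1 ;;
  esac
  # Create the destination if needed, then copy recursively into it.
  mkdir -p "$dest" && cp -r "$src" "$dest"/
}
```

For example: restore_from_snapshot /user/divakar/.snapshot/2014-04-04.09-57-49 /user/data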

Selecting Snapshot while Creating Volumes/Mirrors

Schedules:
A schedule is a group of rules that specify recurring points in time at which certain actions occur. You can use schedules to automate the creation of snapshots and mirrors. After you create a schedule, it appears as a choice in the scheduling menu when you edit the properties of a task that can be scheduled:

1) To apply a schedule to snapshots, see Scheduling a Snapshot.

2) To apply a schedule to volume mirroring, see Creating Volumes.

Prod Data:

User Data:
