Create/restore a snapshot of an HDFS directory

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a directory of the file system or the entire file system.

To enable snapshots on a specific directory:

Go to CM (Cloudera Manager) – HDFS – File Browser.

Select the directory in the file browser, then select ‘Enable Snapshots’ in the right-hand panel.

Provide the path of the directory to make snapshottable (its snapshots are stored under that directory's .snapshot subfolder) and select Enable Snapshots.
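
The same step can also be done from the command line. The following is a minimal sketch, assuming it is run as an HDFS superuser; /user/kannan is only a placeholder directory and is_snapshottable is a hypothetical helper name, not part of Hadoop.

```shell
# Directory to make snapshottable (placeholder path, adjust to your own).
SNAP_DIR="/user/kannan"

# Hypothetical helper: succeeds if the given path appears as the last
# column of the `hdfs lsSnapshottableDir` listing read from standard input.
is_snapshottable() {
  awk -v d="$1" '$NF == d { found = 1 } END { exit !found }'
}

# Only talk to the cluster when the hdfs client is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  # Mark the directory as snapshottable (needs superuser privileges).
  hdfs dfsadmin -allowSnapshot "$SNAP_DIR"

  # Confirm that the directory now shows up as snapshottable.
  if hdfs lsSnapshottableDir | is_snapshottable "$SNAP_DIR"; then
    echo "$SNAP_DIR is snapshottable"
  fi
fi
```

To undo this later, hdfs dfsadmin -disallowSnapshot can be used on the same path once all of its snapshots have been deleted.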

Once snapshots are enabled, the drop-down menu next to the folder name offers the options “Disable Snapshots” and “Take Snapshot”.

Use “Take Snapshot” to capture the current state of the files and directories.

Once the snapshot is taken, it is listed under the folder details.
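
A snapshot can also be taken from the command line. This is a rough sketch, assuming snapshots are already enabled on the directory; /user/kannan is a placeholder and the timestamped naming scheme in snapshot_name is my own choice, not something HDFS requires.

```shell
# Snapshottable directory (placeholder path, adjust to your own).
SNAP_DIR="/user/kannan"

# Hypothetical helper: build a name such as snap-20240131-093000
# so that snapshots sort by creation time.
snapshot_name() {
  echo "snap-$(date +%Y%m%d-%H%M%S)"
}

NAME=$(snapshot_name)
echo "Snapshot name: $NAME"

# Only talk to the cluster when the hdfs client is on the PATH.
if command -v hdfs >/dev/null 2>&1; then
  # Take the snapshot under the chosen name.
  hdfs dfs -createSnapshot "$SNAP_DIR" "$NAME"

  # List all snapshots of the directory; the new one should appear here.
  hdfs dfs -ls "$SNAP_DIR/.snapshot"
fi
```

If the name argument is left out, HDFS generates a default name based on the current timestamp.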

To restore a snapshot, open the drop-down menu next to the folder name again and select the restore snapshot option.

For a small snapshot, the restore can use the HDFS copy command. For larger snapshots, use DistCp (which runs as a MapReduce job) and provide the job details when prompted.

You can also restore the snapshot using HDFS CLI commands.

Log in to the server, then list the path where the snapshots are stored.

hadoop fs -ls /user/kannan/.snapshot/

Each entry listed is a snapshot: a read-only copy of the directory containing the files that were present when that snapshot was taken. Copy the files you need from it to the desired directory.
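
The copy step might look like the sketch below. The snapshot name snap1 and the target directory /user/kannan_restored are hypothetical placeholders, snapshot_path is a helper I made up for building the path, and the target is assumed not to exist yet.

```shell
SNAP_DIR="/user/kannan"               # snapshottable directory
SNAP_NAME="snap1"                     # hypothetical snapshot name
RESTORE_DIR="/user/kannan_restored"   # hypothetical target, must not exist yet

# Hypothetical helper: build <dir>/.snapshot/<name>, tolerating a trailing slash.
snapshot_path() {
  printf '%s/.snapshot/%s\n' "${1%/}" "$2"
}

SRC=$(snapshot_path "$SNAP_DIR" "$SNAP_NAME")
echo "Restoring from $SRC to $RESTORE_DIR"

# Only talk to the cluster when the hadoop client is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
  # Inspect the snapshot contents before copying anything.
  hadoop fs -ls "$SRC"

  # Small snapshot: plain copy; -p keeps timestamps, ownership and permissions.
  hadoop fs -cp -p "$SRC" "$RESTORE_DIR"

  # Large snapshot: use DistCp instead of the copy above
  # (-pugp preserves user, group and permissions).
  # hadoop distcp -pugp "$SRC" "$RESTORE_DIR"
fi
```

Copying into a fresh directory first makes it possible to inspect the restored files before moving them over the live data.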

Note:

After restoring the snapshot, verify that the permissions of the restored files match those of the files in the snapshot. They can change during the copy, for example when hadoop fs -cp is run without the -p option.
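
One way to do this check is to compare recursive listings of the two trees. The sketch below assumes a snapshot named snap1 restored into /user/kannan_restored (both hypothetical), and perm_listing is a made-up helper. It splits the listing on whitespace, so paths containing spaces are not handled.

```shell
SNAP="/user/kannan/.snapshot/snap1"   # hypothetical snapshot path
RESTORED="/user/kannan_restored"      # hypothetical restored directory

# Hypothetical helper: read `hadoop fs -ls -R` output on standard input and
# print "permissions owner group relative-path" for each entry, sorted by path.
# $1 is the prefix to strip so that both trees produce comparable paths.
perm_listing() {
  awk -v p="$1" 'NF >= 8 {
    rel = $8
    if (index(rel, p) == 1) rel = substr(rel, length(p) + 1)
    print $1, $3, $4, rel
  }' | sort -k4
}

# Only talk to the cluster when the hadoop client is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
  a=$(mktemp)
  b=$(mktemp)
  hadoop fs -ls -R "$SNAP" | perm_listing "$SNAP" > "$a"
  hadoop fs -ls -R "$RESTORED" | perm_listing "$RESTORED" > "$b"

  # diff prints every entry whose permissions, owner or group differ.
  diff "$a" "$b" && echo "permissions and ownership match"
  rm -f "$a" "$b"
fi
```

Any mismatch reported by diff can then be corrected with hadoop fs -chmod or hadoop fs -chown.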

 

Problem Scenarios:

  • Enable snapshots for a directory and take one snapshot.
  • Restore the snapshot of the directory taken at a specific time.

 

That covers how to create and restore a snapshot of an HDFS directory.

Use the comments section below to post your doubts, questions and feedback.

Please follow my blog to get notified of more certification related posts, exam tips, etc.

 


 

 
