Configure HDFS ACLs

Every file and directory in Linux has an owner and a group. A user who needs to access a file (to read, write, or modify it) must either be the owner, belong to the file’s group, or rely on the file’s “others” permissions. This model can’t assign different permissions to individual users or to additional groups.

ACLs control the access of HDFS files by providing a way to set different permissions for specific named users or named groups.

They enhance the traditional permissions model by allowing users to define access control for various combinations of users and groups instead of a single owner/user or a single group.

 

Enabling HDFS ACLs Using Cloudera Manager

  1. In Cloudera Manager (CM), go to the HDFS service.
  2. Click the Configuration tab.
  3. Select Scope > Service_name (Service-Wide)
  4. Locate the Enable Access Control Lists property and select its checkbox to enable HDFS ACLs.
  5. Click Save Changes to commit the changes.

Until HDFS ACLs are enabled, ACL operations in HDFS will fail.

 

Enabling HDFS ACLs Using the Command Line

To enable ACLs using the command line, set the dfs.namenode.acls.enabled property to true in the NameNode’s hdfs-site.xml, then restart the NameNode so the change takes effect.

<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
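
To confirm the setting was saved, the value can be read back with hdfs getconf. This is a minimal sketch, assuming the hdfs client is on the PATH and reads the same hdfs-site.xml; the function name acls_enabled is made up for this example. Note that it checks the local configuration files, not whether the NameNode has already been restarted.

```shell
# Hypothetical helper: report whether dfs.namenode.acls.enabled is "true"
# in the configuration that the local hdfs client reads.
acls_enabled() {
    value=$(hdfs getconf -confKey dfs.namenode.acls.enabled 2>/dev/null)
    if [ "$value" = "true" ]; then
        echo "HDFS ACLs are enabled"
    else
        echo "HDFS ACLs are NOT enabled (value: ${value:-unset})"
        return 1
    fi
}

# Usage:
#   acls_enabled
```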

Commands

To set and get file access control lists (ACLs), use the file system shell commands, setfacl and getfacl.

getfacl

hdfs dfs -getfacl [-R] <path>

<!-- COMMAND OPTIONS
<path>: Path to the file or directory for which ACLs should be listed.
-R: Use this option to recursively list ACLs for all files and directories.
-->

Examples:

<!-- To list all ACLs for the file located at /user/kannan -->
hdfs dfs -getfacl /user/kannan

<!-- To recursively list ACLs for /user/kannan and everything under it -->
hdfs dfs -getfacl -R /user/kannan

Note: Different ACLs can be set on a directory, on its subdirectories, and on the individual files inside them.
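
The getfacl output lists one entry per line, such as user::rwx for the owner and user:stonecold:rw- for a named user. As a rough sketch of how to pull out just the named users, the helper below filters that output with awk. The function name named_acl_users is hypothetical, and it assumes the usual type:name:permissions line format.

```shell
# Hypothetical helper: print "name permissions" for each named-user entry
# in the access ACL of a path (lines of the form user:<name>:<perms>).
# The owner entry (user::<perms>) and default: entries are skipped.
named_acl_users() {
    hdfs dfs -getfacl "$1" | awk -F: '
        $1 == "user" && $2 != "" {
            perms = $3
            sub(/[ \t].*/, "", perms)   # drop any trailing "#effective" note
            print $2, perms
        }'
}

# Usage:
#   named_acl_users /user/kannan
```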

setfacl

hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]

<!-- COMMAND OPTIONS
<path>: Path to the file or directory for which ACLs should be set.
-R: Use this option to apply the operation recursively to all files and directories.
-b: Remove all entries except the base ACL entries for user, group and others.
-k: Remove the default ACL.
-m: Add new permissions to the ACL with this option. Does not affect existing permissions.
-x: Remove only the ACL entries specified.
<acl_spec>: Comma-separated list of ACL permissions.
--set: Use this option to completely replace the existing ACL for the path specified. 
       Previous ACL entries will no longer apply.
-->

Examples:

### To give user stonecold read, write permission over /user/cold/file ###
hdfs dfs -setfacl -m user:stonecold:rw- /user/cold/file

### To remove user undertaker ACL entry for /user/taker/file ###
hdfs dfs -setfacl -x user:undertaker /user/taker/file
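
The two examples above can be wrapped into small helpers that grant an entry, show the resulting ACL, and remove the entry again. This is only a sketch: the function names grant_user_acl and revoke_user_acl are invented for illustration, and it assumes ACLs are already enabled on the cluster.

```shell
# Hypothetical helper: add (or update) a named-user entry, then print the ACL
# so the change can be checked by eye.
grant_user_acl() {
    user=$1; perms=$2; path=$3
    hdfs dfs -setfacl -m "user:${user}:${perms}" "$path" &&
        hdfs dfs -getfacl "$path"
}

# Hypothetical helper: remove that named-user entry again.
revoke_user_acl() {
    user=$1; path=$2
    hdfs dfs -setfacl -x "user:${user}" "$path"
}

# Usage:
#   grant_user_acl stonecold rw- /user/cold/file
#   revoke_user_acl stonecold /user/cold/file
```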

 

Set up a local CDH repository

 

This post explains how to set up a local YUM/CDH repository for your network.

In Linux, /etc/yum.repos.d is where the server keeps its yum repo definitions. Each repo has a baseurl value that holds the link to the repository path.

When you execute “yum install packagename”, yum goes through each repo and contacts its baseurl over the internet to check whether the package you asked for is available. If there’s no internet connectivity, the baseurl can’t be reached and the command fails. Many organizations prohibit downloading packages from external sites/repositories, so they create a repo satellite, put all the necessary packages/rpms in it, and have servers download packages from there.

In this task, we are going to download the CDH repo to our server and create a local repository there, so that the other servers in our network can contact this local repo instead of Cloudera when installing CDH packages.

You need an internet connection to download the packages the first time you set up the repository.

 

Step 1: Download the repo to your machine

RHEL / CentOS 6:

# wget https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo

RHEL / CentOS 7:

# wget https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/cloudera-cdh5.repo

After the download, move cloudera-cdh5.repo to /etc/yum.repos.d/ so that yum (and reposync in step 4) can find the cloudera-cdh5 repo.

 

Step 2: Install webserver

We need a web server installed on this server, so that other machines can access the rpms over http.

# yum install httpd -y

This will create a /var/www/html directory. Whatever files you place under this directory can be accessed via http.

# service httpd start

 

Step 3: Install yum-utils and createrepo

The yum-utils package includes the reposync command, which downloads the packages of a configured repo, and createrepo generates the metadata that turns a directory of rpms into a yum repository.

# yum install yum-utils createrepo -y

 

Step 4: Fetch the rpms of CDH5 repo to your server

# reposync -r cloudera-cdh5

This command will download all the available rpms in cloudera-cdh5 repo (wget’d in step 1) to your server.

Copy the rpms from the downloaded directory to the /var/www/html/cdh/5/rpms/ folder.

Now you should be able to access the rpms in a browser via the url “http://servername/cdh/5/rpms”.
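
The copy in this step can be scripted. Below is a minimal sketch, assuming reposync left the packages somewhere under a cloudera-cdh5 directory in the current working directory; the helper name stage_rpms is hypothetical.

```shell
# Hypothetical helper: copy every .rpm found under a reposync download
# directory (at any depth) into the directory served by httpd.
stage_rpms() {
    src=$1; dest=$2
    mkdir -p "$dest" &&
        find "$src" -type f -name '*.rpm' -exec cp {} "$dest"/ \;
}

# Usage (as root, from the directory where reposync was run):
#   stage_rpms cloudera-cdh5 /var/www/html/cdh/5/rpms
#   curl -sI http://servername/cdh/5/rpms/ | head -n 1
```

The curl line only prints the HTTP status line, which is a quick way to check that httpd is serving the folder.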

 

Step 5: Create a repo file

Inside the /var/www/html/cdh/5/ folder, run the following command.

# createrepo .

This creates or updates the metadata that yum needs in order to recognize the directory as a repository. The command creates a new directory called repodata.

Edit the repo file you downloaded in step 1 and replace the line starting with baseurl with baseurl=http://servername/cdh/5/, the folder where you ran createrepo (one level above the rpms url from step 4). Save the file back to /etc/yum.repos.d/.
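
If you prefer not to edit the file by hand, the baseurl line can be rewritten with sed. This is a sketch under the assumption that the repo file has a single section with one baseurl= line; the helper name set_baseurl is made up, and servername is the same placeholder used above.

```shell
# Hypothetical helper: rewrite every line starting with "baseurl=" in a
# .repo file. The URL must not contain "|" or "&", which are special to sed.
set_baseurl() {
    repo_file=$1; url=$2
    tmp_file=$(mktemp) || return 1
    # Write to a temp file first, then overwrite the original in place
    # so its ownership and permissions stay the same.
    sed "s|^baseurl=.*|baseurl=${url}|" "$repo_file" > "$tmp_file" &&
        cat "$tmp_file" > "$repo_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Usage:
#   set_baseurl /etc/yum.repos.d/cloudera-cdh5.repo http://servername/cdh/5/
```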

 

Step 6: Local CDH repository created

Distribute /etc/yum.repos.d/cloudera-cdh5.repo to all of your servers. They can now download the rpms from this machine without needing to connect to the internet.
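
One way to do the distribution is a small loop over your hosts. The sketch below assumes root ssh access to each server; the helper name distribute_repo and the host names in the usage line are placeholders, not part of the original setup.

```shell
# Hypothetical helper: copy the repo file to each host given as an argument,
# then refresh yum's cache there and list the repos it can see.
distribute_repo() {
    repo_file=/etc/yum.repos.d/cloudera-cdh5.repo
    for host in "$@"; do
        if scp "$repo_file" "root@${host}:${repo_file}" &&
           ssh "root@${host}" 'yum clean all && yum repolist'; then
            echo "updated ${host}"
        else
            echo "failed on ${host}" >&2
        fi
    done
}

# Usage (host names are placeholders):
#   distribute_repo node1.example.com node2.example.com
```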