Cloudera Archives - Hadoop and Cloud

August 29, 2017

Hadoop

7 Comments

Install and configure Sentry

Before adding Sentry, complete the general prerequisites below. These may be mentioned in the problem description. Confirm the Hive warehouse directory setting in the /etc/hive/conf/hive-site.xml file. The Hive warehouse directory (/user/hive/warehouse) must be owned by the Hive user and group and should have 771 permissions. # sudo -u hdfs hadoop fs […]

Kannan AK

August 28, 2017

Hadoop

2 Comments

Add a service using Cloudera Manager

Your running cluster will have only the core services (HDFS, YARN, ZooKeeper) or a handful of services, and your task is to add a specific service to the cluster. To add a service: go to CM, click the drop-down box near the cluster, and select Add service. You will get a list of services […]

Kannan AK

August 25, 2017

Hadoop

1 Comment

Enable/configure log and query redaction

Data redaction is the suppression of sensitive data, such as personally identifiable information (PII): credit card numbers, email addresses, social security numbers. Cloudera has a data redaction feature that masks credit card numbers and email addresses with random or custom strings (which we specify), so that in queries and log files those random strings will […]

Kannan AK

August 23, 2017

Hadoop

1 Comment

Efficiently copy data within a cluster/between clusters

DistCp (distributed copy) is a tool used for large inter/intra-cluster copying of HDFS data. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. […]

Kannan AK

August 23, 2017

Hadoop

1 Comment

Perform OS-level configuration for Hadoop installation

Before installing CDH on our server, we have to make the configuration changes below at the OS level for a successful installation. Disable SELinux: “Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies.” If SELinux is enabled, the Cloudera server installation will fail on the server. To disable […]

Kannan AK

August 19, 2017

Hadoop

1 Comment

Configure HDFS ACLs

Every file/folder in Linux is owned by an owner and a group. If a user needs to access the file (read, write, modify), either the user has to be part of the group or the file must have appropriate “others” permissions. In this model, we can’t set different permissions per user or per group to cater to our requirements. ACLs […]

Kannan AK

August 12, 2017

Hadoop

3 Comments

Set up a local CDH repository

This post explains how to set up a local YUM/CDH repository for your network. In Linux, /etc/yum.repos.d is the directory that holds the yum repo definitions present on the server. For every repo, there is a baseurl value that contains the link to the repository path. When you execute “yum install packagename” the […]

Kannan AK

August 12, 2017

Hadoop

1 Comment

CCA131 – Cloudera Administration Certification Exam Notes and Preparation Guide

In this post, we’ll go through the exam blueprint topics and show you what the tasks in each topic are and how to perform them. All the embedded links are my exam preparation notes. Go through each link and practice it in your cluster. Please practice until you’re confident of doing all the tasks without referring […]