Cloudera CCA-500 today updated questions - Verified by Cloudera Experts

Cloudera Certified Administrator for Apache Hadoop (CCAH) Questions and Answers

Question 1

You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs. Which method should you tell that developer to implement?

Options:

MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of “tasks” into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.

In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resource they need by executing -D mapreduce.reduces.memory.mb=2048

In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.

Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify reduce tasks.

In YARN, resource allocation is a function of virtual cores specified by the ApplicationManager making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -p yarn.nodemanager.cpu-vcores=2
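
Several of the options above pass a property on the command line as a -D key=value pair. The following is a minimal sketch of how such pairs can be pulled out of an argument list. It is not Hadoop's own option parser: the function name parse_d_options and the sample arguments "input" and "output" are assumptions made for illustration.

```python
def parse_d_options(args):
    """Collect '-D key=value' pairs from a list of command-line arguments.

    Illustrative helper only; this is not Hadoop's generic options parser.
    """
    props = {}
    i = 0
    while i < len(args):
        if args[i] == "-D" and i + 1 < len(args):
            key, sep, value = args[i + 1].partition("=")
            if sep:  # keep only well-formed key=value pairs
                props[key] = value
            i += 2
        else:
            i += 1
    return props


# Hypothetical argument list using a property named in the options above.
sample_args = ["-D", "mapreduce.job.reduces=2", "input", "output"]
print(parse_d_options(sample_args))
```

Values are kept as strings in this sketch; a caller would convert them to integers where a count is expected.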

Question 2

You’re upgrading a Hadoop cluster from HDFS and MapReduce version 1 (MRv1) to one running HDFS and MapReduce version 2 (MRv2) on YARN. You want to set and enforce a block size of 128MB for all new files written to the cluster after upgrade. What should you do?

Options:

You cannot enforce this, since client code can always override this value

Set dfs.block.size to 128M on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final

Set dfs.block.size to 128M on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode

Set dfs.block.size to 134217728 on all the worker nodes, on all client machines, and on the NameNode, and set the parameter to final

Set dfs.block.size to 134217728 on all the worker nodes and client machines, and set the parameter to final. You do not need to set this value on the NameNode
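
The options above express the same target size two ways, as 128M and as 134217728. A short sketch of the arithmetic that connects them, assuming the binary convention of 1 MB = 1024 * 1024 bytes (the names below are illustrative, not Hadoop identifiers):

```python
# 1 MB taken as 2**20 bytes (binary convention, assumed here).
BYTES_PER_MB = 1024 * 1024


def block_size_bytes(megabytes):
    """Return a block size in bytes for a size given in megabytes."""
    return megabytes * BYTES_PER_MB


print(block_size_bytes(128))  # 134217728
```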

Question 3

Assuming you’re not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a “split-brain” scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?

Options:

Two active NameNodes and two Standby NameNodes

One active NameNode and one Standby NameNode

Two active NameNodes and one Standby NameNode

Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy

Question 4

You have just run a MapReduce job to filter user messages to only those of a selected geographical region. The output for this job is in a directory named westUsers, located just below your home directory in HDFS. Which command gathers these into a single file on your local file system?

Options:

hadoop fs -getmerge -R westUsers.txt

hadoop fs -getmerge westUsers westUsers.txt

hadoop fs -cp westUsers/* westUsers.txt

hadoop fs -get westUsers westUsers.txt
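
To make "gathers these into a single file" concrete, here is a local-only sketch of concatenating the files of an output directory into one file. It does not touch HDFS or call any Hadoop API; the helper merge_parts, the part file names, and the sample contents are all assumptions for illustration, as is the choice to merge in file-name order.

```python
import tempfile
from pathlib import Path


def merge_parts(src_dir, dest_file):
    """Concatenate every regular file in src_dir, in name order, into dest_file."""
    with dest_file.open("wb") as out:
        for part in sorted(p for p in src_dir.iterdir() if p.is_file()):
            out.write(part.read_bytes())


# Illustrative data: two made-up job output files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "westUsers"
    src.mkdir()
    (src / "part-r-00000").write_text("alice\n")
    (src / "part-r-00001").write_text("bob\n")

    merged = Path(tmp) / "westUsers.txt"
    merge_parts(src, merged)
    merged_text = merged.read_text()

print(merged_text)
```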

Question 5

You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive JVM garbage collection. How do you increase the JVM heap size property to 3GB to optimize performance?

Options:

yarn.application.child.java.opts=-Xsx3072m

yarn.application.child.java.opts=-Xmx3072m

mapreduce.map.java.opts=-Xms3072m

mapreduce.map.java.opts=-Xmx3072m
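
For reference when reading these options: on the JVM, -Xmx sets the maximum heap size and -Xms sets the initial heap size. A small sketch of how a 3GB limit translates into the 3072m figure used above (the helper name max_heap_flag is made up for this example):

```python
def max_heap_flag(gigabytes):
    """Build a JVM maximum-heap flag from a size in gigabytes."""
    megabytes = gigabytes * 1024  # 1 GB = 1024 MB
    return f"-Xmx{megabytes}m"


print(max_heap_flag(3))  # -Xmx3072m
```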

Question 6

You observed that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1GB and your io.sort.mb value is set to 1000MB. How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?

Options:

For a 1GB child heap size an io.sort.mb of 128 MB will always maximize memory to disk I/O

Increase the io.sort.mb to 1GB

Decrease the io.sort.mb value to 0

Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
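
The scenario compares two job counters and two memory settings. A rough sketch of both comparisons follows; the counter values are invented for illustration, and the helper names are not part of any Hadoop API.

```python
def spill_ratio(spilled_records, map_output_records):
    """Return spilled records per map output record (1.0 means each record spilled once)."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


def heap_left_after_sort_buffer(child_heap_mb, io_sort_mb):
    """Return how many MB of child heap remain once the sort buffer is reserved."""
    return child_heap_mb - io_sort_mb


# Invented counter values: records spilled three times as often as they were emitted.
print(spill_ratio(3_000_000, 1_000_000))       # 3.0

# Settings from the question: a 1GB (1024 MB) child heap and io.sort.mb of 1000.
print(heap_left_after_sort_buffer(1024, 1000))  # 24
```

With the settings in the question, the sort buffer alone would claim all but 24 MB of the child heap in this simple model.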

Question 7

You use the hadoop fs -put command to add a file “sales.txt” to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes in your cluster (with a replication factor of 3). One of the nodes holding this file (a single block) fails. How will the cluster handle the replication of the file in this situation?

Options:

The file will remain under-replicated until the administrator brings that node back online

The cluster will re-replicate the file the next time the system administrator reboots the NameNode daemon (as long as the file’s replication factor doesn’t fall below)

This will be immediately re-replicated and all other HDFS operations on the cluster will halt until the cluster’s replication values are restored

The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes
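
The idea of a block being "under-replicated" can be illustrated with a toy model: compare the number of live replicas of each block against the target replication factor. This is only a sketch of the bookkeeping, not how HDFS is implemented; the node names, the block identifier, and the function under_replicated are hypothetical.

```python
def under_replicated(block_locations, live_nodes, target):
    """Return {block_id: missing_replica_count} for blocks with fewer than target live replicas."""
    missing = {}
    for block_id, nodes in block_locations.items():
        live_replicas = len(nodes & live_nodes)
        if live_replicas < target:
            missing[block_id] = target - live_replicas
    return missing


# Hypothetical state: the single block of sales.txt lives on three nodes.
locations = {"sales.txt#blk_0": {"node1", "node2", "node3"}}

# node3 has failed; node4 is alive but holds no replica yet.
live = {"node1", "node2", "node4"}

print(under_replicated(locations, live, target=3))  # {'sales.txt#blk_0': 1}
```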

Question 8

On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?

Options:

The directory will appear to be empty until the entire file write is completed on the cluster

They will see the file with a ._COPYING_ extension on its name. If they view the file, they will see contents of the file up to the last completed block (as each 64MB block is written, that block becomes available)

They will see the file with a ._COPYING_ extension on its name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster

They will see the file with its original name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster
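
The numbers in this question can be worked through directly. The sketch below only does the block arithmetic for a 64 MB block size; it says nothing about file naming or visibility, and the helper name split_into_blocks is made up for this example.

```python
def split_into_blocks(size_mb, block_mb):
    """Return (number of full blocks, leftover MB in a final partial block)."""
    return divmod(size_mb, block_mb)


# The whole 300 MB file: 4 full blocks plus a 44 MB partial block (5 blocks in total).
print(split_into_blocks(300, 64))  # (4, 44)

# After 200 MB written: 3 full blocks done, 8 MB into the fourth.
print(split_into_blocks(200, 64))  # (3, 8)
```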

Question 9

You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?

Options:

Sample the web server logs from the web servers and copy them into HDFS using curl

Ingest the server web logs into HDFS using Flume

Channel these clickstreams into Hadoop using Hadoop Streaming

Import all user clicks from your OLTP databases into Hadoop using Sqoop

Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
