EMC E20-065 Advanced Analytics Specialist Exam for Data Scientists Exam Practice Test

Demo: 9 questions
Total 66 questions

Advanced Analytics Specialist Exam for Data Scientists Questions and Answers

Question 1

Which scenario is a proper use case for multinomial logistic regression?

Options:

A.

A marketing firm wants to estimate the personal income of a group of potential customers.

Using inputs such as age, education, marital status, and credit card expenditures, a data scientist is building a model that will estimate a person's income.

B.

A logistics distribution company wants to minimize the distance traveled by its delivery trucks.

A data scientist is building a model to determine the optimal route for each of its trucks.

C.

To improve the initial routing of a loan application, a financial institution plans to classify a loan application as Approve, Reject, or Possibly_Approve. Based on the company's historical loan application data, a data scientist is building a model to assign one of these three outcomes to each submitted application.

D.

A manufacturer plans to determine the optimal number of workers to employ in an assembly line process. Utilizing the observed distributions of the task durations of each process step, a data scientist is building a model to mimic the interactions and dependencies between each stage in the manufacturing process.

Question 2

In a social network, what does it mean for a node to have a high degree but low betweenness?

Options:

A.

The node is adjacent to a few nodes, each of which has a high PageRank.

B.

The node has the only edge connecting its community to the rest of the graph.

C.

The node can be easily bypassed by communications taking other shorter paths.

D.

The node acts as the hub of the graph.
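
To make the two measures in this question concrete, here is a minimal sketch in plain Python. The six-node graph is a hypothetical example (it is not part of the exam), and the betweenness routine is a compact version of Brandes' algorithm for unweighted, undirected graphs. Node 0 sits inside a fully connected group, so it has many neighbors but no shortest path needs to pass through it; node 4 has fewer neighbors but lies on every route to node 5.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                     # accumulate dependencies back toward s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical graph: nodes 0-3 form a clique; 3-4-5 is a chain hanging off it.
adj = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    2: [0, 1, 3],
    3: [0, 1, 2, 4],
    4: [3, 5],
    5: [4],
}
degree = {v: len(neighbors) for v, neighbors in adj.items()}
scores = betweenness(adj)

print(degree[0], scores[0])  # 3 0.0  (higher degree, no shortest paths run through it)
print(degree[4], scores[4])  # 2 4.0  (lower degree, but a bridge to node 5)
```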

Question 3

Why would a company decide to use HBase to replace an existing relational database?

Options:

A.

It is required for performing ad-hoc queries.

B.

Varying formats of input data require columns to be added in real time.

C.

The company's employees are already fluent in SQL.

D.

Existing SQL code will run unchanged on HBase.

Question 4

A simulation to compare two different sales models yields different results for the same set of input variables in different runs.

What is the likely cause?

Options:

A.

bit operating system was used

B.

The same number of trials was used.

C.

A linear congruential generator (LCG) was used for pseudo-random number generation.

D.

Different seeds for the random number generator were used.
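
The role of the seed in a simulation can be illustrated with a short sketch using Python's standard `random` module. The function `simulate_sales` and its averaging logic are hypothetical stand-ins for a real sales model; the only point is how the seed affects repeatability when every other input is held fixed.

```python
import random

def simulate_sales(seed, trials=1000):
    """Hypothetical Monte Carlo run: the mean of `trials` uniform draws."""
    rng = random.Random(seed)  # private generator, seeded explicitly
    return sum(rng.random() for _ in range(trials)) / trials

same_a = simulate_sales(seed=42)
same_b = simulate_sales(seed=42)
other = simulate_sales(seed=7)

print(same_a == same_b)  # True: identical seed and inputs reproduce the run
print(same_a == other)   # a different seed gives a different stream of draws
```

Fixing the seed is a common way to make simulation runs reproducible while debugging or comparing models.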

Question 5

A marketing team creates a graph using a square for each data point, where the length of each side is set to the data value. The data values are 10 and 20.

What is the lie factor of the graph?

Options:

A.

1

B.

2

C.

3

D.

6
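
The arithmetic behind a lie factor can be worked through in a few lines of Python. This sketch assumes Tufte's definition (size of the effect shown in the graphic divided by size of the effect in the data, where each effect is a relative change) and assumes the viewer reads the area of each square; the function names are illustrative only.

```python
def relative_change(first, second):
    """Size of an effect as a fraction of the starting value."""
    return (second - first) / first

def lie_factor(data_values, graphic_sizes):
    """Effect shown in the graphic divided by the effect in the data."""
    return relative_change(*graphic_sizes) / relative_change(*data_values)

values = (10, 20)
# Assumption: perceived size is the area of a square whose side equals the value.
areas = tuple(v ** 2 for v in values)   # (100, 400)

print(relative_change(*values))   # 1.0  (the data doubles: a 100% increase)
print(relative_change(*areas))    # 3.0  (the area quadruples: a 300% increase)
print(lie_factor(values, areas))  # 3.0
```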

Question 6

Given an input vector of features, a Random Forests model performs a classification task and ends in a tie.

How does the model handle this outcome?

Options:

A.

The model will be rebuilt

B.

A winner is chosen at random

C.

The tree that caused the tie is discarded

D.

One more tree is added to the forest

Question 7

Which representation is most suitable for a small and highly connected network?

Options:

A.

Edge list

B.

Adjacency matrix

C.

Eigenvector centrality

D.

Adjacency list
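
For readers less familiar with the graph representations named in the options, here is a minimal Python sketch that builds two of them for a small, fully connected network. The four-node graph and the helper names are assumptions made for illustration. In a dense graph nearly every cell of the matrix is in use, so its n × n storage is not wasted, and checking whether an edge exists is a single index lookup.

```python
def to_adjacency_matrix(n, edges):
    """n x n grid of 0/1 flags; cell [u][v] is 1 when u and v are connected."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: mirror the entry
    return matrix

def to_adjacency_list(n, edges):
    """Map from each node to the list of its neighbors."""
    adj = {node: [] for node in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

# Hypothetical small, highly connected network: every pair of 4 nodes is linked.
n = 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

matrix = to_adjacency_matrix(n, edges)
adj_list = to_adjacency_list(n, edges)

print(matrix[1][3] == 1)   # True: constant-time edge test
print(3 in adj_list[1])    # True: requires scanning node 1's neighbor list
```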

Question 8

What is a key beneficial characteristic of the Random Forest algorithm?

Options:

A.

Provides an explanatory model

B.

Distinguishes categorical from continuous variables

C.

Support for unstructured data

D.

Resiliency to complex, non-linear variable interactions

Question 9

Which scenario would be ideal for processing Hadoop data with Hive?

Options:

A.

Structured data; real-time processing

B.

Unstructured data; batch processing

C.

Unstructured data; real-time processing

D.

Structured data; batch processing
