New Year Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70percent

Databricks Databricks-Certified-Data-Analyst-Associate Databricks Certified Data Analyst Associate Exam Exam Practice Test

Databricks Certified Data Analyst Associate Exam Questions and Answers

Question 1

Consider the following two statements:

Statement 1:

Statement 2:

Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?

Options:

A.

The first statement will return all data from the customers table and matching data from the orders table. The second statement will return all data from the orders table and matching data from the customers table. Any missing data will be filled in with NULL.

B.

When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.

C.

There is no difference between the result sets for both statements.

D.

Both statements will fail because Databricks SQL does not support those join types.

E.

When the first statement is run, all rows from the customers table will be returned and only the customer_id from the orders table will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.

Question 2

A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The microbatches are triggered every minute.

A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables.

Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?

Options:

A.

The required compute resources could be costly

B.

The gold-level tables are not appropriately clean for business reporting

C.

The streaming data is not an appropriate data source for a dashboard

D.

The streaming cluster is not fault tolerant

E.

The dashboard cannot be refreshed that quickly

Question 3

A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table;

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

Options:

A.

The table's data was larger than 10 GB

B.

The table did not have a location

C.

The table was external

D.

The table's data was smaller than 10 GB

E.

The table was managed

Question 4

A data analyst has been asked to use the below tablesales_tableto get the percentage rank of products within region by the sales:

The result of the query should look like this:

Which of the following queries will accomplish this task?

A)

B)

C)

D)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Question 5

Delta Lake stores table data as a series of data files, but it also stores a lot of other information.

Which of the following is stored alongside data files when using Delta Lake?

Options:

A.

None of these

B.

Table metadata, data summary visualizations, and owner account information

C.

Table metadata

D.

Data summary visualizations

E.

Owner account information

Question 6

A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication.

Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?

A)

CREATE TABLE table­_silver AS

SELECT DISTINCT *

FROM table_bronze;

B)

CREATE TABLE table_silver AS

INSERT *

FROM table_bronze;

C)

CREATE TABLE table_silver AS

MERGE DEDUPLICATE *

FROM table_bronze;

D)

INSERT INTO TABLE table_silver

SELECT * FROM table_bronze;

E)

INSERT OVERWRITE TABLE table_silver

SELECT * FROM table_bronze;

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Question 7

A data analyst creates a Databricks SQL Query where the result set has the following schema:

region STRING

number_of_customer INT

When the analyst clicks on the "Add visualization" button on the SQL Editor page, which of the following types of visualizations will be selected by default?

Options:

A.

Violin Chart

B.

Line Chart

C.

IBar Chart

D.

Histogram

E.

There is no default. The user must choose a visualization type.

Question 8

A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run.

Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?

Options:

A.

Reduce the SQL endpoint cluster size

B.

Increase the SQL endpoint cluster size

C.

Turn off the Auto stop feature

D.

Increase the minimum scaling value

E.

Use a Serverless SQL endpoint

Question 9

Which of the following statements about adding visual appeal to visualizations in the Visualization Editor is incorrect?

Options:

A.

Visualization scale can be changed.

B.

Data Labels can be formatted.

C.

Colors can be changed.

D.

Borders can be added.

E.

Tooltips can be formatted.

Question 10

Which of the following should data analysts consider when working with personally identifiable information (PII) data?

Options:

A.

Organization-specific best practices for Pll data

B.

Legal requirements for the area in which the data was collected

C.

None of these considerations

D.

Legal requirements for the area in which the analysis is being performed

E.

All of these considerations

Question 11

In which of the following situations should a data analyst use higher-order functions?

Options:

A.

When custom logic needs to be applied to simple, unnested data

B.

When custom logic needs to be converted to Python-native code

C.

When custom logic needs to be applied at scale to array data objects

D.

When built-in functions are taking too long to perform tasks

E.

When built-in functions need to run through the Catalyst Optimizer

Question 12

Which of the following statements about a refresh schedule is incorrect?

Options:

A.

A query can be refreshed anywhere from 1 minute lo 2 weeks

B.

Refresh schedules can be configured in the Query Editor.

C.

A query being refreshed on a schedule does not use a SQL Warehouse (formerly known as SQL Endpoint).

D.

A refresh schedule is not the same as an alert.

E.

You must have workspace administrator privileges to configure a refresh schedule

Question 13

A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on the dashboard.

Which of the following tools can the data analyst use to designate the Development, Testing, and Production sections using text?

Options:

A.

Separate endpoints for each section

B.

Separate queries for each section

C.

Markdown-based text boxes

D.

Direct text written into the dashboard in editing mode

E.

Separate color palettes for each section