Big Halloween Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70percent

Databricks Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Exam Practice Test

Databricks Certified Data Engineer Associate Exam Questions and Answers

Question 1

Which of the following commands will return the location of database customer360?

Options:

A.

DESCRIBE LOCATION customer360;

B.

DROP DATABASE customer360;

C.

DESCRIBE DATABASE customer360;

D.

ALTER DATABASE customer360 SET DBPROPERTIES ('location' = '/user'};

E.

USE DATABASE customer360;

Question 2

Which tool is used by Auto Loader to process data incrementally?

Options:

A.

Spark Structured Streaming

B.

Unity Catalog

C.

Checkpointing

D.

Databricks SQL

Question 3

A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).

Which of the following code blocks creates this SQL UDF?

Options:

A.

B.

C.

D.

E.

Question 4

A new data engineering team has been assigned to work on a project. The team will need access to database customers in order to see what tables already exist. The team has its own group team.

Which of the following commands can be used to grant the necessary permission on the entire database to the new team?

Options:

A.

GRANT VIEW ON CATALOG customers TO team;

B.

GRANT CREATE ON DATABASE customers TO team;

C.

GRANT USAGE ON CATALOG team TO customers;

D.

GRANT CREATE ON DATABASE team TO customers;

E.

GRANT USAGE ON DATABASE customers TO team;

Question 5

A new data engineering team team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

grant all privileges on table sales TO team;

B.

GRANT SELECT ON TABLE sales TO team;

C.

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

D.

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Question 6

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

Options:

A.

Unity Catalog

B.

Data Explorer

C.

Delta Lake

D.

Delta Live Tables

E.

Auto Loader

Question 7

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:

A.

Worker node

B.

JDBC data source

C.

Databricks web application

D.

Databricks Filesystem

E.

Driver node

Question 8

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:

DROP TABLE IF EXISTS my_table;

After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.

Which of the following describes why all of these files were deleted?

Options:

A.

The table was managed

B.

The table's data was smaller than 10 GB

C.

The table's data was larger than 10 GB

D.

The table was external

E.

The table did not have a location

Question 9

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Options:

A.

When the location of the data needs to be changed

B.

When the target table is an external table

C.

When the source table can be deleted

D.

When the target table cannot contain duplicate records

E.

When the source is not a Delta table

Question 10

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance

_____ AS

SELECT id,

firstName,

lastName

FROM customerLocations

WHERE country = ’FRANCE’;

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (Pll).

Which line of code fills in the above blank to successfully complete the task?

Options:

A.

COMMENT "Contains PIT

B.

511

C.

"COMMENT PII"

D.

TBLPROPERTIES PII

Question 11

Which file format is used for storing Delta Lake Table?

Options:

A.

Parquet

B.

Delta

C.

SV

D.

JSON

Question 12

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.

processingTime(1)

B.

trigger(availableNow=True)

C.

trigger(parallelBatch=True)

D.

trigger(processingTime="once")

E.

trigger(continuous="once")

Question 13

What is stored in a Databricks customer's cloud account?

Options:

A.

Data

B.

Cluster management metadata

C.

Databricks web application

D.

Notebooks

Question 14

Identify how the count_if function and the count where x is null can be used

Consider a table random_values with below data.

What would be the output of below query?

select count_if(col > 1) as count_a. count(*) as count_b.count(col1) as count_c from random_values col1

0

1

2

NULL -

2

3

Options:

A.

3 6 5

B.

4 6 5

C.

3 6 6

D.

4 6 6

Question 15

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A.

A key-value pair configuration

B.

The preferred DBU/hour cost

C.

A path to cloud storage location for the written data

D.

A location of a target database for the written data

E.

At least one notebook library to be executed

Question 16

A data architect has determined that a table of the following format is necessary:

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Question 17

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:

A.

trigger("5 seconds")

B.

trigger(continuous="5 seconds")

C.

trigger(once="5 seconds")

D.

trigger(processingTime="5 seconds")

Question 18

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

A.

They could submit a feature request with Databricks to add this functionality.

B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.

They could only run the entire program on Sundays.

D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.

They could redesign the data model to separate the data used in the final query into a new table.

Question 19

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

Options:

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with one-time notifications.

D.

They can set up an Alert with a new webhook alert destination.

E.

They can set up an Alert without notifications.

Question 20

Which of the following Git operations must be performed outside of Databricks Repos?

Options:

A.

Commit

B.

Pull

C.

Push

D.

Clone

E.

Merge

Question 21

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.

Records that violate the expectation cause the job to fail.

Question 22

Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?

Options:

A.

When they are working interactively with a small amount of data

B.

When they are running automated reports to be refreshed as quickly as possible

C.

When they are working with SQL within Databricks SQL

D.

When they are concerned about the ability to automatically scale with larger data

E.

When they are manually running reports with a large amount of data

Question 23

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

A.

DROP

B.

IGNORE

C.

MERGE

D.

APPEND

E.

INSERT

Question 24

Which of the following describes a scenario in which a data team will want to utilize cluster pools?

Options:

A.

An automated report needs to be refreshed as quickly as possible.

B.

An automated report needs to be made reproducible.

C.

An automated report needs to be tested to identify errors.

D.

An automated report needs to be version-controlled across multiple collaborators.

E.

An automated report needs to be runnable by all stakeholders.

Question 25

A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?

Options:

A.

pyspark.sql.types.DateType

B.

datetime

C.

pyspark.sql.types.TimestampType

D.

Cron syntax

E.

There is no way to represent and submit this information programmatically

Question 26

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

Options:

A.

Checkpointing and Write-ahead Logs

B.

Structured Streaming cannot record the offset range of the data being processed in each trigger.

C.

Replayable Sources and Idempotent Sinks

D.

Write-ahead Logs and Idempotent Sinks

E.

Checkpointing and Idempotent Sinks

Question 27

The Delta transaction log for the ‘students’ tables is shown using the ‘DESCRIBE HISTORY students’ command. A Data Engineer needs to query the table as it existed before the UPDATE operation listed in the log.

Which command should the Data Engineer use to achieve this? (Choose two.)

Options:

A.

SELECT * FROM students@v4

B.

SELECT * FROM students TIMESTAMP AS OF ‘2024-04-22T 14:32:47.000+00:00’

C.

SELECT * FROM students FROM HISTORY VERSION AS OF 3

D.

SELECT * FROM students VERSION AS OF 5

E.

SELECT * FROM students TIMESTAMP AS OF ‘2024-04-22T 14:32:58.000+00:00’

Question 28

Which of the following commands will return the number of null values in the member_id column?

Options:

A.

SELECT count(member_id) FROM my_table;

B.

SELECT count(member_id) - count_null(member_id) FROM my_table;

C.

SELECT count_if(member_id IS NULL) FROM my_table;

D.

SELECT null(member_id) FROM my_table;

E.

SELECT count_null(member_id) FROM my_table;

Question 29

A data engineer wants to create a new table containing the names of customers that live in France.

They have written the following command:

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.

There is no way to indicate whether a table contains PII.

B.

"COMMENT PII"

C.

TBLPROPERTIES PII

D.

COMMENT "Contains PII"

E.

PII

Question 30

Which of the following describes the storage organization of a Delta table?

Options:

A.

Delta tables are stored in a single file that contains data, history, metadata, and other attributes.

B.

Delta tables store their data in a single file and all metadata in a collection of files in a separate location.

C.

Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.

D.

Delta tables are stored in a collection of files that contain only the data stored within the table.

E.

Delta tables are stored in a single file that contains only the data stored within the table.

Question 31

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

Options:

A.

Manually programming in an alert system in each cell of the Notebook

B.

Setting up an Alert in the Job page

C.

Setting up an Alert in the Notebook

D.

There is no way to notify the Job owner in the case of Job failure

E.

MLflow Model Registry Webhooks

Question 32

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

Options:

A.

The table’s data was larger than 10 GB

B.

The table’s data was smaller than 10 GB

C.

The table was external

D.

The table did not have a location

E.

The table was managed