Online Access Free Databricks.Databricks-Certified-Professional-Data-Engineer.v2023-10-09.q24 Practice Test (Page 2)

Databricks-Certified-Professional-Data-Engineer Exam Question 1

A table in the Lakehouse namedcustomer_churn_paramsis used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.
The churn prediction model used by the ML team is fairly stable in production. The team is only interested in making predictions on records that have changed in the past 24 hours.
Which approach would simplify the identification of these changed records?

A.Apply the churn model to all rows in the customer_churn_params table, but implement logic to perform an upsert into the predictions table that ignores rows where predictions have not changed.

B.Convert the batch job to a Structured Streaming job using the complete output mode; configure a Structured Streaming job to read from the customer_churn_params table and incrementally predict against the churn model.

C.Calculate the difference between the previous model predictions and the current customer_churn_params on a key identifying unique customers before making new predictions; only make predictions on those customers not in the previous predictions.

D.Modify the overwrite logic to include a field populated by calling
spark.sql.functions.current_timestamp() as data are being written; use this field to identify records written on a particular date.

E.Replace the current overwrite logic with a merge statement to modify only those records that have changed; write logic to make predictions on the changed records identified by the change data feed.

Databricks-Certified-Professional-Data-Engineer Exam Question 2

A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using calls to confirm code is producing the logically correct results as new transformations are added to an operation. To get a measure of average time to execute, the user is running each cell multiple times interactively.
Which of the following adjustments will get a more accurate measure of how code is likely to perform in production?

A.Scala is the only language that can be accurately tested using interactive notebooks; because the best performance is achieved by using Scala code compiled to JARs. all PySpark and Spark SQL logic should be refactored.

B.The only way to meaningfully troubleshoot code execution times in development notebooks Is to use production-sized data and production-sized clusters with Run All execution.

C.Production code development should only be done using an IDE; executing code against a local build of open source Spark and Delta Lake will provide the most accurate benchmarks for how code will perform in production.

D.Calling display () forces a job to trigger, while many transformations will only add to the logical query plan; because of caching, repeated execution of the same logic does not provide meaningful results.

E.The Jobs Ul should be leveraged to occasionally run the notebook as a job and track execution time during incremental code development because Photon can only be enabled on clusters launched for scheduled jobs.

Databricks-Certified-Professional-Data-Engineer Exam Question 3

The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame namedpredswith the schema "customer_id LONG, predictions DOUBLE, date DATE".

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?

A.preds.write.mode("append").saveAsTable("churn_preds")

B.preds.write.format("delta").save("/preds/churn_preds")

Databricks-Certified-Professional-Data-Engineer Exam Question 4

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFramedf. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.
Streaming DataFramedfhas the following schema:
"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"
Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

A.to_interval("event_time", "5 minutes").alias("time")

B.window("event_time", "5 minutes").alias("time")

C."event_time"

D.window("event_time", "10 minutes").alias("time")

E.lag("event_time", "10 minutes").alias("time")

Databricks-Certified-Professional-Data-Engineer Exam Question 5

A Delta Lake table was created with the below query:

Realizing that the original query had a typographical error, the below code was executed:
ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store
Which result will occur after running the second command?

A.The table reference in the metastore is updated and no data is changed.

B.The table name change is recorded in the Delta transaction log.

C.All related files and metadata are dropped and recreated in a single ACID transaction.

D.The table reference in the metastore is updated and all data files are moved.

E.A new Delta transaction log Is created for the renamed table.

Other Version: 562Databricks.Databricks-Certified-Professional-Data-Engineer.v2025-06-26.q60; 1247Databricks.Databricks-Certified-Professional-Data-Engineer.v2024-11-21.q117; 1352Databricks.Databricks-Certified-Professional-Data-Engineer.v2023-03-10.q102; 613Databricks.Databricks-Certified-Professional-Data-Engineer.v2023-02-09.q22; 884Databricks.Databricks-Certified-Professional-Data-Engineer.v2022-09-06.q21

Latest Upload: 107Salesforce.ADM-201.v2025-09-06.q260; 103Oracle.1Z0-1055-23.v2025-09-06.q48; 104Cisco.010-151.v2025-09-06.q130; 130PMI.PMI-PBA.v2025-09-05.q160; 138Salesforce.Advanced-Cross-Channel.v2025-09-04.q41; 133EC-COUNCIL.312-40.v2025-09-04.q87; 147Oracle.1Z1-591.v2025-09-03.q129; 128Oracle.1z0-1065-25.v2025-09-03.q26; 162Cisco.350-701.v2025-09-03.q218; 125Oracle.1Z0-1195-25.v2025-09-03.q20

Databricks-Certified-Professional-Data-Engineer Exam Question 1

Databricks-Certified-Professional-Data-Engineer Exam Question 2

Databricks-Certified-Professional-Data-Engineer Exam Question 3

Databricks-Certified-Professional-Data-Engineer Exam Question 4

Databricks-Certified-Professional-Data-Engineer Exam Question 5

Download PDF File