Online Access Free Databricks.Associate-Developer-Apache-Spark-3.5.v2025-11-26.q35 Practice Test (Page 6)

Associate-Developer-Apache-Spark-3.5 Exam Question 21

Given this code:

.withWatermark("event_time","10 minutes")
.groupBy(window("event_time","15 minutes"))
.count()
What happens to data that arrives after the watermark threshold?
Options:

A. Records that arrive later than the watermark threshold (10 minutes) will automatically be included in the aggregation if they fall within the 15-minute window.

B. Any data arriving more than 10 minutes after the watermark threshold will be ignored and not included in the aggregation.

C. Data arriving more than 10 minutes after the latest watermark will still be included in the aggregation but will be placed into the next window.

D. The watermark ensures that late data arriving within 10 minutes of the latest event_time will be processed and included in the windowed aggregation.
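To make the watermark idea concrete, here is a minimal plain-Python sketch of how a 10-minute watermark separates on-time events from late ones. It is a simplified model under stated assumptions, not Spark's implementation: the helper `split_by_watermark`, the helper `at`, and the sample timestamps are all hypothetical, and the sketch compares each event time directly against the watermark, whereas Spark evaluates window state against the watermark and only guarantees that data within the threshold is processed.

```python
from datetime import datetime, timedelta

# Mirrors .withWatermark("event_time", "10 minutes") in the question.
WATERMARK_DELAY = timedelta(minutes=10)


def split_by_watermark(batches):
    """Simplified model of late-data handling (hypothetical helper, not a Spark API).

    The watermark used for a batch is the maximum event time seen in earlier
    batches minus the delay. Events older than that watermark are dropped.
    """
    max_event_time = None
    accepted, dropped = [], []
    for batch in batches:
        watermark = None if max_event_time is None else max_event_time - WATERMARK_DELAY
        for event_time in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)
            else:
                accepted.append(event_time)
        # Advance the maximum event time only after the batch has been processed.
        for event_time in batch:
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
    return accepted, dropped


def at(hour, minute):
    """Build a timestamp on an arbitrary sample day (assumed data)."""
    return datetime(2025, 1, 1, hour, minute)


batches = [
    [at(12, 0), at(12, 5)],   # first batch: no watermark yet
    [at(12, 20)],             # watermark is 11:55, so 12:20 is accepted
    [at(12, 12), at(12, 5)],  # watermark is 12:10: 12:12 is kept, 12:05 is too late
]
accepted, dropped = split_by_watermark(batches)
```

In this model the event at 12:12 arrives after a later event (12:20) yet is still accepted because it is within 10 minutes of the latest event time, while the event at 12:05 falls behind the watermark and is discarded.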

Associate-Developer-Apache-Spark-3.5 Exam Question 22

A data engineer is streaming data from Kafka and requires:
Minimal latency
Exactly-once processing guarantees
Which trigger mode should be used?

A. .trigger(processingTime='1 second')

B. .trigger(continuous=True)

C. .trigger(continuous='1 second')

D. .trigger(availableNow=True)

Associate-Developer-Apache-Spark-3.5 Exam Question 23

A developer is trying to join two tables, sales.purchases_fct and sales.customer_dim, using the following code:

fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'))

The developer has discovered that customers in the purchases_fct table that do not exist in the customer_dim table are being dropped from the joined table.
Which change should be made to the code to stop these customer records from being dropped?

A. fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'), 'left')

B. fact_df = cust_df.join(purch_df, F.col('customer_id') == F.col('custid'))

C. fact_df = purch_df.join(cust_df, F.col('cust_id') == F.col('customer_id'))

D. fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'), 'right_outer')
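The difference between join types can be illustrated without a Spark cluster. The sketch below is a plain-Python stand-in, not PySpark: `join_rows` is a hypothetical helper, and the sample rows are invented for illustration. It only models how an inner join and a left join treat rows on the left side that have no match on the right.

```python
def join_rows(left, right, left_key, right_key, how="inner"):
    """Plain-Python equi-join over lists of dicts (hypothetical helper, not PySpark)."""
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")
    # Column names on the right side, used to pad unmatched left rows with None.
    right_columns = sorted({column for row in right for column in row})
    joined = []
    for left_row in left:
        matches = [r for r in right if r[right_key] == left_row[left_key]]
        if matches:
            joined.extend({**left_row, **match} for match in matches)
        elif how == "left":
            # A left join keeps the unmatched left row and fills right columns with None.
            joined.append({**left_row, **{c: None for c in right_columns}})
        # An inner join silently drops the unmatched left row.
    return joined


# Assumed sample data: customer 9 has a purchase but no dimension row.
purchases = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 2, "amount": 35},
    {"customer_id": 9, "amount": 50},
]
customers = [
    {"custid": 1, "name": "Ada"},
    {"custid": 2, "name": "Grace"},
]

inner = join_rows(purchases, customers, "customer_id", "custid")
left = join_rows(purchases, customers, "customer_id", "custid", how="left")
```

With this data, `inner` contains two rows and `left` contains three, the third being the purchase for customer 9 with `custid` and `name` set to `None`.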

Associate-Developer-Apache-Spark-3.5 Exam Question 24

A developer is running Spark SQL queries and notices underutilization of resources. Executors are idle, and the number of tasks per stage is low.
What should the developer do to improve cluster utilization?

A. Increase the value of spark.sql.shuffle.partitions

B. Reduce the value of spark.sql.shuffle.partitions

C. Increase the size of the dataset to create more partitions

D. Enable dynamic resource allocation to scale resources as needed

Associate-Developer-Apache-Spark-3.5 Exam Question 25

Given the code:

df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"), lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()

At which point will Spark actually begin processing the data?

A. When the filter transformation is applied

B. When the count action is applied

C. When the groupBy transformation is applied

D. When the show action is applied
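The distinction between transformations and actions can be sketched with a toy class in plain Python. This is an illustrative model only, not Spark: `LazyFrame`, its methods, and the sample log lines are hypothetical. It captures just one idea, namely that transformations record a plan while actions execute it.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical, not a Spark class)."""

    executions = 0  # how many times any recorded plan has actually been run

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan

    # Transformations: record a step and return a new frame; nothing is computed.
    def filter(self, predicate):
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def select(self, mapper):
        return LazyFrame(self._rows, self._plan + (("select", mapper),))

    def _run(self):
        # Executes the whole recorded plan from the source rows.
        LazyFrame.executions += 1
        rows = list(self._rows)
        for kind, fn in self._plan:
            if kind == "filter":
                rows = [row for row in rows if fn(row)]
            else:
                rows = [fn(row) for row in rows]
        return rows

    # Actions: each one triggers a run of the recorded plan.
    def count(self):
        return len(self._run())

    def show(self):
        for row in self._run():
            print(row)


# Assumed sample input, loosely shaped like the log lines in the question.
lines = ["2025-01-01 error disk", "2025-01-01 ok", "2025-01-02 error net"]
frame = LazyFrame(lines).filter(lambda s: "error" in s).select(lambda s: s.split(" ")[0])
runs_after_transformations = LazyFrame.executions

n = frame.count()
runs_after_count = LazyFrame.executions

frame.show()
runs_after_show = LazyFrame.executions
```

In this toy model no work happens while `filter` and `select` are chained; the first execution is triggered by `count()`, and `show()` triggers a second, separate run of the same plan.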
