Online Access Free Associate-Developer-Apache-Spark-3.5 Exam Questions

Exam Code:	Associate-Developer-Apache-Spark-3.5
Exam Name:	Databricks Certified Associate Developer for Apache Spark 3.5 - Python
Certification Provider:	Databricks
Free Question Number:	135
Posted:	Jun 14, 2026

Question 1

A data analyst builds a Spark application to analyze finance data and performs the following operations:
filter, select, groupBy, and coalesce.
Which operation results in a shuffle?

A.select
B.coalesce
C.filter
D.groupBy
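
To make the idea of a shuffle concrete, here is a minimal plain-Python sketch (no Spark involved) of the four operations named in the question. It contrasts work that can be done inside each partition with work that has to route rows between partitions by key. The sample rows, the function names, and the byte-sum hash are all assumptions made up for illustration; Spark's real partitioner is different.

```python
from collections import defaultdict

# Two hypothetical input partitions of (account, amount) rows.
partitions = [
    [("acct_a", 10), ("acct_b", 5), ("acct_a", 7)],
    [("acct_b", 1), ("acct_a", 2), ("acct_c", 9)],
]

def narrow_filter(parts, predicate):
    """filter/select style: each partition is processed on its own."""
    return [[row for row in part if predicate(row)] for part in parts]

def coalesce(parts, num_out):
    """coalesce style: whole partitions are merged, rows are not re-routed by key."""
    out = [[] for _ in range(num_out)]
    for i, part in enumerate(parts):
        out[i % num_out].extend(part)
    return out

def shuffle_by_key(parts, num_out):
    """groupBy style: every row is sent to the output partition that owns its key."""
    out = [[] for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            # Deterministic stand-in hash (an assumption, not Spark's partitioner).
            out[sum(key.encode()) % num_out].append((key, value))
    return out

def sum_per_key(parts):
    """Aggregate inside each partition once all rows of a key are co-located."""
    totals = []
    for part in parts:
        acc = defaultdict(int)
        for key, value in part:
            acc[key] += value
        totals.append(dict(acc))
    return totals

filtered = narrow_filter(partitions, lambda row: row[1] > 1)
shuffled = shuffle_by_key(filtered, num_out=2)
totals = sum_per_key(shuffled)
print(totals)
```

After the key-based routing step, each account appears in exactly one output partition, which is what lets the per-key sums be computed locally.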

Question 2

A developer needs to write the output of a complex chain of Spark transformations to a Parquet table called events.liveLatest.
Consumers of this table query it frequently with filters on both year and month of the event_ts column (a timestamp).
The current code:
from pyspark.sql import functions as F
final = df.withColumn("event_year", F.year("event_ts")) \
    .withColumn("event_month", F.month("event_ts")) \
    .write \
    .bucketBy(42, ["event_year", "event_month"]) \
    .saveAsTable("events.liveLatest")
However, consumers report poor query performance.
Which change will enable efficient querying by year and month?

A.Replace .bucketBy() with .partitionBy("event_year", "event_month")
B.Add .sortBy() after .bucketBy()
C.Replace .bucketBy() with .partitionBy("event_year") only
D.Change the bucket count (42) to a lower number

Question 3

In the code block below, aggDF contains aggregations on a streaming DataFrame:
aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()
Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

A.COMPLETE
B.APPEND
C.REPLACE
D.AGGREGATE

Question 4

A data scientist at a financial services company is working with a Spark DataFrame containing transaction records. The DataFrame has millions of rows and includes columns for transaction_id, account_number, transaction_amount, and timestamp. Due to an issue with the source system, some transactions were accidentally recorded multiple times with identical information across all fields. The data scientist needs to remove rows with duplicates across all fields to ensure accurate financial reporting.
Which approach should the data scientist use to deduplicate the transactions using PySpark?

A.df = df.dropDuplicates(["transaction_amount"])
B.df = df.dropDuplicates()
C.df = df.filter(F.col("transaction_id").isNotNull())
D.df = df.groupBy("transaction_id").agg(F.first("account_number"), F.first("transaction_amount"), F.first("timestamp"))

Question 5

The following code fragment results in an error:
@F.udf(T.IntegerType())
def simple_udf(t: str) -> str:
    return answer * 3.14159
Which code fragment should be used instead?

A.@F.udf(T.IntegerType())
def simple_udf(t: int) -> int:
    return t * 3.14159
B.@F.udf(T.IntegerType())
def simple_udf(t: float) -> float:
    return t * 3.14159
C.@F.udf(T.DoubleType())
def simple_udf(t: float) -> float:
    return t * 3.14159
D.@F.udf(T.DoubleType())
def simple_udf(t: int) -> int:
    return t * 3.14159
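
As a plain-Python illustration (no Spark required), the sketch below runs the same arithmetic as the function body in the options and checks the Python type of the result. The function name `scale_by_pi` is hypothetical, and the sketch says nothing about how Spark itself handles a mismatched declaration.

```python
def scale_by_pi(t: float) -> float:
    # Same arithmetic as the UDF body in the options above.
    return t * 3.14159

# An int input still produces a float result.
from_int = scale_by_pi(2)
from_float = scale_by_pi(2.0)
print(type(from_int).__name__, type(from_float).__name__)
```

Because the product is a float whatever numeric input is passed, the declared UDF return type and the Python type hints need to describe a floating-point value to be consistent with the body.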
