Online Access Free Databricks.Associate-Developer-Apache-Spark.v2022-05-26.q61 Practice Test (Page 6)

Associate-Developer-Apache-Spark Exam Question 21

Which of the following code blocks returns, as a one-column DataFrame, all unique values found in columns value and productId of DataFrame transactionsDf?

A.transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

B.transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

C.transactionsDf.select('value', 'productId').distinct()

D.transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

E.transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})

Associate-Developer-Apache-Spark Exam Question 22

Which of the following DataFrame operators is never classified as a wide transformation?

A.DataFrame.sort()

B.DataFrame.aggregate()

C.DataFrame.repartition()

D.DataFrame.select()

E.DataFrame.join()

Correct Answer: D

Explanation
As a general rule: after working through the practice tests, you probably have a good feel for which operations count as wide and which count as narrow transformations. If you are unsure, experiment in Spark and display the execution plan with explain(), for example transactionsDf.sort('value').explain(). If the plan contains an Exchange step, meaning that data is repartitioned, the operation counts as a wide transformation.
DataFrame.select()
Correct! A wide transformation includes a shuffle: data from one input partition can end up in several output partitions, so each output partition may depend on many input partitions. This is expensive and causes network traffic across the cluster. With select(), however, you tell Spark to pick or compute columns row by row within each partition. Spark does not need to exchange data between partitions for this, because each partition can be processed independently. Thus, select() is a narrow transformation, not a wide one.
DataFrame.repartition()
Incorrect. When you repartition a DataFrame, you redefine partition boundaries. Data will flow across your cluster and end up in different partitions after the repartitioning is completed. This is known as a shuffle and, in turn, is classified as a wide transformation.
DataFrame.aggregate()
No. When you aggregate, you may compare and summarize data across partitions. In the process, data is exchanged across the cluster, and each newly formed output partition can depend on several input partitions. This is a typical characteristic of a shuffle, so an aggregation is generally classified as a wide transformation.
DataFrame.join()
Wrong. Joining multiple DataFrames usually means that large amounts of data are exchanged across the cluster, as new partitions are formed. This is a shuffle and therefore DataFrame.join() counts as a wide transformation.
DataFrame.sort()
False. When sorting, Spark needs to compare rows from all partitions with each other. This is expensive, since data is exchanged across the cluster and new partitions are formed as the data is reordered. This process is a shuffle and, as a result, DataFrame.sort() counts as a wide transformation.
More info: Understanding Apache Spark Shuffle | Philipp Brunenberg

Associate-Developer-Apache-Spark Exam Question 23

The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.
Code block:
transactionsDf.withColumn("transactionNumber", "transactionId")

A.The arguments to the withColumn method need to be reordered.

B.The arguments to the withColumn method need to be reordered and the copy() operator should be appended to the code block to ensure a copy is returned.

C.The copy() operator should be appended to the code block to ensure a copy is returned.

D.Each column name needs to be wrapped in the col() method and method withColumn should be replaced by method withColumnRenamed.

E.The method withColumn should be replaced by method withColumnRenamed and the arguments to the method need to be reordered.

Associate-Developer-Apache-Spark Exam Question 24

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

A.transactionsDf.sort("storeId", asc("productId"))

B.transactionsDf.sort(col(storeId)).desc(col(productId))

C.transactionsDf.order_by(col(storeId), desc(col(productId)))

D.transactionsDf.sort("storeId", desc("productId"))

E.transactionsDf.sort("storeId").sort(desc("productId"))

Associate-Developer-Apache-Spark Exam Question 25

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

A.from pyspark import StorageLevel
transactionsDf.cache(StorageLevel.MEMORY_ONLY)

B.transactionsDf.cache()

C.transactionsDf.storage_level('MEMORY_ONLY')

D.transactionsDf.persist()

E.transactionsDf.clear_persist()

F.from pyspark import StorageLevel
transactionsDf.persist(StorageLevel.MEMORY_ONLY)
