Associate-Developer-Apache-Spark-3.5 Exam Question 1

What is the benefit of using Pandas on Spark for data transformations?
Options:
  • Associate-Developer-Apache-Spark-3.5 Exam Question 2

    A Spark application suffers from too many small tasks due to excessive partitioning. How can this be fixed without a full shuffle?
    Options:
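As a minimal sketch of the narrow-dependency approach this question points toward, assuming a PySpark DataFrame: `DataFrame.coalesce(n)` merges existing partitions into fewer ones without a full shuffle, whereas `repartition(n)` always shuffles. The helper name `shrink_partitions` and the target count of 8 are hypothetical and only serve the illustration.

```python
def shrink_partitions(df, target_partitions: int):
    """Return `df` with at most `target_partitions` partitions.

    `coalesce` only merges existing partitions (a narrow dependency), so it
    avoids the full shuffle that `repartition` would trigger.
    """
    if target_partitions < 1:
        raise ValueError("target_partitions must be at least 1")
    return df.coalesce(target_partitions)


# Hypothetical usage, assuming `df` is an existing PySpark DataFrame:
# compacted = shrink_partitions(df, 8)
# print(compacted.rdd.getNumPartitions())
```

Because `coalesce` can only reduce the partition count, it suits the "too many small tasks" case described here but not the opposite problem of too few partitions.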
  • Associate-Developer-Apache-Spark-3.5 Exam Question 3

    A data engineer is asked to build an ingestion pipeline for a set of Parquet files delivered by an upstream team on a nightly basis. The data is stored in a directory structure with a base path of "/path/events/data". The upstream team drops daily data into the underlying subdirectories following the convention year/month/day.
    A few examples of the directory structure are:

    Which of the following code snippets will read all the data within the directory structure?
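One way such a read is commonly written, sketched here under assumptions: the reader option `recursiveFileLookup` tells Spark to descend into every subdirectory under the base path. The function name `read_all_events` is hypothetical, and `spark` is assumed to be an existing `SparkSession`.

```python
def read_all_events(spark, base_path: str = "/path/events/data"):
    """Read every Parquet file under `base_path`, at any directory depth.

    `recursiveFileLookup` makes the reader walk nested subdirectories such as
    year/month/day. Note that this option also disables partition inference,
    so no partition columns are derived from the directory names.
    """
    return (
        spark.read
        .option("recursiveFileLookup", "true")
        .parquet(base_path)
    )


# Hypothetical usage, assuming `spark` is an active SparkSession:
# events = read_all_events(spark)
# events.printSchema()
```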
  • Associate-Developer-Apache-Spark-3.5 Exam Question 4

    Which UDF implementation calculates the length of strings in a Spark DataFrame?
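A minimal sketch of a string-length UDF in PySpark, not necessarily matching any of the exam's answer options. The names `string_length` and `make_length_udf` are hypothetical. The Spark imports are kept inside the factory function so the plain Python logic can be exercised without a Spark installation.

```python
from typing import Optional


def string_length(value: Optional[str]) -> Optional[int]:
    """Return the length of `value`, passing nulls through as None."""
    return None if value is None else len(value)


def make_length_udf():
    """Wrap `string_length` as a Spark UDF that returns an integer column."""
    # Imported lazily; requires pyspark to be installed when called.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    return udf(string_length, IntegerType())


# Hypothetical usage, assuming `df` has a string column called "name":
# length_udf = make_length_udf()
# df = df.withColumn("name_length", length_udf(df["name"]))
```

Outside an exam setting, the built-in `pyspark.sql.functions.length` covers the same need without the overhead of a Python UDF.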
  • Associate-Developer-Apache-Spark-3.5 Exam Question 5

    Given:
    spark.sparkContext.setLogLevel("<LOG_LEVEL>")
    Which set contains only valid values for <LOG_LEVEL> when setting the Spark driver's log level?
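For reference, the Spark API documentation for `SparkContext.setLogLevel` lists ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE and WARN as valid levels. The sketch below checks a value against that list before it is passed on; the helper `normalize_log_level` is hypothetical and not part of Spark.

```python
# Log levels documented for SparkContext.setLogLevel.
VALID_LOG_LEVELS = frozenset(
    {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}
)


def normalize_log_level(level: str) -> str:
    """Return `level` upper-cased, or raise if it is not a documented level."""
    candidate = level.strip().upper()
    if candidate not in VALID_LOG_LEVELS:
        raise ValueError(
            f"Unsupported log level {level!r}; "
            f"expected one of {sorted(VALID_LOG_LEVELS)}"
        )
    return candidate


# Hypothetical usage, assuming `spark` is an active SparkSession:
# spark.sparkContext.setLogLevel(normalize_log_level("warn"))
```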