Associate-Developer-Apache-Spark-3.5 Exam Question 31

4 of 55.
A developer is working on a Spark application that processes a large dataset using SQL queries. Despite having a large cluster, the developer notices that the job is underutilizing the available resources. Executors remain idle most of the time, and logs reveal that the number of tasks per stage is very low. The developer suspects that this is causing suboptimal cluster performance.
Which action should the developer take to improve cluster utilization?
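
As a hedged illustration only (the answer options are not reproduced here): the number of tasks in a stage equals the number of partitions that stage processes, and for stages that follow a shuffle in Spark SQL that number is governed by spark.sql.shuffle.partitions (default 200). A minimal sketch, assuming the low task count comes from shuffle stages; the value 400 is an arbitrary placeholder, not a recommendation:

```sql
-- Show the current setting (200 unless it has been overridden).
SET spark.sql.shuffle.partitions;

-- Raise the number of shuffle partitions so each shuffle stage has more tasks.
-- 400 is an illustrative value only; size it to the cluster's total cores.
SET spark.sql.shuffle.partitions = 400;
```

Spark's tuning guidance suggests roughly 2 to 3 tasks per CPU core. With adaptive query execution enabled, Spark may still coalesce small shuffle partitions at runtime, so the observed task count can be lower than the configured value.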
  • Associate-Developer-Apache-Spark-3.5 Exam Question 32

    15 of 55.
    A data engineer is working on a Streaming DataFrame (streaming_df) with the following streaming data:
    | id | name       | count | timestamp        |
    | 1  | Delhi      | 20    | 2024-09-19T10:11 |
    | 1  | Delhi      | 50    | 2024-09-19T10:12 |
    | 2  | London     | 50    | 2024-09-19T10:15 |
    | 3  | Paris      | 30    | 2024-09-19T10:18 |
    | 3  | Paris      | 20    | 2024-09-19T10:20 |
    | 4  | Washington | 10    | 2024-09-19T10:22 |
    Which operation is supported with streaming_df?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 33

    A developer is running Spark SQL queries and notices underutilization of resources. Executors are idle, and the number of tasks per stage is low.
    What should the developer do to improve cluster utilization?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 34

    A developer wants to test Spark Connect with an existing Spark application.
    What are the two alternative ways the developer can start a local Spark Connect server without changing their existing application code? (Choose 2 answers)
  • Associate-Developer-Apache-Spark-3.5 Exam Question 35

    7 of 55.
    A developer has been asked to debug an issue with a Spark application. The developer identified that the data being loaded from a CSV file is being read incorrectly into a DataFrame.
    The CSV file has been read using the following Spark SQL statement:
    CREATE TABLE locations
    USING csv
    OPTIONS (path '/data/locations.csv')
    The first lines of the output of SELECT * FROM locations look like this:
    | city | lat | long |
    | ALTI Sydney | -33... | ... |
    Which parameter can the developer add to the OPTIONS clause in the CREATE TABLE statement to read the CSV data correctly again?
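
    As a hedged sketch rather than a confirmed answer: if the symptom is that the file's header row (city, lat, long) is being read as an ordinary data row, the CSV data source's header option tells Spark to treat the first line as column names. This assumes that is the actual problem; the output shown above is truncated, so other causes cannot be ruled out.

    ```sql
    -- Same statement as above, with the header option added (assumption:
    -- the first line of /data/locations.csv holds the column names).
    CREATE TABLE locations
    USING csv
    OPTIONS (path '/data/locations.csv', header 'true')
    ```

    Without further options every column is still read as a string; adding inferSchema 'true' asks Spark to infer column types at the cost of an extra pass over the data.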