Associate-Developer-Apache-Spark-3.5 Exam Question 11

Given a CSV file with the content:

And the following code:

    from pyspark.sql.types import *

    schema = StructType([
        StructField("name", StringType()),
        StructField("age", IntegerType())
    ])

    spark.read.schema(schema).csv(path).collect()

What is the resulting output?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 12

    An MLOps engineer is building a Pandas UDF that applies a language model to translate English strings into Spanish. The initial code loads the model on every call to the UDF, which hurts the performance of the data pipeline.
    The initial code is:

    def in_spanish_inner(df: pd.Series) -> pd.Series:
        model = get_translation_model(target_lang='es')
        return df.apply(model)

    in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

    How can the MLOps engineer change this code to reduce how many times the language model is loaded?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 13

    Which UDF implementation calculates the length of strings in a Spark DataFrame?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 14

    A Spark engineer is troubleshooting a Spark application that has been encountering out-of-memory errors during execution. While reviewing the Spark driver logs, the engineer notices multiple "GC overhead limit exceeded" messages.
    Which action should the engineer take to resolve this issue?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 15

    A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored the data in a DataFrame df_user.
    Before further processing, the data scientist wants to create another DataFrame df_user_non_pii and store only the non-PII columns.
    The PII columns in df_user are name, email, and birthdate.
    Which code snippet can be used to meet this requirement?
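
For Question 15, one possible approach (a sketch, not necessarily one of the exam's answer options) is to drop the three PII columns with `DataFrame.drop`, which accepts several column names at once and returns a new DataFrame. The helper names `make_non_pii` and `non_pii_columns` are hypothetical and exist only for illustration.

```python
from typing import Iterable, List

# PII column names exactly as listed in the question.
PII_COLUMNS = ("name", "email", "birthdate")


def non_pii_columns(all_columns: Iterable[str],
                    pii: Iterable[str] = PII_COLUMNS) -> List[str]:
    """Return the column names that are not PII, preserving their order."""
    pii_set = set(pii)
    return [c for c in all_columns if c not in pii_set]


def make_non_pii(df_user):
    """Return a new DataFrame without the PII columns.

    Assumes df_user is a pyspark.sql.DataFrame. drop() does not modify
    df_user in place; it returns a new DataFrame.
    """
    return df_user.drop(*PII_COLUMNS)
```

With a real DataFrame this would be used as `df_user_non_pii = make_non_pii(df_user)`. Selecting the remaining columns explicitly, for example `df_user.select(*non_pii_columns(df_user.columns))`, should give the same columns.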
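
For Question 12, a common way to avoid reloading a model on every call is the iterator form of a Pandas UDF, where setup code runs once per iterator of batches instead of once per batch. The sketch below assumes that form fits the pipeline. The body of `get_translation_model` is a hypothetical stand-in (the question does not show it), and `make_in_spanish_udf` is an illustrative name.

```python
from typing import Callable, Iterator

import pandas as pd


def get_translation_model(target_lang: str) -> Callable[[str], str]:
    """Hypothetical stand-in for the loader named in the question.

    It counts how often it is called so the effect is observable.
    """
    get_translation_model.load_count += 1
    return lambda text: f"[{target_lang}] {text}"


get_translation_model.load_count = 0


def in_spanish_inner(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once for the whole iterator of batches,
    # not once for every batch.
    model = get_translation_model(target_lang="es")
    for batch in batches:
        yield batch.apply(model)


def make_in_spanish_udf():
    """Wrap the function as a Pandas UDF.

    PySpark is imported here so the code above can run without Spark.
    The Iterator type hints mark this as an iterator-style Pandas UDF.
    """
    from pyspark.sql import functions as sf
    from pyspark.sql.types import StringType

    return sf.pandas_udf(in_spanish_inner, StringType())
```

Calling `in_spanish_inner` on an iterator of several `pd.Series` batches loads the stand-in model a single time, which is the behavior the question asks for.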