Associate-Developer-Apache-Spark-3.5 Exam Question 31

A data engineer writes the following code to join two DataFrames, df1 and df2:
df1 = spark.read.csv("sales_data.csv") # ~10 GB
df2 = spark.read.csv("product_data.csv") # ~8 MB
result = df1.join(df2, df1.product_id == df2.product_id)

Which join strategy will Spark use?
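As a rough sketch of the size rule behind this question: by default, Spark considers broadcasting one side of a join when its estimated size is at most `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The helper `fits_broadcast_threshold` below is a hypothetical simplification written for illustration, not Spark's planner; real plans also depend on join hints, join type, table statistics, and adaptive query execution.

```python
# Default value of spark.sql.autoBroadcastJoinThreshold: 10 MB, in bytes.
DEFAULT_AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024

MB = 1024 * 1024
GB = 1024 * MB


def fits_broadcast_threshold(
    estimated_bytes: int,
    threshold: int = DEFAULT_AUTO_BROADCAST_THRESHOLD,
) -> bool:
    """Hypothetical helper: simplified size check for automatic broadcasting.

    A threshold of -1 disables automatic broadcasting, so nothing qualifies.
    """
    return 0 <= estimated_bytes <= threshold


# Approximate sizes taken from the comments in the question's code.
small_side = fits_broadcast_threshold(8 * MB)   # product_data.csv
large_side = fits_broadcast_threshold(10 * GB)  # sales_data.csv
```

Under these assumptions only the ~8 MB side qualifies, which points toward a broadcast hash join with df2 as the broadcast side. On a real session, `result.explain()` shows the strategy Spark actually chose.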
  • Associate-Developer-Apache-Spark-3.5 Exam Question 32

    An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.
    The initial code is:

    def in_spanish_inner(df: pd.Series) -> pd.Series:
        model = get_translation_model(target_lang='es')
        return df.apply(model)

    in_spanish = sf.pandas_udf(in_spanish_inner, StringType())
    How can the MLOps engineer change this code to reduce how many times the language model is loaded?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 33

    A DataFrame df has columns name, age, and salary. The developer needs to sort the DataFrame by age in ascending order and salary in descending order.
    Which code snippet meets the requirement of the developer?
  • Associate-Developer-Apache-Spark-3.5 Exam Question 34

    A data engineer needs to persist a file-based data source to a specific location. However, by default, Spark writes to the warehouse directory (e.g., /user/hive/warehouse). To override this, the engineer must explicitly define the file path.
    Which line of code ensures the data is saved to a specific location?
    Options:
  • Associate-Developer-Apache-Spark-3.5 Exam Question 35

    Given this view definition:
    df.createOrReplaceTempView("users_vw")
    Which approach can be used to query the users_vw view after the session is terminated?
    Options: