Associate-Developer-Apache-Spark Exam Question 36

Which of the following code blocks uses a schema fileSchema to read a parquet file at location filePath into a DataFrame?
  • Associate-Developer-Apache-Spark Exam Question 37

    In which order should the code blocks shown below be run to return the number of records that are not null in column value in the DataFrame resulting from an inner join of DataFrame transactionsDf and itemsDf on columns productId and itemId, respectively?
    1. .filter(~isnull(col('value')))
    2. .count()
    3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))
    4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')
    5. .filter(col('value').isnotnull())
    6. .sum(col('value'))
  • Associate-Developer-Apache-Spark Exam Question 38

    Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?
  • Associate-Developer-Apache-Spark Exam Question 39

    Which of the following is one of the big performance advantages that Spark has over Hadoop?
  • Associate-Developer-Apache-Spark Exam Question 40

    Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?