CDP-3002 Exam Question 71

Which of the following is a critical consideration when choosing between a sort-merge join and a shuffle hash join in a distributed data processing system such as Spark?
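For Question 71, both strategies share the same first step and differ only in how each partition is joined.

- **Shared first step.** In Spark, both begin by shuffling the two inputs on the join key, so matching rows land in the same partition.
- **Shuffle hash join.** It builds an in-memory hash table from one side of each partition. It generally suits cases where that build side is small enough to fit in executor memory.
- **Sort-merge join.** It sorts both sides and merges them with two cursors. This costs a sort but copes with large partitions, because sorted data can spill to disk. It is Spark's default preference via `spark.sql.join.preferSortMergeJoin`.

The pure-Python sketch below models only that logic. `shuffle`, `distributed_join`, the strategy names, and the sample data are hypothetical. Spilling, skew, and Spark's actual join selection are not modeled.

```python
from collections import defaultdict


def shuffle(rows, key, num_partitions):
    """Hash-partition rows on the join key, like a shuffle exchange."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def hash_join_partition(left, right, key):
    """Shuffle hash join step: build a hash table on `left`, probe it with `right`.

    The build side must fit in memory, so the smaller input should be `left`.
    """
    table = defaultdict(list)
    for l_row in left:
        table[l_row[key]].append(l_row)
    return [(l_row, r_row) for r_row in right for l_row in table.get(r_row[key], [])]


def sort_merge_join_partition(left, right, key):
    """Sort-merge join step: sort both sides, then advance two cursors in key order."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, pair it with every matching left row.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], r_row) for r_row in right[j:j_end])
                i += 1
            j = j_end
    return out


def distributed_join(left, right, key, strategy, num_partitions=4):
    """Shuffle both inputs on `key`, then join each co-located partition pair."""
    joiners = {"shuffle_hash": hash_join_partition, "sort_merge": sort_merge_join_partition}
    if strategy not in joiners:
        raise ValueError(f"unknown strategy: {strategy}")
    join_partition = joiners[strategy]
    result = []
    for l_part, r_part in zip(shuffle(left, key, num_partitions), shuffle(right, key, num_partitions)):
        result.extend(join_partition(l_part, r_part, key))
    return result


customers = [{"cust": 1, "name": "a"}, {"cust": 3, "name": "c"}]  # small side
orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 5}, {"cust": 1, "amt": 7}]


def pairs(result):
    return sorted((c["name"], o["amt"]) for c, o in result)


print(pairs(distributed_join(customers, orders, "cust", "shuffle_hash")))  # [('a', 7), ('a', 10)]
print(pairs(distributed_join(customers, orders, "cust", "sort_merge")))    # [('a', 7), ('a', 10)]
```

Both strategies return the same rows; what differs is the resource profile.

- **Hash join:** it holds a `table` proportional to the build-side partition, so memory is the deciding factor.
- **Sort-merge join:** it pays an O(n log n) sort but only needs cursors over sorted inputs.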
  • CDP-3002 Exam Question 72

    In Apache Airflow, which operator is best suited for running data quality checks on a Hive table after data ingestion?
  • CDP-3002 Exam Question 73

    In an Airflow DAG, you have tasks A, B, C, and D. Task A must complete before B and C can start, but B and C can run in parallel. Task D should only run once both B and C have completed. How do you set up these dependencies?
  • CDP-3002 Exam Question 74

    You need to enable secure access to Iceberg tables in CDP, controlling permissions at the table, column, and row level. Which of the following approaches would you investigate?
  • CDP-3002 Exam Question 75

    Discuss the trade-offs between wide tables (many columns) and narrow tables (few columns) in Spark, and their implications for data processing efficiency.
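For Question 72, whichever operator is chosen, a post-ingestion check usually has the same shape. It runs a SQL query against the freshly loaded table and fails the task if the result is wrong.

Airflow's common SQL provider ships `SQLCheckOperator` (and related check operators) around this idea. Its documented rule is that the task fails when the query returns no rows, or when any value in the first row is falsy under Python's `bool()`.

The sketch below reproduces only that pass/fail rule, with Python's built-in `sqlite3` standing in for a Hive connection. `run_sql_check`, `DataQualityError`, and the `sales` table are hypothetical.

```python
import sqlite3


class DataQualityError(Exception):
    """Hypothetical error type standing in for a failed Airflow task."""


def run_sql_check(conn, sql):
    """Fail if the query returns no rows or any value in the first row is falsy.

    This mirrors the pass/fail rule documented for SQLCheckOperator; it is not
    Airflow code.
    """
    row = conn.execute(sql).fetchone()
    if row is None:
        raise DataQualityError("check query returned no rows")
    failed = [i for i, value in enumerate(row) if not value]
    if failed:
        raise DataQualityError(f"check failed at column positions {failed}")
    return row


# sqlite3 stands in for the Hive table that was just ingested.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 3.0), (3, None)])

# Each expression yields 1 (pass) or 0 (fail).
CHECK_SQL = """
SELECT
  COUNT(*) > 0                  AS has_rows,
  COUNT(*) = COUNT(DISTINCT id) AS ids_unique,
  SUM(amount IS NULL) = 0       AS no_null_amounts
FROM sales
"""

try:
    run_sql_check(conn, CHECK_SQL)
except DataQualityError as exc:
    print(f"quality check failed: {exc}")  # the NULL amount trips position 2
```

Encoding each rule as a 0/1 column keeps the whole check in one query, so a single run reports every failed rule at once.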
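For Question 73, the idiomatic Airflow form is the bitshift expression `a >> [b, c] >> d`. A becomes upstream of both B and C, and the list `[b, c]` becomes upstream of D. As a result, B and C may run in parallel while D waits for both.

The sketch below shows why that one line produces the required graph, without needing an Airflow installation. It defines a hypothetical `Task` class that overloads `>>` in the same spirit, then groups tasks into waves that could run concurrently. It is a toy model, not Airflow's implementation.

```python
class Task:
    """Hypothetical stand-in for an Airflow operator; tracks edges only."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()
        self.downstream = set()

    def _add_downstream(self, others):
        targets = others if isinstance(others, list) else [others]
        for other in targets:
            self.downstream.add(other)
            other.upstream.add(self)

    def __rshift__(self, other):
        # task >> task  or  task >> [tasks]: self must finish first.
        self._add_downstream(other)
        return other  # returning the right operand lets expressions chain

    def __rrshift__(self, other):
        # [tasks] >> task: list has no __rshift__, so Python calls this on the right operand.
        for upstream_task in other:
            upstream_task._add_downstream(self)
        return self


def execution_waves(tasks):
    """Group tasks into waves; every task in a wave has all upstream tasks done."""
    done, remaining, waves = set(), set(tasks), []
    while remaining:
        ready = {t for t in remaining if t.upstream <= done}
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(sorted(t.task_id for t in ready))
        done |= ready
        remaining -= ready
    return waves


a, b, c, d = (Task(name) for name in "ABCD")
a >> [b, c] >> d
print(execution_waves([a, b, c, d]))  # [['A'], ['B', 'C'], ['D']]
```

Python evaluates the expression left to right. `a >> [b, c]` returns the list, and `[b, c] >> d` then falls back to `d.__rrshift__`, which is why a list can sit in the middle of the chain.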
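For Question 74, CDP usually administers fine-grained permissions on Hive and Impala tables through Apache Ranger policies, which cover table and column access, row-level filters, and column masking. That makes Ranger a natural place to start investigating for Iceberg tables as well.

The sketch below does not use Ranger. It is a hypothetical toy model (`Policy`, `secure_select`, `AccessDenied`, and the sample rows are all invented) showing how table-, column-, and row-level rules combine when a single SELECT is evaluated.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when a request violates the policy."""


@dataclass
class Policy:
    """Hypothetical per-user policy; not Ranger's data model or API."""
    allowed_tables: set
    allowed_columns: dict                            # table -> set of readable columns
    row_filters: dict = field(default_factory=dict)  # table -> predicate(row) -> bool


def secure_select(policy, table, rows, columns):
    """Apply table-, column-, then row-level rules to a SELECT of `columns`."""
    if table not in policy.allowed_tables:                            # table level
        raise AccessDenied(f"no access to table {table}")
    denied = set(columns) - policy.allowed_columns.get(table, set())  # column level
    if denied:
        raise AccessDenied(f"no access to columns {sorted(denied)}")
    keep = policy.row_filters.get(table, lambda row: True)            # row level
    # The filter runs on the full row before projection, so it may use hidden columns.
    return [{c: row[c] for c in columns} for row in rows if keep(row)]


crm_rows = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]
eu_analyst = Policy(
    allowed_tables={"customers"},
    allowed_columns={"customers": {"id", "region"}},
    row_filters={"customers": lambda row: row["region"] == "EU"},
)
print(secure_select(eu_analyst, "customers", crm_rows, ["id", "region"]))
# [{'id': 1, 'region': 'EU'}]
```

Applying the row filter before projection is a design choice of this toy. It lets a filter reference a column the user cannot select.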
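For Question 75, one way to make the trade-off concrete is to hold the same facts in both shapes.

- **Narrow (long) layout:** it stores one row per (entity, attribute, value) and absorbs new attributes without schema changes. Reassembling an entity requires a group-by or pivot (in PySpark, `groupBy(...).pivot(...)`), which typically means a shuffle.
- **Wide layout:** it stores one row per entity, so reads need no reassembly, and with columnar formats such as Parquet, Spark can prune the columns a query does not select. The costs are schema churn, nulls for sparse attributes, and heavier rows whenever whole records are processed.

The pure-Python sketch below converts between the two shapes and measures sparsity. The function names and data are hypothetical.

```python
def pivot_wide(narrow_rows):
    """Narrow -> wide: one output row per entity, one column per attribute."""
    columns = sorted({r["attribute"] for r in narrow_rows})
    by_entity = {}
    for r in narrow_rows:
        by_entity.setdefault(r["entity"], {})[r["attribute"]] = r["value"]
    # Attributes an entity lacks become None: this is where wide tables get sparse.
    wide = [{"entity": e, **{c: attrs.get(c) for c in columns}}
            for e, attrs in sorted(by_entity.items())]
    return columns, wide


def unpivot_narrow(wide_rows, columns):
    """Wide -> narrow: emit one row per non-null cell (a stored None is not kept)."""
    return [{"entity": row["entity"], "attribute": c, "value": row[c]}
            for row in wide_rows for c in columns if row[c] is not None]


def null_fraction(wide_rows, columns):
    """Share of attribute cells in the wide layout that hold None."""
    cells = len(wide_rows) * len(columns)
    nulls = sum(row[c] is None for row in wide_rows for c in columns)
    return nulls / cells if cells else 0.0


narrow = [
    {"entity": "u1", "attribute": "age", "value": 31},
    {"entity": "u1", "attribute": "country", "value": "DE"},
    {"entity": "u2", "attribute": "plan", "value": "pro"},
]
columns, wide = pivot_wide(narrow)
print(columns)                       # ['age', 'country', 'plan']
print(null_fraction(wide, columns))  # 0.5
```

Half of the wide table's cells are empty here, while the narrow table stores only the three facts. The price of the narrow form is the pivot step needed before each entity can be read as one row.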