CDP-3002 Exam Question 71
Which of the following is a critical consideration when choosing between a sort merge join and a shuffle hash join in a distributed data processing system such as Spark?
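As background for this question, here is a minimal single-machine sketch of the two join strategies. It is plain Python, not Spark code, and the function names `hash_join` and `sort_merge_join` are illustrative only. One commonly cited consideration is memory: a hash join builds an in-memory table from one side, while a sort merge join pays to sort both sides and then streams through them.

```python
from collections import defaultdict

def hash_join(build, probe):
    """Inner join of (key, value) rows; holds a hash table of `build` in memory."""
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    # Probe the hash table once per row of the other side.
    return [(key, bv, pv) for key, pv in probe for bv in table.get(key, [])]

def sort_merge_join(left, right):
    """Inner join of (key, value) rows; sorts both sides, then merges them."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                for jj in range(j, j_end):
                    out.append((lk, left[i][1], right[jj][1]))
                i += 1
            j = j_end
    return out

left_rows = [(1, "a"), (2, "b"), (2, "c")]
right_rows = [(2, "x"), (3, "y"), (2, "z")]
hash_result = sorted(hash_join(left_rows, right_rows))
merge_result = sorted(sort_merge_join(left_rows, right_rows))
print(hash_result == merge_result)
```

Both functions return the same rows. They differ in cost: the hash version needs the build side to fit in memory, while the merge version depends on sorting.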
CDP-3002 Exam Question 72
In Apache Airflow, which operator is best suited for running data quality checks on a Hive table after data ingestion?
CDP-3002 Exam Question 73
In an Airflow DAG, you have tasks A, B, C, and D. Task A must complete before B and C can start, but B and C can run in parallel. Task D should only run once both B and C have completed. How do you set up these dependencies?
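The dependency shape described in this question can be sketched without Airflow, using Python's standard-library `graphlib`. This is an illustration of the graph only, not an Airflow DAG definition. The helper name `execution_waves` is hypothetical. In Airflow itself the same shape is commonly written with the bitshift operators, for example `a >> [b, c] >> d`.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (its upstream tasks).
dependencies = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstreams are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(dependencies)
print(waves)  # [['A'], ['B', 'C'], ['D']]
```

A runs alone first, B and C become ready together once A is done, and D is released only after both B and C have completed.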
CDP-3002 Exam Question 74
You need to enable secure access to Iceberg tables in CDP, controlling permissions at the table, column, and row level. Which of the following approaches would you investigate?
CDP-3002 Exam Question 75
Discuss the trade-offs between using wide tables (many columns) and narrow tables (few columns) in Spark and the implications for data processing efficiency.

