Databricks-Certified-Professional-Data-Engineer Exam Question 46

The data governance team is reviewing code used for deleting records for compliance with GDPR. They note the following logic is used to delete records from the Delta Lake table named users.

Assuming that user_id is a unique identifying key and that delete_requests contains all users that have requested deletion, which statement describes whether successfully executing the above logic guarantees that the records to be deleted are no longer accessible and why?
  • Databricks-Certified-Professional-Data-Engineer Exam Question 47

    The business intelligence team has a dashboard configured to track various summary metrics for retail stores.
    This includes total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:
    For demand forecasting, the Lakehouse contains a validated table of all itemized sales, updated incrementally in near real-time. This table, named products_per_order, includes the following fields:
    Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.
    Which solution meets the expectations of the end users while controlling and limiting possible costs?
  • Databricks-Certified-Professional-Data-Engineer Exam Question 48

    You noticed that a team member started using an all-purpose cluster to develop a notebook and then used the same all-purpose cluster to set up a job that runs every 30 minutes to update underlying tables that are used in a dashboard. What would you recommend for reducing the overall cost of this approach?
  • Databricks-Certified-Professional-Data-Engineer Exam Question 49

    The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
    The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
    The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.
    Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?
  • Databricks-Certified-Professional-Data-Engineer Exam Question 50

    Which statement characterizes the general programming model used by Spark Structured Streaming?
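
The timing in Question 49 (deletes on Sunday at 1am, VACUUM on Monday at 3am) can be worked through with a little date arithmetic. The sketch below is only an illustration: it assumes the default Delta Lake deleted-file retention of 7 days, uses hypothetical calendar dates, and the helper name `first_effective_vacuum` is invented for this example. Actual behavior depends on the table's retention settings.

```python
from datetime import datetime, timedelta

# Assumption: default delta.deletedFileRetentionDuration of 7 days.
RETENTION = timedelta(days=7)


def first_effective_vacuum(delete_finished, first_vacuum,
                           vacuum_every=timedelta(days=7),
                           retention=RETENTION):
    """Return the first scheduled VACUUM run at which data files removed
    at `delete_finished` are older than the retention threshold."""
    eligible_at = delete_finished + retention
    run = first_vacuum
    # Skip every scheduled run that happens before the files are old enough.
    while run < eligible_at:
        run += vacuum_every
    return run


# Hypothetical dates: 2024-01-07 is a Sunday, 2024-01-08 is a Monday.
delete_finished = datetime(2024, 1, 7, 2, 0)  # job starts at 1am, runs under an hour
first_vacuum = datetime(2024, 1, 8, 3, 0)     # the Monday 3am VACUUM that follows

purge_run = first_effective_vacuum(delete_finished, first_vacuum)
print(purge_run)                    # 2024-01-15 03:00:00
print(purge_run - delete_finished)  # 8 days, 1:00:00
```

Under these assumptions, the VACUUM that runs the day after the delete job does not purge the removed files, because they are only about a day old. They remain reachable through time travel until the following Monday's run.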