Professional-Data-Engineer Exam Question 1

You are designing the architecture of your application to store data in Cloud Storage. Your application consists of pipelines that read data from a Cloud Storage bucket that contains raw data, and write the data to a second bucket after processing. You want to design an architecture with Cloud Storage resources that are capable of being resilient if a Google Cloud regional failure occurs. You want to minimize the recovery point objective (RPO) if a failure occurs, with no impact on applications that use the stored data. What should you do?
  • Professional-Data-Engineer Exam Question 2

    Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?
  • Professional-Data-Engineer Exam Question 3

    You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
    You have the following requirements:
    You will batch-load the posts once per day and run them through the Cloud Natural Language API.
    You will extract topics and sentiment from the posts.
    You must store the raw posts for archiving and reprocessing.
    You will create dashboards to be shared with people both inside and outside your organization.
    You need to store both the data extracted from the API, for analysis, and the raw social media posts, for historical archiving. What should you do?
  • Professional-Data-Engineer Exam Question 4

    You are using BigQuery with a regional dataset that includes a table with the daily sales volumes. This table is updated multiple times per day. You need to protect your sales table in case of regional failures with a recovery point objective (RPO) of less than 24 hours, while keeping costs to a minimum. What should you do?
  • Professional-Data-Engineer Exam Question 5

    You are implementing workflow pipeline scheduling using open source-based tools and Google Kubernetes Engine (GKE). You want to use a Google managed service to simplify and automate the task. You also want to accommodate Shared VPC networking considerations. What should you do?
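
    Question 2 above turns on grouping an unbounded stream into hourly buckets according to when each element entered the pipeline, rather than when the event originally occurred. The following is a minimal plain-Python sketch of that idea only; it does not use the Apache Beam or Dataflow API, and the names `window_start`, `aggregate_by_arrival_hour`, and the `(arrival_ts, value)` record shape are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 3600  # one-hour, non-overlapping windows


def window_start(arrival_ts: float) -> int:
    """Return the start (epoch seconds) of the hourly window containing arrival_ts."""
    return int(arrival_ts // WINDOW_SECONDS) * WINDOW_SECONDS


def aggregate_by_arrival_hour(records):
    """Sum values per hourly window.

    `records` is an iterable of (arrival_ts, value) pairs, where arrival_ts is
    the time the element entered the pipeline (hypothetical record shape).
    """
    totals = defaultdict(int)
    for arrival_ts, value in records:
        totals[window_start(arrival_ts)] += value
    return dict(totals)


if __name__ == "__main__":
    # Five elements arriving across three consecutive hours.
    records = [(0, 1), (1800, 2), (3600, 5), (7199, 1), (7200, 4)]
    for start, total in sorted(aggregate_by_arrival_hour(records).items()):
        label = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
        print(label, total)
```

    In a real streaming system the windows would be emitted as time advances instead of being computed over a finished list; the sketch only shows how an arrival timestamp maps to its hourly bucket.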