* Snowflake supports cloning of various objects, such as databases, schemas, tables, stages, file formats, sequences, streams, tasks, and roles. Cloning creates a copy of an existing object in the system without copying the data or metadata. Cloning is also known as zero-copy cloning [1].
* Among the objects listed in the question, the following ones can be cloned in Snowflake:
* Permanent table: A permanent table is a type of table that has a Fail-safe period and a Time Travel retention period of up to 90 days. A permanent table can be cloned using the CREATE TABLE ... CLONE command [2]. Therefore, option A is correct.
* Transient table: A transient table is a type of table that does not have a Fail-safe period and can have a Time Travel retention period of either 0 or 1 day. A transient table can also be cloned using the CREATE TABLE ... CLONE command [2]. Therefore, option B is correct.
* External table: An external table is a type of table that references data files stored in an external location, such as Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage. An external table can be cloned using the CREATE EXTERNAL TABLE ... CLONE command [3]. Therefore, option D is correct.
* The following objects listed in the question cannot be cloned in Snowflake:
* Temporary table: A temporary table is a type of table that is automatically dropped when the session ends or the current user logs out. Temporary tables do not support cloning [4]. Therefore, option C is incorrect.
* Internal stage: An internal stage is a type of stage that is managed by Snowflake and stores files in Snowflake's internal cloud storage. Internal stages do not support cloning [5]. Therefore, option E is incorrect.
References:
1: Cloning Considerations
2: CREATE TABLE ... CLONE
3: CREATE EXTERNAL TABLE ... CLONE
4: Temporary Tables
5: Internal Stages
ARA-C01 Exam Question 107
A table contains five columns and it has millions of records. The cardinality distribution of the columns is shown below: Columns C4 and C5 are mostly used by SELECT queries in the GROUP BY and ORDER BY clauses, whereas columns C1, C2, and C3 are heavily used in the filter and join conditions of SELECT queries. The Architect must design a clustering key for this table to improve the query performance. Based on Snowflake recommendations, how should the clustering key columns be ordered while defining the multi-column clustering key?
Correct Answer: C
According to the Snowflake documentation, the following are some considerations for choosing clustering for a table [1]:
Clustering is optimal when either:
* You require the fastest possible response times, regardless of cost.
* Your improved query performance offsets the credits required to cluster and maintain the table.
Clustering is most effective when the clustering key is used in the following types of query predicates:
* Filter predicates (e.g. WHERE clauses)
* Join predicates (e.g. ON clauses)
* Grouping predicates (e.g. GROUP BY clauses)
* Sorting predicates (e.g. ORDER BY clauses)
Clustering is less effective when the clustering key is not used in any of the above query predicates, or when the clustering key is used in a predicate that requires a function or expression to be applied to the key (e.g. DATE_TRUNC, TO_CHAR, etc.).
For most tables, Snowflake recommends a maximum of 3 or 4 columns (or expressions) per key. Adding more than 3-4 columns tends to increase costs more than benefits.
Snowflake also recommends prioritizing columns that are used most actively in selective filters and join predicates and, in a multi-column clustering key, ordering the columns from lowest cardinality to highest cardinality.
Based on these considerations, the best option for the clustering key columns is C. C1, C3, C2, because:
* These columns are heavily used in filter and join conditions of SELECT queries, which are the predicates that benefit most from clustering. C4 and C5 appear only in GROUP BY and ORDER BY clauses, so they are lower-priority candidates.
* The order C1, C3, C2 is consistent with the lowest-to-highest cardinality rule when read against the cardinality distribution given in the question.
* These columns are likely to be correlated with each other, which means they can help co-locate similar rows in the same micro-partitions and improve the scan efficiency.
* These columns do not require any functions or expressions to be applied to them, which means they can be directly used in the predicates without affecting the clustering.
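The ordering rule for a multi-column clustering key (lowest cardinality first) can be sketched in a few lines of Python. This is only an illustration: the distinct-value counts and the table name `sales` are hypothetical, because the question's cardinality figure is not reproduced here, and the helper names are invented for this example. The generated statement uses Snowflake's `ALTER TABLE ... CLUSTER BY` syntax.

```python
from typing import Dict, List


def order_clustering_key(cardinality: Dict[str, int], candidates: List[str]) -> List[str]:
    """Order candidate columns from lowest to highest distinct-value count."""
    return sorted(candidates, key=lambda col: cardinality[col])


def cluster_by_statement(table: str, columns: List[str]) -> str:
    """Build the ALTER TABLE statement that defines the clustering key."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(columns)});"


# Hypothetical distinct-value counts, chosen only to illustrate the rule.
cardinality = {"C1": 1_000, "C2": 500_000, "C3": 20_000, "C4": 50, "C5": 12}

# Only the columns used in filter and join conditions are candidates.
filter_join_columns = ["C1", "C2", "C3"]

key = order_clustering_key(cardinality, filter_join_columns)
statement = cluster_by_statement("sales", key)
print(statement)
```

With these assumed counts the candidates sort to C1, C3, C2. C4 and C5 are excluded before sorting because they are used only for grouping and sorting, not because of their cardinality.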
ARA-C01 Exam Question 108
A retail company has over 3000 stores all using the same Point of Sale (POS) system. The company wants to deliver near real-time sales results to category managers. The stores operate in a variety of time zones and exhibit a dynamic range of transactions each minute, with some stores having higher sales volumes than others. Sales results are provided in a uniform fashion using data engineered fields that will be calculated in a complex data pipeline. Calculations include exceptions, aggregations, and scoring using external functions interfaced to scoring algorithms. The source data for aggregations has over 100M rows. Every minute, the POS sends all sales transactions files to a cloud storage location with a naming convention that includes store numbers and timestamps to identify the set of transactions contained in the files. The files are typically less than 10MB in size. How can the near real-time results be provided to the category managers? (Select TWO).
Correct Answer: B,C
To provide near real-time sales results to category managers, the Architect can use the following steps:
1. Create an external stage that references the cloud storage location where the POS sends the sales transaction files. The external stage should use the file format and encryption settings that match the source files [2].
2. Create a Snowpipe that loads the files from the external stage into a target table in Snowflake. The Snowpipe should be configured with AUTO_INGEST = TRUE, so that new files are loaded automatically when the cloud storage service sends event notifications for them. Snowpipe keeps load metadata for each file, so a file that has already been loaded is not ingested a second time. The PURGE copy option is not supported in a pipe definition, so loaded files must be removed from the stage by a separate process if cleanup is required [3].
3. Create a stream on the target table that captures the rows inserted by the Snowpipe. The stream exposes change-tracking columns such as METADATA$ACTION. File-level details such as the file name are available only at load time, so the pipe's COPY statement should store METADATA$FILENAME in a column of the target table. The stream should be consumed often enough that it does not become stale [4].
4. Create a task that runs a query on the stream to process the near real-time data. The query should extract the store number and timestamp from the stored file name, and perform the calculations for exceptions, aggregations, and scoring using external functions. The query should also output the results to another table or view that can be accessed by the category managers. The task should be scheduled to run at a frequency that matches the real-time analytics needs, such as every minute or every 5 minutes.
The other options are not optimal or feasible for providing near real-time results:
* All files should be concatenated before ingestion into Snowflake to avoid micro-ingestion. This option is not recommended because it would introduce additional latency and complexity in the data pipeline. Concatenating files would require an external process or service that monitors the cloud storage location and performs the file merging operation. This would delay the ingestion of new files into Snowflake and increase the risk of data loss or corruption. Moreover, concatenating files would not avoid micro-ingestion, as Snowpipe would still ingest each concatenated file as a separate load.
* An external scheduler should examine the contents of the cloud storage location and issue SnowSQL commands to process the data at a frequency that matches the real-time analytics needs. This option is not necessary because Snowpipe can automatically ingest new files from the external stage without requiring an external trigger or scheduler. Using an external scheduler would add more overhead and dependency to the data pipeline, and it would not guarantee near real-time ingestion, as it would depend on the polling interval and the availability of the external scheduler.
* The copy into command with a task scheduled to run every second should be used to achieve the near-real time requirement. This option is not feasible because tasks cannot be scheduled to run every second in Snowflake. The minimum interval for tasks is one minute, and even that is not guaranteed, as tasks are subject to scheduling delays and concurrency limits. Moreover, using the copy into command with a task would not leverage the benefits of Snowpipe, such as automatic file detection and serverless compute that scales with the incoming file volume.
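The step that extracts the store number and timestamp from each file name can be illustrated with a short Python sketch. In the pipeline itself this parsing would run as SQL inside the task; the Python version only shows the logic. The naming convention `store_<number>_<YYYYMMDDTHHMMSS>.csv`, the sample path, and the names `SalesFile` and `parse_sales_file` are all assumptions made for this example, since the question does not spell out the real convention.

```python
import re
from datetime import datetime
from typing import NamedTuple


class SalesFile(NamedTuple):
    store_number: int
    sent_at: datetime


# Hypothetical naming convention: store_<number>_<YYYYMMDDTHHMMSS>.csv
FILE_PATTERN = re.compile(r"store_(\d+)_(\d{8}T\d{6})\.csv$")


def parse_sales_file(path: str) -> SalesFile:
    """Extract the store number and timestamp encoded in a staged file path."""
    match = FILE_PATTERN.search(path)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    store, stamp = match.groups()
    return SalesFile(int(store), datetime.strptime(stamp, "%Y%m%dT%H%M%S"))


# Sample path is invented for illustration.
parsed = parse_sales_file("pos/2024/store_0421_20240105T093000.csv")
print(parsed.store_number, parsed.sent_at.isoformat())
```

Raising on an unexpected name, rather than returning a default, keeps malformed files from being silently attributed to the wrong store.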
Reference:
1: SnowPro Advanced: Architect | Study Guide
2: Snowflake Documentation | Creating Stages
3: Snowflake Documentation | Loading Data Using Snowpipe
4: Snowflake Documentation | Using Streams and Tasks for ELT
5: Snowflake Documentation | Creating Tasks
6: Snowflake Documentation | Best Practices for Loading Data
7: Snowflake Documentation | Using the Snowpipe REST API
8: Snowflake Documentation | Scheduling Tasks
ARA-C01 Exam Question 109
An Architect is designing a pipeline to stream event data into Snowflake using the Snowflake Kafka connector. The Architect's highest priority is to configure the connector to stream data in the MOST cost-effective manner. Which of the following is recommended for optimizing the cost associated with the Snowflake Kafka connector?
Correct Answer: A
According to the Snowflake documentation for the Kafka connector with Snowpipe Streaming, the minimum value supported for the buffer.flush.time property is 1 (in seconds). For higher average data flow rates, the documentation suggests decreasing the default value for improved latency. If cost is a greater concern than latency, as it is in this question, the buffer flush time can be increased. Take care that the Kafka memory buffer is flushed before it becomes full, to avoid out-of-memory exceptions.
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-kafka
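As a rough sketch of the cost-oriented tuning described above, the following Python snippet builds a fragment of a connector configuration with a longer flush interval. The value of 60 seconds is an arbitrary illustration, not a recommendation, and `connector_config` is a hypothetical helper. The fragment is deliberately incomplete (no topics, credentials, or buffer size settings), and the property names should be checked against the documentation for the installed connector version.

```python
from typing import Dict

MIN_FLUSH_SECONDS = 1  # minimum supported value stated in the documentation excerpt


def connector_config(flush_seconds: int) -> Dict[str, str]:
    """Return a partial Kafka connector config with the given buffer flush time."""
    if flush_seconds < MIN_FLUSH_SECONDS:
        raise ValueError(
            f"buffer.flush.time must be at least {MIN_FLUSH_SECONDS} second(s)"
        )
    return {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
        # A longer interval means fewer, larger flushes: lower cost, higher latency.
        "buffer.flush.time": str(flush_seconds),
    }


# 60 seconds is an illustrative, assumed value.
cost_oriented = connector_config(60)
print(cost_oriented["buffer.flush.time"])
```

The trade-off runs in both directions: a larger value lets more records accumulate in memory between flushes, so it has to be sized against the buffer limits mentioned above.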
ARA-C01 Exam Question 110
A company needs to have the following features available in its Snowflake account: 1. Support for Multi-Factor Authentication (MFA) 2. A minimum of 2 months of Time Travel availability 3. Database replication in between different regions 4. Native support for JDBC and ODBC 5. Customer-managed encryption keys using Tri-Secret Secure 6. Support for Payment Card Industry Data Security Standards (PCI DSS) In order to provide all the listed services, what is the MINIMUM Snowflake edition that should be selected during account creation?
Correct Answer: C
According to the Snowflake documentation [1], the listed requirements map to Snowflake editions as follows:
* Support for Multi-Factor Authentication (MFA): This is a standard feature available in all Snowflake editions [1].
* A minimum of 2 months of Time Travel availability: Standard edition allows at most 1 day of Time Travel, so this requires Enterprise edition or higher, which allows up to 90 days [1].
* Database replication between different regions: This is a standard feature available in all Snowflake editions and works across regions and cloud platforms [1].
* Native support for JDBC and ODBC: This is a standard feature available in all Snowflake editions [1].
* Customer-managed encryption keys using Tri-Secret Secure: This is a Business Critical feature that provides enhanced security and data protection by allowing customers to manage their own encryption keys [1].
* Support for Payment Card Industry Data Security Standards (PCI DSS): This is a Business Critical feature that supports compliance with PCI DSS regulations for handling sensitive cardholder data [1].
Therefore, the minimum Snowflake edition that should be selected during account creation to provide all the listed services is the Business Critical edition, because Tri-Secret Secure and PCI DSS support are not available in lower editions.
Reference: 1: Snowflake Editions | Snowflake Documentation
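The reasoning above amounts to taking the highest edition required by any single feature. A minimal Python sketch of that lookup follows. The feature-to-edition mapping reflects one reading of Snowflake's edition feature list and should be verified against the current documentation; database replication is entered here as available from Standard edition, which does not change the result. The names `EDITION_RANK`, `REQUIRED_FEATURES`, and `minimum_edition` are invented for this example.

```python
from typing import Dict

# Editions ordered from lowest to highest.
EDITION_RANK: Dict[str, int] = {
    "Standard": 0,
    "Enterprise": 1,
    "Business Critical": 2,
    "Virtual Private Snowflake": 3,
}

# Lowest edition assumed to provide each required feature.
REQUIRED_FEATURES: Dict[str, str] = {
    "Multi-Factor Authentication": "Standard",
    "Time Travel of at least 2 months": "Enterprise",
    "Database replication across regions": "Standard",
    "JDBC and ODBC drivers": "Standard",
    "Tri-Secret Secure": "Business Critical",
    "PCI DSS support": "Business Critical",
}


def minimum_edition(features: Dict[str, str]) -> str:
    """Return the lowest edition that satisfies every feature in the mapping."""
    return max(features.values(), key=lambda edition: EDITION_RANK[edition])


print(minimum_edition(REQUIRED_FEATURES))
```

Removing the two Business Critical entries from the mapping would drop the result to Enterprise, which shows that Tri-Secret Secure and PCI DSS are the deciding requirements.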