Which of the following is the BEST way to hide sensitive personal data that is in use in a data lake?
Correct Answer: A
Reference:
Data masking is a technique that replaces sensitive or confidential data with realistic but fictitious data, such as random characters or numbers, to prevent unauthorized access to or disclosure of the original data. It is the best way to hide sensitive personal data that is in use in a data lake because it protects the privacy of the data subjects by reducing the linkability of the data set to their original identities. It also supports the data minimization principle, which requires limiting the collection, storage and processing of personal data to what is necessary and relevant for the intended purposes. Masking preserves some characteristics and patterns of the original data, such as format and length, so the masked data can still be used for analysis or research with limited impact on the accuracy or quality of the results.

The other options are not as effective as data masking in hiding sensitive personal data that is in use in a data lake:

Data truncation removes portions of the data, such as digits from a credit card number or characters from an email address, to prevent unauthorized access to or disclosure of the original data. However, it may affect the accuracy or quality of analysis or research results, because some characteristics or patterns of the original data are lost or distorted.

Data encryption transforms plaintext into ciphertext using an algorithm and a key, making it unreadable by unauthorized parties. However, data generally has to be decrypted before it can be processed, so encryption does not hide data that is in use, it does not reduce the linkability of the data set to the original identities of the data subjects, and it requires additional security measures to protect the encryption keys or certificates.

Data minimization is a principle that requires limiting the collection, storage and processing of personal data to what is necessary and relevant for the intended purposes. It does not address how to hide sensitive personal data that is already in use in a data lake (1, pp. 74-75).

Reference: 1: CDPSE Review Manual (Digital Version)
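The difference between masking and truncation described above can be illustrated with a minimal Python sketch. It is not taken from the CDPSE manual: the function names `mask_digits` and `truncate`, the sample card number, and the `secret` value are all hypothetical, and the standard-library `random` generator is used only for illustration (it is not cryptographically secure, so a real masking tool would use a vetted method).

```python
import hashlib
import random


def mask_digits(value: str, secret: str) -> str:
    """Replace every digit with a pseudo-random digit, keeping length and separators."""
    # Seed a private generator from a hash of secret + value, so the same input
    # always maps to the same masked output (illustrative only, not secure).
    seed = hashlib.sha256((secret + value).encode("utf-8")).hexdigest()
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def truncate(value: str, keep_last: int = 4) -> str:
    """Keep only the last few characters; the rest of the value is discarded."""
    return value[-keep_last:]


card = "4111-1111-1111-1111"  # hypothetical sample value
masked = mask_digits(card, secret="demo-only-secret")
shortened = truncate(card)
```

Here `masked` keeps the same length and separator positions as the original, so format-dependent processing still works, while `shortened` has lost most of the original structure.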