What is Data Masking? | Techniques & Best Practices | Imperva

What is Data Masking?

Data masking is a way to create a fake but realistic version of your organizational data. The goal is to protect sensitive data while providing a functional alternative when real data is not needed, for example in user training, sales demos, or software testing.

Data masking processes change the values of the data while using the same format. The goal is to create a version that cannot be deciphered or reverse engineered. There are several ways to alter the data, including character shuffling, word or character substitution, and encryption.
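
Two of the techniques named above, character substitution and character shuffling, can be sketched in a few lines of Python. This is a minimal illustration, not a production masking routine: the function names `substitute_chars` and `shuffle_chars` are hypothetical, and the seeded `random.Random` is used only to keep the demo repeatable (a real deployment would more likely draw from a cryptographically secure source such as `secrets.SystemRandom()`).

```python
import random
import string


def substitute_chars(value: str, rng: random.Random) -> str:
    """Replace each digit with a random digit and each letter with a random
    ASCII letter of the same case; separators and spaces stay in place."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep the original format (dashes, spaces, '@', ...)
    return "".join(out)


def shuffle_chars(value: str, rng: random.Random) -> str:
    """Shuffle only the alphanumeric characters, keeping separators in place."""
    positions = [i for i, ch in enumerate(value) if ch.isalnum()]
    chars = [value[i] for i in positions]
    rng.shuffle(chars)
    result = list(value)
    for pos, ch in zip(positions, chars):
        result[pos] = ch
    return "".join(result)


rng = random.Random(0)  # fixed seed for a repeatable demo only
print(substitute_chars("4111-1111-1111-1111", rng))
print(shuffle_chars("AB-1234", rng))
```

Both functions keep the length and separator positions of the input, which is what lets the masked value pass the same format checks as the original. Shuffling keeps the original set of characters, so on its own it is the weaker of the two.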

Figure: How data masking works

Why is Data Masking Important?

Here are several reasons data masking is essential for many organizations:

  • Mitigates several critical threats: data loss, data exfiltration, insider threats or account compromise, and insecure interfaces with third-party systems.
  • Reduces data risks associated with cloud adoption.
  • Makes data useless to an attacker, while maintaining many of its inherent functional properties.
  • Allows sharing data with authorized users, such as testers and developers, without exposing production data.
  • Can be used for data sanitization – normal file deletion still leaves traces of data in storage media, while sanitization replaces the old values with masked ones.

Data Masking Types

Several types of data masking are commonly used to secure sensitive data.

Static Data Masking

Static data masking processes can help you create a sanitized copy of the database. The process alters all sensitive data until a copy of the database can be safely shared. Typically, the process involves creating a backup copy of a database in production, loading it to a separate environment, eliminating any unnecessary data, and then masking data while it is in stasis. The masked copy can then be pushed to the target location.
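
The workflow above (back up production, load the copy elsewhere, drop unnecessary data, mask at rest) can be sketched with Python's built-in `sqlite3` module. This is an illustrative sketch under stated assumptions: the `customers` and `audit_log` tables, their columns, and the function `build_masked_copy` are hypothetical, and in-memory databases stand in for the production and staging environments.

```python
import sqlite3


def build_masked_copy(prod: sqlite3.Connection) -> sqlite3.Connection:
    """Return a sanitized copy of `prod`; the production data is not modified."""
    staging = sqlite3.connect(":memory:")

    # 1. Back up production into a separate environment.
    prod.backup(staging)

    # 2. Eliminate data the target environment does not need.
    staging.execute("DROP TABLE IF EXISTS audit_log")

    # 3. Mask sensitive columns while the copy is at rest.
    ids = [row[0] for row in staging.execute("SELECT id FROM customers")]
    for cid in ids:
        staging.execute(
            "UPDATE customers SET name = ?, email = ?, card_number = ? WHERE id = ?",
            (
                f"Customer {cid}",
                f"user{cid}@example.com",
                f"0000-0000-0000-{cid % 10000:04d}",  # same format, fake digits
                cid,
            ),
        )
    staging.commit()
    return staging


# Hypothetical production database for the demo.
prod = sqlite3.connect(":memory:")
prod.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, card_number TEXT);
    CREATE TABLE audit_log (id INTEGER PRIMARY KEY, entry TEXT);
    INSERT INTO customers VALUES (1, 'Alice Smith', 'alice@corp.test', '4111-1111-1111-1111');
    INSERT INTO audit_log VALUES (1, 'login by alice');
    """
)

masked = build_masked_copy(prod)
print(masked.execute("SELECT * FROM customers").fetchall())
```

The masked copy returned here is what would then be pushed to the target location, while the production connection still holds the original values.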

Imperva partners with Thales CipherTrust Tokenization Services to deliver Data Masking capabilities to Imperva customers. While Imperva Data Security Fabric (DSF) provides real-time protection of live production data, CipherTrust Tokenization de-identifies data in non-production environments. It brings a static data masking capability that complements Imperva DSF, works across multiple data platforms, and supports flexible deployment mechanisms to integrate seamlessly into the existing enterprise IT framework without the need for any additional architectural changes.

Thales CipherTrust Tokenization Services offer multiple Data Masking options to fit any organization's needs.

CipherTrust RESTful Data Protection (CRDP) is a Vaultless Tokenization solution that includes both Dynamic and Static Data Masking and centrally manages your tokenization from the CipherTrust Manager GUI. CRDP enables data protection (tokenization or encryption) with a single line of code per field. CRDP can be scaled up to provide high availability and performance.

CRDP uses a REST API to protect sensitive data with format-preserving tokenization.

  • Dynamic Data Masking is a version of data redaction that applies a mask, based on the user or group, to hide a portion of sensitive data. Administrators establish policies, based on user or group, to dynamically mask parts of a field. For example, a security team could establish policies allowing a customer service representative to receive a credit card number with only the last four digits in the clear, while a customer service supervisor could be authorized to receive the full credit card number in the clear.
  • Static Data Masking is a version of data redaction that applies a static mask to a portion of the sensitive data to exclude it from being tokenized. For example, if most people accessing a database are customer service representatives, and they only need to see the last four digits of a credit card number to verify an account, static data masking offers a significant performance improvement over dynamic data masking by eliminating the need to detokenize the data for every single data access all day long.
  • Multi-tenancy is available through CipherTrust Manager.
  • Centrally managed protection policies and access policies enable the Data Security Admin to create and maintain policies to protect each type of data with the relevant cipher, parameters and key and restrict who can access the data in the clear.
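
The role-based behavior in the credit card example above can be illustrated with a short, generic Python sketch. It does not use or represent the CRDP API: the `POLICIES` table, the role names, and the functions `mask_digits` and `read_card_number` are all hypothetical, chosen only to show how a policy keyed on user or group might decide how much of a field is returned in the clear.

```python
from typing import Optional

# Hypothetical policy table: role -> number of trailing digits shown in the clear.
# None means the role is authorized to see the full value.
POLICIES: dict[str, Optional[int]] = {
    "cs_representative": 4,
    "cs_supervisor": None,
}


def mask_digits(value: str, visible_last: int, mask_char: str = "*") -> str:
    """Mask every digit except the last `visible_last`; keep separators."""
    out = []
    seen = 0
    for ch in reversed(value):
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= visible_last else mask_char)
        else:
            out.append(ch)
    return "".join(reversed(out))


def read_card_number(stored: str, role: str) -> str:
    """Apply the mask for `role` at read time; unknown roles see no digits."""
    visible = POLICIES.get(role, 0)
    if visible is None:
        return stored
    return mask_digits(stored, visible)


card = "4111-1111-1111-1234"
print(read_card_number(card, "cs_representative"))  # ****-****-****-1234
print(read_card_number(card, "cs_supervisor"))      # 4111-1111-1111-1234
print(read_card_number(card, "contractor"))         # ****-****-****-****
```

Defaulting unknown roles to a fully masked value is a design choice in this sketch, so that a missing policy fails closed rather than exposing data.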

CipherTrust Vaultless Tokenization (CT-VL) is a Vaultless Tokenization solution that includes Dynamic Data Masking and manages your tokenization with a REST API or the CT-VL GUI. CT-VL enables data protection (tokenization or encryption) with a single line of code per field.

CT-VL uses a REST API to protect sensitive data with format-preserving tokenization. CT-VL can be clustered to provide high availability and performance.

  • Dynamic Data Masking is a version of data redaction that modifies the mask based on the user or group. Administrators establish policies, based on user or group, to dynamically mask parts of a field. For example, a security team could establish policies allowing a customer service representative to receive a credit card number with only the last four digits in the clear, while a customer service supervisor could be authorized to receive the full credit card number in the clear.
  • Multi-tenancy is provided with CT-VL tokenization groups, which ensures that data tokenized by one group cannot be detokenized by another group. CT-VL centrally manages all tokenization groups.
  • Centralized Tokenization Templates allow you to describe how you want data protected within your CT-VL cluster.

Data Masking Best Practices

Determine the Project Scope

In order to effectively perform data masking, companies should know what information needs to be protected, who is authorized to see it, which applications use the data, and where it resides, both in production and non-production domains. While this may seem easy on paper, due to the complexity of operations and multiple lines of business, this process may require a substantial effort and must be planned as a separate stage of the project.

Ensure Referential Integrity

Referential integrity means that each “type” of information coming from a business application must be masked using the same algorithm.

In large organizations, a single data masking tool used across the entire enterprise is often not feasible. Each line of business may be required to implement its own data masking due to budget or business requirements, different IT administration practices, or different security or regulatory requirements.

Ensure that different data masking tools and practices across the organization are synchronized when dealing with the same type of data. This will prevent challenges later when data needs to be used across business lines.
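
One common way to keep masked values consistent across tools is deterministic, keyed masking: the same input and the same key always produce the same output, so joins between separately masked data sets still line up. The sketch below uses HMAC-SHA-256 from Python's standard library as one possible approach. The function name `mask_customer_id`, the sample records, and the key are assumptions made for illustration, and truncating the digest to a few digits means collisions are possible in large data sets.

```python
import hashlib
import hmac


def mask_customer_id(value: str, key: bytes, digits: int = 8) -> str:
    """Deterministically map `value` to a fixed-width numeric pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % (10 ** digits)
    return f"{number:0{digits}d}"


# Demo key only; a real key would live in a key manager, not in source code.
KEY = b"demo-key-not-for-production"

# Two data sets owned by different lines of business (hypothetical records).
orders = [{"customer_id": "C-1001", "total": 25}, {"customer_id": "C-1002", "total": 40}]
tickets = [{"customer_id": "C-1001", "subject": "refund"}]

masked_orders = [{**r, "customer_id": mask_customer_id(r["customer_id"], KEY)} for r in orders]
masked_tickets = [{**r, "customer_id": mask_customer_id(r["customer_id"], KEY)} for r in tickets]

# The join on the masked identifier still matches the same customer.
joined = [
    (o["total"], t["subject"])
    for o in masked_orders
    for t in masked_tickets
    if o["customer_id"] == t["customer_id"]
]
print(joined)
```

If the two business lines used different keys or different algorithms for the same field, the masked identifiers would no longer match and the join would silently return nothing.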

Secure the Data Masking Algorithms

It is critical to consider how to protect the data masking algorithms, as well as alternative data sets or dictionaries used to scramble the data. Because only authorized users should have access to the real data, these algorithms should be considered extremely sensitive. If someone learns which repeatable masking algorithms are being used, they can reverse engineer large blocks of sensitive information.

A data masking best practice, which is explicitly required by some regulations, is to ensure separation of duties. For example, IT security personnel determine what methods and algorithms will be used in general, but specific algorithm settings and data lists should be accessible only by the data owners in the relevant department.

Imperva Data Security Fabric

Organizations that use data masking to protect their sensitive data also need a holistic security solution. Even if data is masked, infrastructure and data sources like databases need to be protected from increasingly sophisticated attacks.

Imperva protects data stores to ensure compliance and preserve the agility and cost benefits you get from your data infrastructure investments.

Imperva Data Security Fabric (DSF) provides a unified agent and agentless architecture that gives organizations observability into, and control over, all their data repositories, whether structured, semi-structured, or unstructured, no matter where they are. Security teams benefit from this comprehensive fabric, and technology development teams, cloud architects, and non-technical business stakeholders also gain confidence in the system and a better understanding of their organization's security posture. They know that if there is a security incident, they have the resources and technology to mitigate the threat.