Data masking changes the value of operational data for the purpose of training or testing without compromising an organization’s sensitive data. The goal is a version of the data that cannot be deciphered or reverse-engineered to recover the original values.
There are two common approaches to data masking:
- Static data masking (SDM) permanently replaces sensitive data by altering data at rest.
- Dynamic data masking (DDM) aims to replace sensitive data in transit, leaving the original data at rest intact and unaltered.
What is Static Data Masking?
Although the name may imply sluggishness, SDM is an established technology capable of protecting a large swath of the data within your organization.
SDM is primarily used to provide high-quality (i.e., realistic) data for developing and testing applications without disclosing sensitive information. Realism matters because it helps development and testing teams identify defects earlier in the development cycle, which reduces costs and improves the quality of the final product.
Static data masking is also used to:
- Protect data for use in analytics and training
- Facilitate compliance with standards and regulations (such as GDPR, PCI DSS, and HIPAA) that limit the use of data that identifies individuals.
- Enable more secure cloud adoption practices. DevOps workloads are among the first that organizations migrate to the cloud. Masking data on-premises prior to uploading it to the cloud reduces risk for organizations concerned with cloud-based data disclosure.
How Static Data Masking is Applied
There is no substitute for the subtleties and nuances of data that has evolved and grown through normal application usage. The main alternative, synthetic data generation, tends to produce uniform data that lacks realism because it has not accumulated through years of real usage.
SDM starts with the original production data and applies a series of data transformations to produce high-fidelity masked data.
A copy of production data is used to create a "golden" masked copy of the database, which is then replicated to the various non-production environments. What is often overlooked is that many organizations apply little or no masking before replicating production data to these less secure environments.
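The copy-mask-replicate flow described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a production tool: the `customers` table, the `build_golden_copy` and `replicate` helpers, the environment names, and the fake-SSN rule are all hypothetical, and in-memory databases stand in for real servers. The property it shows is that masking happens once, on the golden copy, before anything is replicated.

```python
import sqlite3


def build_golden_copy(prod: sqlite3.Connection) -> sqlite3.Connection:
    """Copy production into a new database and mask the copy in place."""
    golden = sqlite3.connect(":memory:")
    prod.backup(golden)  # production itself is only read, never modified
    # Hypothetical rule: replace every SSN with a fake, format-preserving value
    # derived from the row id, so no real digits survive in the copy.
    # (Other columns such as names would need their own rules; omitted here.)
    golden.execute(
        "UPDATE customers SET ssn = printf('900-%02d-%04d', rowid % 100, rowid % 10000)"
    )
    golden.commit()
    return golden


def replicate(golden: sqlite3.Connection, environments: list) -> dict:
    """Clone the already-masked golden copy into each downstream environment."""
    copies = {}
    for env in environments:
        target = sqlite3.connect(":memory:")
        golden.backup(target)
        copies[env] = target
    return copies


# Usage with a toy "production" database.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
prod.execute("INSERT INTO customers VALUES ('Ann Lee', '147-22-3099')")
prod.commit()

envs = replicate(build_golden_copy(prod), ["dev", "test", "training"])
print(envs["dev"].execute("SELECT name, ssn FROM customers").fetchall())
# [('Ann Lee', '900-01-0001')]
```

Because every environment is cloned from the same masked copy, no downstream environment ever receives the original SSNs.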
SDM changes sensitive data in a realistic manner. For example, names and birth dates can be replaced with substitute values that still look realistic enough to support effective testing.
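One common way to make substitutions both realistic and consistent is a keyed, deterministic mapping: the same real name always maps to the same fake name, so masked values still line up across tables. The sketch below assumes that technique; the name pools, the secret key, and the ±180-day birth-date shift are hypothetical choices, not a standard.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical substitution pools; real tools draw from much larger lists.
FIRST_NAMES = ["Alice", "Brian", "Carmen", "Deepak", "Elena", "Farid", "Grace", "Hiro"]
LAST_NAMES = ["Lopez", "Nguyen", "Okafor", "Schmidt", "Tanaka", "Walsh"]


def _keyed_hash(value: str, secret: bytes) -> int:
    """HMAC-SHA256 of the value, so the same input always maps to the same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def mask_name(first: str, last: str, secret: bytes) -> tuple:
    """Replace a real name with a realistic-looking name from the pools."""
    h = _keyed_hash(f"{first}|{last}", secret)
    return (FIRST_NAMES[h % len(FIRST_NAMES)],
            LAST_NAMES[(h // len(FIRST_NAMES)) % len(LAST_NAMES)])


def mask_birth_date(dob: date, secret: bytes, max_shift_days: int = 180) -> date:
    """Shift a birth date by a keyed, nonzero offset of at most max_shift_days."""
    span = 2 * max_shift_days + 1
    offset = _keyed_hash(dob.isoformat(), secret) % span - max_shift_days
    if offset == 0:
        offset = 1  # never leave the original date in place
    return dob + timedelta(days=offset)
```

Because the mapping is deterministic, joins on masked values keep working. The secret key must be protected and kept out of the masked environments: anyone holding it could recompute the mapping for small value domains such as dates.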
What are the Advantages of Static Data Masking?
- Sensitive data is permanently removed because the data transformations are applied to the data store. If an attacker compromises a statically masked database, the sensitive data simply isn’t there.
- No per-transaction performance penalty. All data transformations are applied up front, so there is no performance impact once the masked database is made available to the various functions.
- Protects copies of production data in a wide range of scenarios including access via applications and back-end native queries.
- Greatly simplifies security of copy data. There’s no need to implement fine-grained object-level security because all sensitive data has been replaced.
What are the Disadvantages of Static Data Masking?
- Masking is applied to a data store via a batch process (not real time) that may take minutes or hours to complete depending on the size of the data.
- It cannot be used to protect the production database because it permanently alters the underlying data. As described above, it operates against copies of production databases.
What is Dynamic Data Masking?
As the name implies, DDM changes the value of data dynamically, while it is in transit.
This method is primarily used to apply role-based (object-level) security for databases and applications. Because it is complex to prevent masked data from being written back to the database, DDM should be applied in read-only contexts, such as reporting or customer service inquiry functions. DDM is sometimes viewed as a means to apply role-based security to (legacy) applications that don’t have a built-in, role-based security model, or to enforce separation of duties regarding access. However, there are limitations to this usage.
How is Dynamic Data Masking Applied?
There are several approaches to implementing DDM, including database proxies and web proxies. Some database vendors now offer DDM directly within the database engine. The database proxy approach usually works by modifying SQL queries, but it can also modify query result sets.
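Result-set modification, the second proxy technique mentioned above, can be sketched as a thin wrapper that runs each query unchanged and masks sensitive columns in the returned rows according to the caller's role. The `MaskingProxy` class, the role names, and the rule table are hypothetical; a real proxy sits on the network path rather than inside the application process.

```python
import sqlite3


def mask_ssn(value: str) -> str:
    """Keep the last four digits and redact the rest."""
    return "XXX-XX-" + value[-4:]


# Hypothetical rule table: role -> {column name -> masking function}.
# Roles without an entry (e.g. "dba") see unmasked data.
RULES = {"analyst": {"ssn": mask_ssn}}


class MaskingProxy:
    """Runs queries unchanged and masks sensitive columns in the result set."""

    def __init__(self, conn: sqlite3.Connection, role: str):
        self.conn = conn
        self.role = role

    def query(self, sql: str, params=()) -> list:
        cur = self.conn.execute(sql, params)
        columns = [d[0].lower() for d in cur.description]
        rules = RULES.get(self.role, {})
        return [
            tuple(
                rules[col](val) if col in rules and val is not None else val
                for col, val in zip(columns, row)
            )
            for row in cur.fetchall()
        ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ann Lee', '147-22-3099')")

print(MaskingProxy(conn, "analyst").query("SELECT name, ssn FROM customers"))
# [('Ann Lee', 'XXX-XX-3099')]
print(MaskingProxy(conn, "dba").query("SELECT name, ssn FROM customers"))
# [('Ann Lee', '147-22-3099')]
```

Matching on column names is the weak point of this naive approach: a query such as `SELECT ssn AS id_no FROM customers` would slip past the rule, which is one reason real products depend on detailed mappings of users, objects, and rules.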
The sensitive data remains within the reporting database that is queried by an analyst. All SQL issued by the analyst passes through the database proxy, which inspects each packet to determine which user is attempting to access which database objects. The proxy then rewrites the SQL before issuing it to the database, so the results returned to the analyst through the proxy are already masked. For example, a query that would normally retrieve a list of SSNs is rewritten so that it returns only the last four digits of each SSN, with the first five digits redacted with X’s.
Visually, the two versions of the query would produce something similar to the following, keeping in mind that the SSNs stored in the database are not changed:
| SSN (unmasked) | SSN (masked) |
|---|---|
| 147-22-3099 | XXX-XX-3099 |
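The query-rewriting step can be sketched in the same spirit. This toy rewriter assumes a hypothetical `customers` table and a rule that maps the `ssn` column to a SQLite masking expression. It uses a regular expression and only understands simple `SELECT ... FROM ...` statements, whereas a real proxy would parse the SQL properly.

```python
import re
import sqlite3

# Hypothetical masking rule: column name -> SQL expression that masks it.
MASK_EXPRESSIONS = {"ssn": "'XXX-XX-' || substr(ssn, -4)"}

# Only matches simple "SELECT <columns> FROM <rest>" statements.
SIMPLE_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<rest>.+)$", re.IGNORECASE | re.DOTALL
)


def rewrite_query(sql: str) -> str:
    """Replace each sensitive column in the select list with its masking expression."""
    match = SIMPLE_SELECT.match(sql)
    if match is None:
        raise ValueError("query shape not supported by this sketch")
    columns = [c.strip() for c in match.group("cols").split(",")]
    rewritten = [
        f"{MASK_EXPRESSIONS[c.lower()]} AS {c}" if c.lower() in MASK_EXPRESSIONS else c
        for c in columns
    ]
    return f"SELECT {', '.join(rewritten)} FROM {match.group('rest')}"


# The stored data is never changed; only the query text is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ssn TEXT)")
conn.execute("INSERT INTO customers VALUES ('147-22-3099')")

original_sql = "SELECT ssn FROM customers"
masked_sql = rewrite_query(original_sql)
print(masked_sql)  # SELECT 'XXX-XX-' || substr(ssn, -4) AS ssn FROM customers
print(conn.execute(original_sql).fetchall())  # [('147-22-3099',)]
print(conn.execute(masked_sql).fetchall())    # [('XXX-XX-3099',)]
```

`SELECT *`, column aliases, and subqueries all pass through this sketch unmasked, which illustrates how much rule coverage a production proxy needs.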
What are the Advantages of Dynamic Data Masking?
- Adds another layer of security and privacy control to protect sensitive data.
- Protects data in read-only (reporting) scenarios.
- Works in near real-time.
- Does not require up front batch processing to mask all data in advance.
What are the Disadvantages of Dynamic Data Masking?
- Not well suited for use in a dynamic (read/write) environment such as an enterprise application because masked data could be written back to the database, corrupting the data.
- Performance overhead associated with inspecting all traffic destined for the database.
- A detailed mapping of applications, users, database objects, and access rights is required to configure masking rules. Maintaining this matrix of configuration data requires significant effort.
- The proxy is a single point of failure and can be bypassed by users connecting directly to the database, potentially exposing the original data stored there.
- Organizations may be hesitant to adopt DDM if there is a risk of corruption or adverse production performance impacts. In addition, relative to SDM, DDM is a less mature technology for which customer success stories are not as well known and use cases are still being defined.
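The write-back risk described in the first point can be shown in a few lines. In this sketch (the table, the `masked_read` and `save` helpers, and the record layout are hypothetical), an application reads a record through DDM-style masking, the user edits only the name, and a typical save routine writes every field back, overwriting the real SSN with its masked form.

```python
import sqlite3


def masked_read(conn, customer_id):
    """DDM-style read: the SSN is masked on its way out of the database."""
    name, ssn = conn.execute(
        "SELECT name, ssn FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": customer_id, "name": name, "ssn": "XXX-XX-" + ssn[-4:]}


def save(conn, record):
    """A typical save routine that writes every field of the record back."""
    conn.execute(
        "UPDATE customers SET name = ?, ssn = ? WHERE id = ?",
        (record["name"], record["ssn"], record["id"]),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ann Lee', '147-22-3099')")
conn.commit()

record = masked_read(conn, 1)
record["name"] = "Ann Lee-Smith"  # the user only edits the name...
save(conn, record)                # ...but the masked SSN is written back too

print(conn.execute("SELECT ssn FROM customers WHERE id = 1").fetchone()[0])
# XXX-XX-3099  (the real SSN is gone)
```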
Conclusion
Data masking is a required data protection technology for any organization that stores sensitive data. Static data masking has evolved from stand-alone, single-database point solutions into integrated components of broader data management and data security offerings. It is one of the best ways, if not the best way, to protect copy data, particularly when that data is used for secondary purposes such as application development and testing. Alternatives such as synthetic data generation exist, but they cover a much narrower set of use cases, such as when original data sources are not robust or readily available.
Dynamic data masking is best suited to read-only scenarios, where masked data cannot be inadvertently written back to data stores and corrupt them. DDM may also be perceived as an easy way to apply role-based security to applications, but the read/write restriction, coupled with the complexity of rule configuration, makes ongoing rule management a burdensome task. Alternatives to DDM include database or application firewalls and other blocking mechanisms that prevent unwanted access to sensitive data without rewriting SQL.