What is Data Sanitization?
Data sanitization involves deliberately and permanently deleting or destroying data on a storage device so that it cannot be recovered.
Ordinarily, when data is deleted from storage media, the media is not really erased, and the data can be recovered by an attacker who gains access to the device. This raises serious concerns for security and data privacy. With sanitization, storage media is cleansed so that no leftover data remains on the device and nothing can be recovered, even with advanced forensic tools.
The Need for Data Sanitization
As the useful lifetime and storage capacity of storage equipment continue to increase, IT assets often retain sensitive business data after they are decommissioned. These assets might include:
- Disk drives on desktop and laptop computers
- Flash media
- Mobile devices
- Dedicated storage equipment
When a company’s IT assets reach the end of their useful life, they must be sanitized before they are disposed of or reused, to ensure that sensitive data stored on the equipment is really erased.
The most common scenario that calls for data sanitization is re-imaging, which usually happens when equipment is reassigned to new users. Imaging overwrites the core operating system files, the file allocation table (FAT), and similar structures. However, the old data is not actually deleted.
Instead, the operating system removes the file entries that the user can view and manipulate and marks the space those files occupied as available. Only when the operating system needs more space is that area overwritten, and only then is the old data actually removed from the asset. Given today’s large storage capacity, gigabytes of data may remain on an unsanitized device.
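The behavior described above can be illustrated with a toy model. The sketch below is not how any real file system is implemented; `ToyDisk`, its block layout, and the sample file are all hypothetical and exist only to show that removing a file entry leaves the underlying bytes in place until something overwrites them.

```python
class ToyDisk:
    """Toy model of a storage device: raw blocks plus a file table (illustrative only)."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.file_table = {}                 # file name -> list of block indices
        self.free = list(range(num_blocks))  # indices of blocks marked as available

    def write_file(self, name, data):
        indices = []
        for offset in range(0, len(data), self.block_size):
            index = self.free.pop(0)
            chunk = data[offset:offset + self.block_size]
            self.blocks[index] = chunk.ljust(self.block_size, b"\x00")
            indices.append(index)
        self.file_table[name] = indices

    def delete_file(self, name):
        # Ordinary deletion: drop the table entry and mark the blocks as free.
        # The block contents themselves are left untouched.
        self.free.extend(self.file_table.pop(name))

    def raw_dump(self):
        # What a tool reading the raw device, rather than the file table, would see.
        return b"".join(self.blocks)

    def sanitize_free_space(self):
        # Overwrite every block that is marked as free.
        for index in self.free:
            self.blocks[index] = bytes(self.block_size)


disk = ToyDisk(num_blocks=8)
disk.write_file("payroll.txt", b"salary=90000;ssn=000-00-0000")
disk.delete_file("payroll.txt")
print("payroll.txt" in disk.file_table)      # False: the user no longer sees the file
print(b"salary=90000" in disk.raw_dump())    # True: the bytes are still on the "device"
disk.sanitize_free_space()
print(b"salary=90000" in disk.raw_dump())    # False: only now is the data gone
```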
Equipment must also be sanitized when it leaves the organization. When a company sells, donates, or disposes of equipment that contains storage, IT and security teams must make sure they have a reliable data sanitization strategy. Otherwise, they are not only giving away the device; they may also be exposing the sensitive company data stored on it.
Data Sanitization Methods
There are four primary methods to achieve data sanitization: physical destruction, data erasure, cryptographic erasure, and data masking.
Physical Destruction
The most obvious way to sanitize a device is to physically destroy the storage media or the device it is a part of—for example, destroying a hard disk or an old laptop with an embedded hard disk.
There are two primary ways of destroying storage media:
- Using industrial shredders to break the device into pieces.
- Using degaussers, which expose the device to a strong magnetic field that irreversibly erases data on hard disk drives (HDDs) and most kinds of tape.
However, the downside of these techniques is that they damage the storage media, so it cannot be sold or reused. They are also complex and expensive to carry out and are harmful to the environment.
Data Erasure
This technique uses software to write random 0s and 1s on every sector of the storage equipment, ensuring no previous data is retained.
This is a very reliable form of sanitization because it validates that 100% of the data was replaced, at the byte level. It is also possible to generate auditable reports that prove data has been successfully sanitized. The advantage of this method compared to physical destruction is that it does not destroy the device and allows it to be sold or reused.
However, data erasure is a time-consuming process, is difficult to carry out during the lifetime of the device, and requires that each decommissioned device go through a strict sanitization process.
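As a rough illustration of the overwrite-then-validate idea, the sketch below overwrites a single file in place with random bytes, reads it back, and returns a small report record. It is a simplification under stated assumptions: the function name `overwrite_and_verify` and the report fields are hypothetical, and real erasure software works on the whole device rather than on one file. On SSDs, and on journaling or copy-on-write file systems, an in-place file overwrite does not guarantee that the original physical blocks are replaced, and the read-back here may be served from the operating system cache.

```python
import hashlib
import os


def overwrite_and_verify(path, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes, then read it back to verify."""
    size = os.path.getsize(path)
    written = hashlib.sha256()

    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the new bytes to the device

    # Verification pass: the bytes read back must hash to what was written.
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)

    # A minimal record that could feed an audit report (field names are assumptions).
    return {
        "path": path,
        "bytes_overwritten": size,
        "verified": written.digest() == read_back.digest(),
    }
```

Hashing the written and read-back streams avoids holding a second copy of the random data in memory, which matters once the target is larger than a small file.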
Cryptographic Erasure
This method encrypts all the data on the device, typically with a symmetric cipher such as AES and a strong key of at least 128 bits. The key is then discarded. Without the key, the data cannot be decrypted and becomes unrecoverable, so discarding the key effectively erases all data on the device.
Encryption is a fast and effective way to sanitize storage devices. It is best suited for removable or mobile storage devices, or those that contain highly sensitive information.
The main challenge of cryptographic erasure is that it relies on encryption features that come with the storage equipment, which may not be suited to the task. The technique can also fail through user errors, key management issues, or malicious actors who intervene in the process and obtain the key before it is discarded.
Most importantly, cryptographic erasure typically does not meet regulatory standards for data sanitization, because the data, although encrypted, effectively remains on the device.
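The mechanics can be sketched in a few lines. The example below is a toy model, not a secure implementation: `ToyEncryptedVolume` is hypothetical, and the SHAKE-256 keystream is a stand-in chosen so the sketch runs with the Python standard library alone, where a real device would use a vetted cipher in hardware or firmware. It shows both halves of the argument above: once the key is gone the data cannot be read, yet the ciphertext is still physically present.

```python
import hashlib
import secrets


def _keystream_xor(key, nonce, data):
    # Toy stream cipher: XOR the data with a SHAKE-256 keystream.
    # Illustration only; not a substitute for a vetted cipher such as AES.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyEncryptedVolume:
    """Hypothetical volume that only ever stores ciphertext."""

    def __init__(self):
        self._key = secrets.token_bytes(32)    # 256-bit key, above the 128-bit minimum
        self._nonce = secrets.token_bytes(16)
        self.ciphertext = b""

    def write(self, plaintext):
        self._nonce = secrets.token_bytes(16)  # fresh nonce for every write
        self.ciphertext = _keystream_xor(self._key, self._nonce, plaintext)

    def read(self):
        if self._key is None:
            raise PermissionError("key destroyed: data is unrecoverable")
        return _keystream_xor(self._key, self._nonce, self.ciphertext)

    def crypto_erase(self):
        # Discard the only copy of the key. The ciphertext stays where it is.
        # Note: setting the attribute to None does not wipe the key from memory.
        self._key = None
```

After `crypto_erase()` is called, `read()` raises an error while `ciphertext` is unchanged, so the whole scheme is only as strong as the guarantee that no other copy of the key exists.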
Data Masking
Data masking is a widely used technique in compliance strategies and is explicitly required by some compliance standards. Masking involves creating fake versions of the data that retain the structural properties of the original (for example, replacing real customer names with other, randomly selected names).
Masking techniques include character shuffling, word replacement, and randomization. What is common to all these techniques is that the masked version of the data cannot be reverse engineered to obtain the original data values.
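The three techniques named above can be sketched as small functions. This is a minimal illustration under assumptions, not a production masking tool: the function names, the `FAKE_NAMES` pool, and the record fields are all hypothetical. The sketch draws randomness from the operating system rather than from a seeded generator, so a run cannot be replayed to recover the mapping. Simple character shuffling still leaks which characters were present, which is one reason real tools combine several techniques.

```python
import secrets

_rng = secrets.SystemRandom()  # OS-backed randomness; cannot be seeded or replayed

# Hypothetical pool of substitute names; a real tool would use a far larger one.
FAKE_NAMES = ["Alex Morgan", "Jamie Lee", "Sam Patel", "Robin Chen"]


def shuffle_characters(value):
    """Character shuffling: same characters, random order."""
    chars = list(value)
    _rng.shuffle(chars)
    return "".join(chars)


def replace_name(_real_name):
    """Word replacement: swap a real name for a randomly selected fake one."""
    return _rng.choice(FAKE_NAMES)


def randomize_digits(value):
    """Randomization: replace every digit, keep separators and length."""
    return "".join(str(_rng.randrange(10)) if c.isdigit() else c for c in value)


def mask_record(record):
    # The masked record keeps the same fields and formats as the original.
    return {
        "name": replace_name(record["name"]),
        "account_id": shuffle_characters(record["account_id"]),
        "phone": randomize_digits(record["phone"]),
    }
```

For example, `mask_record({"name": "Jane Doe", "account_id": "AB12CD34", "phone": "555-010-9999"})` returns a record with the same keys, a name drawn from the pool, the same eight account characters in a random order, and a phone number that keeps its `ddd-ddd-dddd` shape.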
Data masking is highly effective for sanitization because it sanitizes data on the device while the device is still in use. The key advantages of data masking compared to other techniques are:
- Quick and easy to implement.
- Complies with most regulations and standards.
- Can be applied on an ongoing basis to existing data.
- Does not require a special sanitization policy for decommissioned devices—unless they contain unmasked data.
Data Discovery and Sanitization
Data discovery involves identifying what data exists in an organization, across multiple data sources, and providing a holistic view of an organization’s data assets. Data discovery involves three primary activities:
- Identifying data sources and joining the data to create a bigger picture.
- Generating interactive visualizations that make it possible to explore the data.
- Enabling “mash-ups” of data from different sources to create new, useful datasets.
Sanitization relies on data discovery because an organization cannot sanitize sensitive data without knowing it exists. A data discovery initiative can help the organization find old or unused datasets or storage devices, identify them as candidates for sanitization, and use this knowledge to create an action plan for sanitization.
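One small piece of this, flagging data sources that appear to hold sensitive values, can be sketched as a pattern scan. The example is a simplified assumption about what discovery might look for: the choice of email addresses and payment card numbers, the regular expressions, and the function names are all illustrative, and a real discovery product would use far richer classification.

```python
import re

# Deliberately simple patterns; real classifiers are much more thorough.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number):
    """Luhn checksum, used to filter out digit runs that cannot be card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:   # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(source, text):
    """Return one finding per sensitive-looking value in the text."""
    findings = []
    for match in EMAIL.finditer(text):
        findings.append({"source": source, "type": "email", "offset": match.start()})
    for match in CARD_CANDIDATE.finditer(text):
        if luhn_valid(match.group()):
            findings.append({"source": source, "type": "card_number", "offset": match.start()})
    return findings


def sanitization_candidates(sources):
    """Names of the sources (name -> text) that contain at least one finding."""
    return sorted(name for name, text in sources.items() if scan_text(name, text))
```

Passing a mapping of source names to their text content returns the subset worth reviewing, which can then feed the action plan described above.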
Data Masking and Discovery with Imperva
Imperva is a security platform that provides data masking and encryption capabilities. It lets you obfuscate sensitive data so that it is useless to a bad actor, even if it is found on a device that is lost, sold, or disposed of.
Imperva also provides data discovery and classification, helping your organization reveal the location, volume, and context of data on-premises and in the cloud, making it easier to discover sensitive data and old devices that need to be sanitized.
In addition to data masking and discovery, Imperva’s data security solution protects your data wherever it lives—on-premises, in the cloud, and in hybrid environments. It also provides security and IT teams with full visibility into how the data is being accessed, used, and moved around the organization.
Our comprehensive approach relies on multiple layers of protection, including:
- Database firewall—blocks SQL injection and other threats, while evaluating for known vulnerabilities.
- User rights management—monitors data access and activities of privileged users to identify excessive, inappropriate, and unused privileges.
- Data loss prevention (DLP)—inspects data in motion, at rest on servers, in cloud storage, or on endpoint devices.
- User behavior analytics—establishes baselines of data access behavior, uses machine learning to detect and alert on abnormal and potentially risky activity.
- Database activity monitoring—monitors relational databases, data warehouses, big data, and mainframes to generate real-time alerts on policy violations.
- Alert prioritization—Imperva uses AI and machine learning technology to look across the stream of security events and prioritize the ones that matter most.