Why Discover and Classify is so critical
Ice hockey goal-scoring great Wayne Gretzky is reported to have said, “You miss 100 percent of the shots you don’t take.” The data security version of this quip is “you protect zero percent of the data you can’t see” and the data privacy version is “if you don’t know what bits of your data repositories are sensitive and private, you can’t say you are 100 percent compliant with data privacy rules.” To secure private data, you first have to know which data is private so you can monitor how people with credentialed access interact with it. The first steps in this process are to discover what data is private and classify it so you know what policies to attach to it.
In this post, we’ll define the types of data that organizations need to discover and classify and explain how the Imperva Data Security Fabric (DSF) Discover and Classify solution makes that work easier. We’ll lay out the principal use cases, go over why most security teams are not doing this effectively today, and explain how easy it is to get started.
The three types of data that must be discovered and classified
To build the foundation necessary to master data privacy in today’s data landscape, you must be able to discover and classify these data types:
- Structured data. These data follow predefined data models. They are text-based, easy to search, and include dates, phone numbers, Social Security numbers, names, and transaction histories. They are generally stored in rows and columns and live in relational databases, data warehouses, and the like.
- Unstructured data. These data have no defined data model and are difficult to search. They include PDF, image, and video files, and they live in various forms in applications, data warehouses, and data lakes. Examples include emails, messages, and conversation transcripts, to name a few.
- Semi-structured data. These data are loosely organized: a meta-level structure, in formats such as HTML, XML, and JSON, wraps content that is otherwise unstructured. These data live in relational databases, tagged-text formats, abstracts, and figures. Examples of semi-structured data are server logs, tweets organized by hashtags, and emails sorted by folders.
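To make the difference concrete, here is a minimal Python sketch that looks for the same Social Security number in each of the three shapes. The sample records, the function names, and the simplified SSN pattern are all assumptions made up for illustration; they are not taken from Imperva DSF.

```python
import csv
import io
import json
import re

# Hypothetical sample data: the same (fake) SSN stored in three shapes.
STRUCTURED = "name,ssn\nJane Doe,123-45-6789\n"
SEMI_STRUCTURED = '{"user": {"name": "Jane Doe", "ids": {"ssn": "123-45-6789"}}}'
UNSTRUCTURED = "Hi, this is Jane Doe. My SSN is 123-45-6789, please update my file."

# Simplified pattern for illustration only; real detectors need validation rules.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_in_structured(text):
    """Columns are predefined, so the sensitive field can be read by name."""
    return [row["ssn"] for row in csv.DictReader(io.StringIO(text))]


def find_in_semi_structured(text):
    """Keys give some structure, but nesting varies, so walk the whole tree."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str):
            found.extend(SSN_PATTERN.findall(node))

    walk(json.loads(text))
    return found


def find_in_unstructured(text):
    """There is no data model at all, so scan the raw text."""
    return SSN_PATTERN.findall(text)
```

The structured case can rely on a column name, while the other two have to inspect the content itself, which is one reason unstructured data is harder to inventory.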
Recommendations for how to get started with discover and classify
First, establish where you are in the process by asking the following five questions:
- Do you know where your sensitive unstructured data is, how much you have, and how much of a risk it is to your organization?
- Do you keep track of who has access to the data?
- How do you classify and tag data for compliance enterprise-wide without automation?
- How often do you classify, validate and remediate findings?
- How do you understand your data risk and protect the privacy of sensitive data?
Focusing on data-level capabilities that simplify how you control access to, usage of, and sharing of your sensitive data will make it easier to address these issues.
The discover and classify data workflow
Regardless of the type of the data, the workflow for discovery and classification can be broken down into three main phases.
- Use regex, lists, algorithms, and machine learning to find sensitive data.
- Show a clear representation of the data that has been found, so it is visible to all.
- Use that representation to comply with data privacy policies and regulations, minimizing chargebacks, fines, and lost customers.
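The three phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patterns, the keyword list, the policy names, and the sample documents are hypothetical, and the sketch covers regex and list matching only, not the algorithmic or machine learning detectors mentioned above.

```python
import re
from collections import Counter

# Hypothetical regex detectors: simplified patterns, not production-grade rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}
# Hypothetical keyword list for a second, list-based detector.
KEYWORDS = {"health": ["diagnosis", "prescription"]}
# Hypothetical mapping from classification label to a policy name.
POLICIES = {"ssn": "mask-and-audit", "email": "audit", "health": "restrict-access"}


def discover(source, text):
    """Phase 1: find sensitive data with regex patterns and keyword lists."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"source": source, "label": label, "value": match.group()})
    lowered = text.lower()
    for label, words in KEYWORDS.items():
        for word in words:
            if word in lowered:
                findings.append({"source": source, "label": label, "value": word})
    return findings


def summarize(findings):
    """Phase 2: represent what was found as counts per (source, label)."""
    return Counter((f["source"], f["label"]) for f in findings)


def policies_for(findings):
    """Phase 3: map each source to the policies its labels call for."""
    result = {}
    for f in findings:
        result.setdefault(f["source"], set()).add(POLICIES[f["label"]])
    return {source: sorted(names) for source, names in result.items()}


# Hypothetical documents standing in for real data sources.
DOCUMENTS = {
    "crm.csv": "Jane Doe,jane@example.com,123-45-6789",
    "notes.txt": "Patient asked about a prescription refill",
}
ALL_FINDINGS = [f for src, text in DOCUMENTS.items() for f in discover(src, text)]
```

Calling `summarize(ALL_FINDINGS)` gives a per-source count of each label, and `policies_for(ALL_FINDINGS)` shows which policies each source would need under these made-up rules.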
In the end, you need to have all three data types completely visible and all in one place. This is particularly important for unstructured data, which is growing 55-65 percent per year, according to Forbes. By connecting unstructured data sources you can gain a credible inventory of all unstructured data, discover hidden data that could put your organization at risk, and validate and enforce file entitlements.
When you need discover and classify functionality
There are dozens of scenarios and use cases where organizations would benefit from a reliable discover and classify process. Here are a few:
- Cloud migration. If you have a discover and classify process for your on-premises data types, your solution will also need to work with your data when it’s hosted in the cloud.
- Mergers and acquisitions. When your organization acquires data from another entity, you will need to apply a discover and classify process as you merge it with your existing data repositories.
- Records and retention. In a forensic audit scenario, you need to apply a discover and classify process to data that you may have had in your repositories for months or even years.
- Regulatory compliance. A discover and classify process for your data repositories will make audit reporting to demonstrate compliance much more straightforward.
- DSARs. Data subject access requests (DSARs) are costly to fulfill, yet regulatory trends suggest that organizations will likely need the functionality to fulfill them easily and at scale to remain compliant.
- Redaction of sensitive information. Efficient discovery and classification put an organization in a position to easily remove sensitive personal data from its data repositories in a matter of minutes.
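As a rough illustration of the last two use cases, the Python sketch below answers a DSAR-style lookup from an existing inventory and masks Social Security numbers in text. The inventory entries, function names, and SSN pattern are hypothetical and assume a discovery pass has already tagged each item with a data subject; this is not how Imperva DSF is implemented.

```python
import re

# Simplified pattern for illustration; the group captures the last four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")

# Hypothetical inventory produced by an earlier discovery pass.
INVENTORY = [
    {"source": "crm.csv", "subject": "jane@example.com", "label": "ssn"},
    {"source": "tickets.json", "subject": "jane@example.com", "label": "email"},
    {"source": "crm.csv", "subject": "sam@example.com", "label": "email"},
]


def dsar_locations(inventory, subject):
    """List every source that holds classified data about one subject."""
    return sorted({item["source"] for item in inventory if item["subject"] == subject})


def redact_ssn(text, keep_last4=False):
    """Mask SSN-shaped values, optionally keeping the last four digits."""
    def mask(match):
        return "***-**-" + (match.group(1) if keep_last4 else "****")

    return SSN_PATTERN.sub(mask, text)
```

Both functions are only as good as the inventory and patterns behind them, which is why discovery and classification come first.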
The factors stopping security professionals from doing discover and classify
There is often a “technology inertia” among security teams that compels them to perform critical functions with the same manual processes they’ve always used. In other cases, the work seems too difficult or expensive. Some teams are content to use a mishmash of tricks, tools, and hacks to project a veneer of a discover and classify process. Still others believe it won’t work in their environment. The answer to all these excuses is expressed in one of the aforementioned quotes: “if you don’t know what bits of your data repositories are sensitive and private, you can’t say you are 100 percent compliant with data privacy rules.”
What Imperva DSF Discover and Classify does
Imperva DSF Discover and Classify supports the data sources and file types you need to care about to complete a discover and classify project, and it deploys in hours – not weeks or months.
You’ll see all your structured, unstructured, and semi-structured data, across hundreds of file types, in a fast, consistent, enterprise-wide classification framework through a single pane of glass, making Discover and Classify simple. It covers cloud data sources like Google Workspace and Office 365 as well as on-premises file servers, and it provides exhaustive metadata about all your data. The result is complete control over every shred of sensitive data in your entire data repository. You become 100% capable of applying security controls to cloud-native data, producing current and forensic audit reports on sensitive data to demonstrate compliance in minutes, and quickly responding to subject requests and sensitive data redaction requests.
Learn more
To find out how to get started, speak with an Imperva Solutions Representative.