What is “dark data”?
The term “dark data” refers to “any information assets that organizations collect, process, and store during regular business activities but generally fail to use for other purposes” [Gartner].
Often retained for compliance reasons, this data can include past employee records, financial information and transaction logs, confidential survey data, emails, internal presentations, downloaded attachments, and even surveillance video footage. It covers any forgotten data left behind by routine processes that is unknown and unused, typically as a by-product of users’ daily digital interactions. This data can be anywhere: spread across all areas of an organization and a myriad of data repositories, from data lakes to applications.
By its nature, the volume of an organization’s dark data is challenging to estimate accurately. Because organizations produce data faster than it can be analyzed, it is common for more than half of an organization’s data to be unavailable for analysis [Splunk]. The volume of unstructured data (data not organized by any pre-defined data model) is rising at a rate of 55-65% per annum [Forbes]. Every minute of every day, 1.7 MB of data is created for each of the 7.3 billion people on our planet. By 2025, it is estimated there will be 175 trillion gigabytes (175 zettabytes) of data globally, 80% of which will be unstructured, and 90% of that unstructured data will never be analyzed or used in regular business activities, despite compulsory regional data standards, its business value, and the cost of storage [IDC].
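To put the IDC projection in concrete terms, the short calculation below works through the percentages quoted above. It is only a back-of-the-envelope sketch: the input figures are the estimates cited in this article, and the variable names are chosen here for illustration.

```python
# Figures quoted above (IDC); these are projections, not measurements.
TOTAL_ZB = 175               # projected global data volume by 2025, in zettabytes
UNSTRUCTURED_SHARE = 0.80    # share of all data that is unstructured
NEVER_ANALYZED_SHARE = 0.90  # share of unstructured data never analyzed
GB_PER_ZB = 10**12           # 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes

unstructured_zb = TOTAL_ZB * UNSTRUCTURED_SHARE
never_analyzed_zb = unstructured_zb * NEVER_ANALYZED_SHARE
share_of_total = never_analyzed_zb / TOTAL_ZB

print(f"{TOTAL_ZB} ZB = {TOTAL_ZB * GB_PER_ZB:,} GB")
print(f"Unstructured: {unstructured_zb:.0f} ZB")
print(f"Never analyzed: {never_analyzed_zb:.0f} ZB ({share_of_total:.0%} of all data)")
```

On these estimates, roughly 126 of the 175 zettabytes, or about 72% of all data, would be unstructured data that is never analyzed.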
Shining a light onto dark data
To protect dark data from bad actors and make it available to business auditors, an organization needs to find it and discover what is sensitive and what is exposed. Discovering and classifying dark data enables an organization to leverage this previously unknown information for decision-making. To accomplish this, security teams need to know where sensitive dark data resides, who accesses it, and when abuse occurs in order to take immediate action.
There are two main approaches to assessing an organization’s dark data. Independent consulting specialists can review a data environment and conduct in-depth reviews of unused and uncatalogued data on the organization’s behalf. Alternatively, with the right tools, an organization can automatically review all of its data repositories itself, wherever the data resides. This is often preferable because it also helps the organization identify regulatory violations, map internal permissions (who can see what), discover other gaps in data security, and spot potentially malicious or negligent behavior that could place confidential and private data in jeopardy. An organization that uses a data analytics solution instead of an external contractor will typically gain a more comprehensive and precise understanding of its data, with clearer actions for remediating any risk.
It is not until an organization has visibility into its dark data that it can discover its business value and protect that data accordingly. Building a basic framework to ‘tag’ or catalog this hidden data is the first step to gaining that insight. Without this, an organization can’t comply with data governance standards and regional regulations, offer truly effective security, or guarantee data privacy for its customers and employees.
Organizations need to know if their data is already visible and being used: is it managed, business-critical, obsolete, redundant, or dark data? It is critical to know where data is, what it is, and what standards and policies must apply to it. Knowing who is accessing it and how organizational data is (and should be) governed is also part of the basic framework for classification and discovery. After proper investigation, truly obsolete dark data can be scheduled for deletion, which reduces the required data storage capacity and its associated costs.
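As a rough illustration of what such a tagging framework could look like, the Python sketch below assigns one of the five categories above to each data asset. Everything in it is an assumption made for illustration: the `DataAsset` fields, the three-year obsolescence threshold, and the rule order are hypothetical and do not describe any particular product. A real deployment would draw these signals from repository metadata and access logs.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: data untouched for this long is treated as obsolete.
OBSOLETE_AFTER = timedelta(days=3 * 365)

@dataclass
class DataAsset:
    location: str                 # where the data lives
    content_hash: str             # fingerprint used to spot duplicates
    last_accessed: date           # most recent known access
    owner: Optional[str] = None   # None means nobody has claimed the data
    business_critical: bool = False

def classify(asset: DataAsset, seen_hashes: set, today: date) -> str:
    """Return one tag for the asset using simple, illustrative rules."""
    if asset.content_hash in seen_hashes:
        return "redundant"        # an identical copy was already cataloged
    seen_hashes.add(asset.content_hash)
    if asset.business_critical:
        return "business-critical"
    if today - asset.last_accessed > OBSOLETE_AFTER:
        return "obsolete"         # candidate for deletion after investigation
    if asset.owner is None:
        return "dark"             # recent but unowned and uncataloged
    return "managed"

today = date(2024, 1, 1)
assets = [
    DataAsset("crm/customers.db", "a1", date(2023, 12, 20), "sales", True),
    DataAsset("share/old_survey.csv", "b2", date(2018, 5, 1)),
    DataAsset("share/copy_of_customers.db", "a1", date(2023, 11, 2)),
    DataAsset("lake/raw/export.json", "c3", date(2023, 10, 15)),
    DataAsset("hr/policies.pdf", "d4", date(2023, 9, 1), "hr"),
]

seen: set = set()
catalog = {a.location: classify(a, seen, today) for a in assets}
deletion_candidates = [loc for loc, tag in catalog.items() if tag == "obsolete"]

for location, tag in catalog.items():
    print(f"{tag:18} {location}")
```

The rules run in a fixed order, so each asset receives exactly one tag. Only the assets tagged obsolete end up in `deletion_candidates`, and that list is meant as a queue for human review, not as an automatic delete list.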
Dark data discovery and classification tools
Out of the box, Imperva Data Security Fabric Data Discover and Classify enables an organization to search through its data automatically, wherever it is stored, to find and classify dark and unstructured data. On an enterprise-wide scale, it lets organizations find hidden, exposed, and sensitive material. It shows the location, volume, and context of that data, and lets the organization protect it accordingly with clearly defined recommendations for action.
If you would like to know more about locking down your dark data governance, compliance, and security, please speak to us in our online chat or contact us here. We’re always happy to talk, with no obligation, and maybe we can help your organization shine some light into its unstructured and hidden data reserves.