Data lakes serve as a central repository for storing several data types (structured, semi-structured, and unstructured) at scale. One advantage of data lakes is that they require no upfront work on the data: you can simply integrate and store data as it streams in from multiple sources.
AWS data lakes are among the most popular cloud data solutions on the market today, and they are purpose-built to deliver secure cloud architectures to customers. AWS relieves its customers’ operational burden by operating, managing, and controlling the components from the host operating system and virtualization layer down to the physical security of the facilities in which the service operates. Securing the sensitive data itself, however, remains the customer’s responsibility, as described in the shared responsibility model AWS follows.
Risks to sensitive data start to pick up momentum when organizations move workloads to the cloud quickly and lose track of where their sensitive data resides. To maintain security in these environments, you need a good data catalog, and you need to know where data copies and snapshots reside. You must also have enforceable access control policies in place around sensitive data, along with audit trails, the ability to run data through forensics if needed, the ability to validate entitlements and reduce them, and the capacity to check for vulnerabilities from a surface area perspective. These aren’t new practices; they have been integral to how organizations have applied data-centric security strategies to data repositories for years. What’s new is the need to apply these practices to cloud-managed environments like AWS data lakes.
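To make the idea of validating and reducing entitlements concrete, here is a minimal Python sketch. It assumes you already have two inputs that are hypothetical here: a map of the permissions granted to each principal, and a map of the permissions each principal was actually seen using in the audit trail. The function name and the sample principals are illustrative only and do not reflect how any particular product implements this.

```python
from typing import Dict, Set


def unused_entitlements(
    granted: Dict[str, Set[str]],
    used: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per principal, the granted permissions never seen in the audit trail."""
    result: Dict[str, Set[str]] = {}
    for principal, permissions in granted.items():
        # Anything granted but never exercised is a candidate for removal.
        unused = permissions - used.get(principal, set())
        if unused:
            result[principal] = unused
    return result


# Hypothetical sample data: what each principal may do vs. what it actually did.
granted = {
    "analyst": {"s3:GetObject", "s3:PutObject", "athena:StartQueryExecution"},
    "etl-service": {"s3:GetObject", "s3:PutObject"},
}
used = {
    "analyst": {"s3:GetObject", "athena:StartQueryExecution"},
    "etl-service": {"s3:GetObject", "s3:PutObject"},
}

candidates = unused_entitlements(granted, used)
print(candidates)
```

In this sample, the analyst's write permission on S3 is never exercised, so it is surfaced as a candidate for review. A real review would also weigh how long the observation window was before removing anything.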
Imperva Data Security Fabric (DSF) enables enterprises to protect their sensitive data in AWS enterprise data lakes and helps them demonstrate data compliance. The Imperva DSF solution enables AWS customers to see and secure their sensitive data through a single comprehensive platform and leverage a unified security model across Amazon Aurora, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon DynamoDB, Amazon Athena, and AWS CloudFormation without requiring any changes to their existing data infrastructure.
Many security teams have gaps in resources and domain expertise that make it costly and difficult to apply company security policies to sensitive cloud-managed data and ensure their data lake meets organizational compliance. Because of these gaps, organizations can miss when a compromised user accesses sensitive data and fail to stop malicious insiders from stealing data. For many organizations, this creates a difficult decision: limit the data they store in a data lake, or accept a greater risk of non-compliance and, in the worst-case scenario, a data breach.
Imperva Data Security Fabric addresses these challenges by first discovering data lakes defined and cataloged using services like AWS Lake Formation and AWS Glue. It then identifies where sensitive data is stored across services like Amazon S3, Amazon Redshift, and Amazon RDS, either by using its internal data classification engine or by importing classification scans from Amazon Macie. Imperva DSF collects data access logs from services like Amazon CloudWatch to audit when a user is accessing raw data files stored in Amazon S3 or executing analytic queries against the data using services like Amazon Athena or Amazon EMR.
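The combination of classification results and access logs can be illustrated with a short Python sketch. It is not Imperva's implementation: the `AccessEvent` record shape, the bucket name, and the prefix list are all assumptions made for illustration. The idea is simply to keep only those access events whose resource falls under a location that classification has marked as sensitive.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical, simplified access-log record."""
    user: str
    action: str    # e.g. "GetObject"
    resource: str  # e.g. "s3://bucket/key"


def sensitive_accesses(
    events: Iterable[AccessEvent],
    sensitive_prefixes: Iterable[str],
) -> List[AccessEvent]:
    """Keep only events that touch a location classified as sensitive."""
    prefixes = tuple(sensitive_prefixes)
    # str.startswith accepts a tuple and matches any of its entries.
    return [event for event in events if event.resource.startswith(prefixes)]


# Hypothetical inputs: parsed log events and prefixes from a classification scan.
events = [
    AccessEvent("alice", "GetObject", "s3://example-lake/raw/customers/2024.csv"),
    AccessEvent("bob", "GetObject", "s3://example-lake/public/readme.txt"),
    AccessEvent("carol", "GetObject", "s3://example-lake/curated/payments/part-0.parquet"),
]
prefixes = [
    "s3://example-lake/raw/customers/",
    "s3://example-lake/curated/payments/",
]

flagged = sensitive_accesses(events, prefixes)
for event in flagged:
    print(event.user, event.action, event.resource)
```

Prefix matching is a deliberate simplification. Classification output is often more granular, down to individual objects or columns, and an audit pipeline would match at that level.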
Imperva DSF includes User and Entity Behavior Analytics (UEBA) models that can identify suspicious data access patterns, such as excessive access to sensitive records, the use of privileged service accounts by interactive users, and suspicious network connections. This helps organizations automatically detect potential data breaches without the need for specialized data security analysts. Finally, with Imperva DSF, security operations teams can create playbooks that automatically mitigate threats using native AWS features, for example by adjusting security groups or revoking user access through AWS IAM. This helps organizations stay in compliance while also helping to prevent data breaches.
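As a rough illustration of what "excessive access to sensitive records" can mean in practice, the following Python sketch compares a user's activity today against that user's own history. This is a plain statistical threshold chosen for illustration, not Imperva's UEBA model, and the function name, the `k` multiplier, and the `min_spread` floor are all assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_excessive(
    history: Sequence[int],
    today: int,
    k: float = 3.0,
    min_spread: float = 1.0,
) -> bool:
    """Flag today's count if it exceeds the user's baseline by k standard deviations."""
    if not history:
        # No baseline yet, so there is nothing to compare against.
        return False
    baseline = mean(history)
    # Floor the spread so a perfectly flat history does not flag tiny changes.
    spread = max(pstdev(history), min_spread)
    return today > baseline + k * spread


# Hypothetical daily counts of sensitive records read by one user.
history = [100, 120, 90, 110, 105]

print(is_excessive(history, today=125))
print(is_excessive(history, today=5000))
```

With this history, the baseline is 105 records with a standard deviation of 10, so the threshold sits at 135: a day with 125 reads passes, while a day with 5000 reads is flagged. Production behavior analytics typically draw on far richer signals, such as peer groups, time of day, and account type.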
Comprehensive data security in AWS architectures
From a single dashboard, Imperva DSF enables teams to see a broad range of data security capabilities, including data discovery, classification, monitoring, access control, risk analytics, compliance management, security automation, threat detection, and audit reporting.
Seeing all this in one place makes it easier to monitor data migration and to protect sensitive data, including Personally Identifiable Information (PII) such as customer names, email addresses, phone numbers, and gender. The single dashboard also lets teams verify that they are adhering to regulations and standards such as the General Data Protection Regulation (GDPR), the Payment Card Industry Data Security Standard (PCI-DSS), and the Health Insurance Portability and Accountability Act (HIPAA).
Tens of thousands of organizations build data lakes on AWS and configure AWS Lake Formation, AWS Identity and Access Management (IAM), and Amazon Simple Storage Service (Amazon S3) policies to secure access to them. Imperva DSF leverages services like AWS Lake Formation and AWS Glue to discover these data lakes and gain visibility into them. This enables security teams to monitor how users query and access stored data, and to detect and prevent malicious user access and data leakage incidents. Imperva DSF also safeguards critical data workloads across an organization’s databases, file repositories, data warehouses, and multi-cloud and data lake environments.
Enterprises can deploy Imperva Data Security Fabric directly in any AWS Region using pre-built AWS CloudFormation templates. Once deployed, Imperva DSF begins discovering and monitoring data lakes. More than 400 pre-defined vulnerability assessment tests are available to run on cloud databases on AWS. Imperva DSF also takes the complexity out of deciding which baselines to establish by including policies based on Center for Internet Security (CIS) benchmarks and the Defense Information Systems Agency’s (DISA) Security Technical Implementation Guides (STIGs), adapted for the cloud.
About Imperva DSF and AWS
Support for data lakes is the latest milestone in Imperva’s work with AWS. Imperva is an AWS Partner with the AWS Security Independent Software Vendor (ISV) Competency and Amazon RDS Ready Product validation. Imperva also participates in AWS Marketplace and the AWS ISV Accelerate Program.
Imperva Data Security Fabric natively supports over 65 data stores to simplify security and compliance of data in any data store, no matter where it is hosted. Specifically for AWS, Imperva DSF supports:
- Relational Databases: Amazon Aurora PostgreSQL, Amazon Aurora MySQL, Amazon RDS for MariaDB, Amazon RDS for SQL Server, Amazon RDS for MySQL, Amazon RDS for Oracle, Amazon RDS for PostgreSQL
- Non-Relational Databases & Data Services: Amazon DocumentDB, Amazon Keyspaces (for Apache Cassandra), Amazon DynamoDB, Amazon EMR
- Data Warehouses & Services: Amazon Redshift, Amazon Athena, AWS Glue
- Storage Service: Amazon S3
- Data Lake: AWS Lake Formation
If you are currently looking for a solution to gain visibility and apply your security control policies to protect sensitive data in AWS cloud-managed data lakes, contact Imperva today.