What Is Regular Expression Denial of Service (ReDoS)?
ReDoS is a form of DoS attack exploiting flaws in regular expressions (regex), which are patterns used to match character combinations in strings. Regex is widely used in programming for string parsing, validation, and manipulation. However, its power and flexibility come with a downside—poorly written regex can be exploited to cause a significant slowdown in the system. When exploited by attackers, this can result in a ReDoS attack.
A ReDoS attack can engage the regex engine in extremely long, or even effectively infinite, computations. By crafting input that exploits a vulnerable regex pattern, an attacker can consume the computational resources of the system, leading to a denial of service. To make matters worse, a single request with the right payload can trigger a ReDoS, unlike other types of denial of service, which require a large volume of requests.
What Is the Impact of Regular Expression Denial of Service (ReDoS)?
ReDoS attacks can cripple your system’s ability to respond to legitimate requests. This happens because the regex engine gets stuck in a state of perpetual computation, failing to free up the resources for other processes.
Depending on where the regex engine sits in the execution flow, a successful ReDoS attack can significantly slow down a system or completely block legitimate requests, impacting availability. In either case, mission-critical applications can experience financial losses, loss of customer trust, and reputational damage.
The impact of ReDoS is not just limited to the system under attack. Attacks can affect web browsers, security systems like web application firewalls (WAF), and back-end databases, all of which sometimes process regular expressions.
How a Regex Engine Works
To understand how ReDoS exploits regex, it’s essential to understand how a regex engine works. A regex engine is a tool that interprets and executes regular expressions. It searches for a pattern within a string and provides a result if a match is found. There are two types of regex engines – deterministic and non-deterministic.
Deterministic Regex Engines
Deterministic regex engines follow a single path through the state machine, while non-deterministic engines explore all possible paths. While deterministic engines are immune to ReDoS, non-deterministic engines can be exploited.
For example, consider a deterministic regex engine processing the regular expression ^abc$, which matches only the exact string “abc”. The engine follows these steps:
- Initialization: The engine starts in the initial state, ready to process the first character of the input string.
- Character matching: It reads the first character of the input. If the character is ‘a’, it moves to the next state. Otherwise, it rejects the input.
- Sequential progression: Upon successfully matching ‘a’, it expects ‘b’ as the next character. If ‘b’ is found, it moves forward to expect ‘c’.
- Final matching: After matching ‘b’, it looks for ‘c’. If ‘c’ is the next character, and no more characters follow, the match is successful.
If at any point the expected character does not match the current input character, the deterministic engine rejects the input without needing to backtrack or explore alternative paths.
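To make this concrete, here is a minimal sketch in Python of a deterministic matcher for ^abc$ (the function name match_abc is hypothetical, used only for illustration). Each input character is examined exactly once, and a mismatch is rejected immediately with no alternative paths to revisit:

```python
# Minimal sketch (for illustration only) of a deterministic matcher for ^abc$.
# Each character of the input is examined exactly once; a mismatch is rejected
# immediately, with no alternative paths to revisit.
def match_abc(text: str) -> bool:
    expected = "abc"   # the literal pattern between ^ and $
    state = 0          # index of the character we expect next
    for ch in text:
        if state < len(expected) and ch == expected[state]:
            state += 1     # advance to the next state
        else:
            return False   # reject immediately; nothing to backtrack to
    return state == len(expected)   # accept only if the whole pattern was consumed

print(match_abc("abc"))    # True
print(match_abc("abd"))    # False, rejected at the third character
print(match_abc("abcd"))   # False, extra input after the final state
```

Because the amount of work is bounded by the length of the input, there is nothing for a crafted payload to amplify.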
Non-Deterministic Regex Engines
In a non-deterministic regex engine, if a string doesn’t match the pattern, the engine backtracks and tries a different path. This backtracking can lead to an astronomical number of permutations, especially if the regex contains nested quantifiers. This is exactly what a ReDoS attack exploits. It feeds the regex engine with a string that forces it to backtrack excessively, causing a significant drain on computational resources.
For example, consider the regex pattern (a|ab)+b$, designed to match strings ending in ‘b’ that are preceded by sequences of ‘a’ or ‘ab’. For an input like “aab”, a non-deterministic engine might process it as follows:
- Multiple path exploration: Upon encountering ‘a’, the engine acknowledges two paths: one where ‘a’ is part of the (a|ab) group and another where it might be the beginning of an ‘ab’ sequence.
- Initial match and continuation: The engine matches the first ‘a’ and then sees another ‘a’. It can continue matching ‘a’s or consider the second ‘a’ as the start of an ‘ab’ sequence.
- Backtracking: After matching the second ‘a’, it reaches ‘b’. If it had been pursuing the path expecting another ‘a’ (for an ‘ab’ sequence), it must backtrack, realizing ‘b’ can only serve as the final ‘b’ in the pattern.
- Successful match: The engine successfully matches the input by treating the first two ‘a’s as part of sequential ‘a’s in the (a|ab) group, followed by the final ‘b’.
This process illustrates how backtracking can lead to inefficiency, as the engine might need to explore several paths before finding a match or determining no match is possible. In the context of a ReDoS attack, malicious input can be crafted to maximize backtracking, severely degrading performance.
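The exploration and backtracking described above can be sketched in a few lines of Python. The function below is a hypothetical, simplified matcher for (a|ab)+b$, not the implementation of any real engine; it records every position it visits, so failed branches show up as extra visits:

```python
# Hypothetical, simplified backtracking matcher for (a|ab)+b$ (illustration only,
# not the implementation of any real engine). Every call records the position it
# is exploring, so failed branches show up as extra visits.
def match_a_or_ab_then_b(text, pos=0, visits=None):
    if visits is None:
        visits = []
    visits.append(pos)                       # one explored path per call
    # Greedily try another (a|ab) iteration: the 'a' alternative first...
    if text.startswith("a", pos) and match_a_or_ab_then_b(text, pos + 1, visits):
        return True
    # ...then backtrack and try the 'ab' alternative for this iteration instead.
    if text.startswith("ab", pos) and match_a_or_ab_then_b(text, pos + 2, visits):
        return True
    # No further iteration worked: the rest of the input must be the final 'b'.
    return pos > 0 and text[pos:] == "b"

visits = []
print(match_a_or_ab_then_b("aab", visits=visits), len(visits))    # True after a few visits
visits = []
print(match_a_or_ab_then_b("aaaa!", visits=visits), len(visits))  # False after exploring and discarding paths
```

This particular pattern backtracks only modestly; it is the nested and overlapping quantifiers shown in the next section that turn backtracking into an exponential blowup.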
ReDoS Vulnerability Examples
Evil Regex
Evil Regex is the term coined for a regex pattern that is susceptible to ReDoS attacks. These are typically regex patterns with nested or overlapping quantifiers, which can lead to catastrophic backtracking.
For example, consider the regex pattern ^(a+)+$. This pattern aims to match strings composed entirely of the letter ‘a’. However, when fed a string such as 30 ‘a’ characters followed by ‘!’, a backtracking engine explores an exponential number of ways to split the ‘a’s between the inner and outer quantifiers before concluding that no match exists, causing a ReDoS.
Here are additional examples of evil regex patterns shared by the Open Web Application Security Project (OWASP), all of which are vulnerable to inputs like “aaaaaaaaaa!”:
- (a+)+
- ([a-zA-Z]+)*
- (a|aa)+
- (a|a?)+
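As a rough illustration rather than a benchmark, the Python sketch below times how long the standard library’s backtracking re engine takes to reject inputs of increasing length against the evil pattern ^(a+)+$. Absolute numbers vary by machine; the doubling trend with each extra character is the point:

```python
import re
import time

# Rough timing sketch: how long Python's backtracking re engine takes to reject
# payloads of the form 'aaa...a!' against the evil pattern ^(a+)+$. Each extra
# 'a' roughly doubles the work; push n much higher and the call may appear to hang.
evil = re.compile(r"^(a+)+$")

for n in range(16, 25, 2):
    payload = "a" * n + "!"
    start = time.perf_counter()
    evil.match(payload)                     # never matches, but backtracks exponentially
    elapsed = time.perf_counter() - start
    print(f"n={n:2d}  {elapsed:.3f}s")
```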
ReDoS via Regex Injection
Regex injection is another method used by attackers to induce a ReDoS. In this technique, an attacker injects malicious regex patterns into a system, typically via input fields that are processed by the regex engine.
For instance, if a web application uses regex to validate user input, an attacker can inject an evil regex pattern into the input field. If the system processes this pattern without proper validation, it could lead to a ReDoS.
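A minimal sketch of the difference, assuming a hypothetical log-search feature (the function and variable names are invented for illustration): the unsafe version compiles user input directly as a pattern, while the safer version escapes it so it is treated as a literal string:

```python
import re

# Hypothetical log-search feature; the names (LOG_LINES, search_logs_*) are
# invented for illustration.
LOG_LINES = ["user=alice action=login", "user=bob action=logout"]

def search_logs_unsafe(user_term):
    # Vulnerable: the user-supplied string is compiled as a pattern, so input
    # such as "(a+)+$" becomes an evil regex running inside your service.
    pattern = re.compile(user_term)
    return [line for line in LOG_LINES if pattern.search(line)]

def search_logs_safe(user_term):
    # Safer: re.escape() treats the input as a literal string, so regex
    # metacharacters lose their special meaning and cannot reshape the pattern.
    pattern = re.compile(re.escape(user_term))
    return [line for line in LOG_LINES if pattern.search(line)]

print(search_logs_safe("user=alice"))   # matches literally, no injected pattern
```

If users genuinely need pattern-based search, consider validating patterns against an allowlist of constructs or running matches with a timeout in an isolated worker.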
Vulnerable Regex in Online Repositories
Online code repositories are a treasure trove for hackers looking for ReDoS vulnerabilities. Many open-source projects use regex for various purposes, and not all of them follow best practices when writing regex patterns.
A recent study of thousands of popular projects on GitHub found that over 10% were vulnerable to ReDoS attacks. Attackers can conduct the same research, identifying vulnerable repositories and then targeting systems that depend on those libraries.
Preventing and Mitigating ReDoS Attacks
Reduce the Number of Combinations
One of the first steps in preventing and mitigating ReDoS attacks is to reduce the number of combinations in your regular expressions.
When a regular expression allows a large number of ways to match the same input, a ReDoS attack is easier to mount. ReDoS attacks exploit the exponential time complexity of such expressions, causing the system to consume an enormous amount of resources and eventually crash or become unresponsive.
By reducing the number of combinations in your regular expressions, you limit the avenues a ReDoS attack can exploit. You can do this by making your expressions as specific as possible: avoid wildcard characters unless absolutely necessary, and aim to match only the exact patterns that you need.
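One practical way to apply this is to replace ambiguous, nested repetition with a single, specific character class that matches the same strings. A small sketch in Python (the pattern names are illustrative):

```python
import re

# Both patterns below accept exactly the same strings (zero or more letters),
# but the first gives the engine many overlapping ways to split the input,
# while the second describes it with a single, unambiguous repetition.
ambiguous = re.compile(r"^([a-zA-Z]+)*$")   # nested quantifiers: vulnerable
specific = re.compile(r"^[a-zA-Z]*$")       # same language, one way to parse it

print(bool(specific.match("HelloWorld")))    # True
print(bool(specific.match("Hello World!")))  # False
# ambiguous.match("a" * 30 + "!") would backtrack heavily before rejecting.
```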
Control Backtracking
Another effective strategy for preventing and mitigating ReDoS attacks is to control backtracking in your regular expressions.
To control backtracking, you can use what’s known as “possessive quantifiers” or “atomic groups”. These are special types of quantifiers that do not backtrack once they have matched a part of the input string. By using these types of quantifiers in your regular expressions, you can prevent a ReDoS attack from causing excessive backtracking.
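Support varies by engine: atomic groups and possessive quantifiers are available in Java, PCRE, the third-party Python regex package, and Python’s built-in re module from version 3.11 onward. A hedged sketch of the difference, assuming Python 3.11+:

```python
import re  # atomic groups and possessive quantifiers require Python 3.11+;
           # on older versions, the third-party 'regex' module supports them.

payload = "a" * 30 + "!"

# Backtracking version: the nested greedy quantifiers re-split the run of 'a's
# in exponentially many ways before giving up on the trailing '!'.
backtracking = re.compile(r"^(a+)+$")

# Atomic/possessive versions: once the run of 'a's has been consumed, the engine
# will not give any characters back, so the failure on '!' is reported at once.
atomic = re.compile(r"^(?>a+)+$")
possessive = re.compile(r"^a++$")

print(atomic.match(payload))       # None, and it fails fast
print(possessive.match(payload))   # None, fails fast as well
# backtracking.match(payload) would take a very long time on the same input.
```

Note that making a quantifier possessive can change what a pattern matches in some cases, so re-test the expression against known-valid inputs after the change.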
Load Balancing and Redundancy
By distributing the load and having backup resources, you can ensure that your system remains up and running even in the face of a ReDoS attack. In a load-balanced system, incoming network traffic is distributed across multiple servers or resources. If one server gets overwhelmed by a ReDoS attack, the other servers can continue to handle the incoming traffic.
Redundancy involves having backup resources that can take over if the primary resources fail. In the context of ReDoS attacks, redundancy can mean having backup servers or regular expression engines that can take over if the primary ones are overwhelmed by an attack.
Regex Testing
Regular expression testing involves scrutinizing your regular expressions to ensure that they are not vulnerable to ReDoS attacks. This can be done manually, by examining the expressions and their potential combinations. However, manual testing can be difficult and time-consuming, especially for complex expressions.
Regular expression tools can automate the testing process, checking your regular expressions against a variety of inputs and scenarios. Some tools can generate inputs that are specifically designed to trigger ReDoS vulnerabilities, allowing you to identify and fix potential issues before they can be exploited. A few common regular expression tools with testing capabilities include RegexBuddy, RegExr, and Regular Expressions 101.
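For teams that want a lightweight automated check alongside those tools, a timing-based smoke test can catch the worst offenders. The sketch below is hypothetical (the function names are invented): it compares how long a pattern takes when a crafted input is doubled in length and flags super-linear growth:

```python
import re
import time

def worst_case_time(pattern, prefix, n, suffix="!"):
    """Time how long `pattern` takes to process a crafted input of length ~n."""
    compiled = re.compile(pattern)
    payload = prefix * n + suffix
    start = time.perf_counter()
    compiled.match(payload)
    return time.perf_counter() - start

def looks_suspicious(pattern, prefix="a"):
    # Flag the pattern if doubling the crafted input makes matching take far
    # more than twice as long (a sign of super-linear backtracking).
    t_small = worst_case_time(pattern, prefix, 10)
    t_large = worst_case_time(pattern, prefix, 20)
    return t_large > 50 * max(t_small, 1e-6)

print(looks_suspicious(r"^(a+)+$"))   # True on a typical machine
print(looks_suspicious(r"^a+$"))      # False
```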
ReDoS Protection with Imperva
Imperva’s Web Application Firewall can prevent ReDoS and other attacks with world-class analysis of web traffic to your applications. Beyond that, Imperva provides comprehensive protection for applications, APIs, and microservices:
- Runtime Application Self-Protection (RASP) – Real-time attack detection and prevention from your application runtime environment goes wherever your applications go. Stop external attacks and injections and reduce your vulnerability backlog.
- API Security – Automated API protection ensures your API endpoints are protected as they are published, shielding your applications from exploitation.
- Advanced Bot Protection – Prevent business logic attacks from all access points – websites, mobile apps, and APIs. Gain seamless visibility and control over bot traffic to stop online fraud through account takeover or competitive price scraping.
- DDoS Protection – Block attack traffic at the edge to ensure business continuity with guaranteed uptime and no performance impact. Secure your on-premises or cloud-based assets – whether you’re hosted in AWS, Microsoft Azure, or Google Public Cloud.
- Attack Analytics – Ensures complete visibility with machine learning and domain expertise across the application security stack to reveal patterns in the noise and detect application attacks, enabling you to isolate and prevent attack campaigns.
- Client-Side Protection – Gain visibility and control over third-party JavaScript code to reduce the risk of supply chain fraud, data breaches, and client-side attacks.