Web scraping is the process of using bots to extract information from a website. In recent years, the debate over web scraping has grown more complex as business intelligence and data privacy concerns have come to the fore.
The practice of web scraping has gone on for nearly as long as there have been websites. To be fair, there is “good” web scraping that, in fact, is a fundamental underpinning of the internet. Here are a few examples of the practice of “good” web scraping:
- “Good” search engine bots crawl websites to index, analyze, and rank their content
- Price comparison sites deploy bots to auto-fetch prices and product descriptions from partner seller websites, enabling consumers to compare prices of goods and services and make more informed buying choices
- Market research companies use web scrapers to pull data from forums and social media to help benchmark public sentiment (i.e., report on ‘what’s trending’).
This, however, is where the good part of the web scraping story ends. According to Imperva’s 2022 Bad Bot Report, bad bots accounted for 27.7% of all traffic across web, mobile, and APIs, an increase of 2.1 percentage points over the previous year. These bots fetch content from a website with the intent of using it for purposes outside the site owner’s control. Besides web scraping, cybercriminals use bad bots for a variety of harmful activities, including denial-of-service attacks, competitive data mining, online fraud, account hijacking, data theft, intellectual property theft, unauthorized vulnerability scans, spam, and digital ad fraud.
The two main ways bad actors use web scraping maliciously are undercutting prices to gain an unfair competitive advantage and stealing copyrighted content and intellectual property. The question remains: is web scraping illegal?
The case of LinkedIn and hiQ Labs
In the summer of 2017, after LinkedIn sent a cease-and-desist letter to hiQ Labs, a San Francisco-based startup, hiQ sued LinkedIn. hiQ was scraping publicly available LinkedIn profiles to offer clients what its website described as “a crystal ball that helps you determine skills gaps or turnover risks months ahead of time.”
The notion that your employer could use your public LinkedIn profile against you is unsettling. However, on August 14, 2017, a judge decided this was perfectly fine. Judge Edward Chen of the U.S. District Court in San Francisco granted hiQ a preliminary injunction, siding with its claim that Microsoft-owned LinkedIn violated antitrust laws when it blocked the startup from accessing such data. He ordered LinkedIn to remove the barriers within 24 hours. LinkedIn appealed.
The ruling ran contrary to prior legal decisions that had favored clamping down on web scraping, and it raises a host of questions about social media user privacy and the right of businesses to protect themselves from data hijacking. There is also the matter of fairness. LinkedIn spent years creating something of real value. Why should it have to hand that over to the likes of hiQ, paying for the servers and bandwidth to host all that bot traffic on top of its own human users, just so hiQ can ride LinkedIn’s coattails?
The legal battle between LinkedIn and hiQ Labs, which describes itself as a “data science company, informed by public data sources, applied to human capital”, is not over. LinkedIn is still attempting to stop hiQ from scraping personal information from users’ public profiles. After the Ninth Circuit Court of Appeals ruled in favor of allowing bots to scrape publicly available content, LinkedIn petitioned the Supreme Court for review in March 2020. In June 2021, the Supreme Court gave LinkedIn another chance to stop hiQ, although it did not decide the case itself. Instead, it vacated the Ninth Circuit’s decision and sent the case back to the appeals court for reconsideration in light of its recent ruling in Van Buren v. United States, which found that a person does not violate the Computer Fraud and Abuse Act (CFAA) merely by accessing, for an improper purpose, data on a computer they are authorized to use.2
This isn’t the only legal battle LinkedIn is currently fighting. In February 2022, LinkedIn filed a complaint against a group of Singapore-based data scrapers: Mantheos Pte. Ltd., Jeremiah Tang, Yuxi Chew, and Stan Kosyakov. The complaint alleges that they illegally profit from scraping data from LinkedIn’s website, in violation of its terms of service and to its users’ detriment. The case continues.
What is the verdict on web scraping?
As we have seen, the legality of web scraping remains unsettled, and website owners continue to pursue legal claims to prevent the scraping of their sites. While the courts work out where the legal lines fall, your data may be stolen and your website’s business logic abused in the meantime. Instead of looking for legal remedies to this technology challenge, consider solving it today with advanced bot protection and anti-scraping technology.
Imperva Advanced Bot Protection protects your websites, mobile applications, and APIs from automated attacks without affecting the flow of business-critical traffic.