September 23, 2022

Web Scraping - is it legal or illegal? 

Some people state categorically that web scraping is either legal or illegal. Scraper operators may argue that it is completely lawful, while corporate attorneys and anti-bot businesses may argue the opposite.

In fact, there is no simple yes-or-no answer to this question, because it depends largely on the specifics of your situation and on the definition of web scraping that you use.

Web scraping can be defined simply as the process of gathering data from websites. It is an important and useful component of many legitimate data analysis operations. Although web scraping is not illegal in and of itself, it may be (or may be considered to be) illegal depending on the following three factors:

  • The type of data being scraped.
  • How you intend to use the extracted data.
  • How the data is extracted from the website.

What Are The Prohibited Data Categories For Web Scraping?

The category of data you are scraping and how you intend to use it, whether it be e-commerce, personal, or article data, can have a significant impact on its legality. Even when scraping a website is perfectly legal, the way you plan to use the data may make it illegal.

You need to worry about the following two categories of data:

  1. Copyrighted Data 
  2. Personal Data

Accordingly, you are generally safe if the data you are scraping from the website doesn't fall into either of the categories above.

Copyrighted Data

The following types of data are protected by copyright:

  • Videos
  • Pictures
  • Articles
  • Music
  • Databases

Scraping copyrighted material is not illegal in itself; what you do with the copyrighted data once you have scraped it is what could make it so.

The vast majority of content on the internet is copyrighted in some way. Among other things, copyrighted content cannot be duplicated without the author's permission (a license) or other legal authorization. Because scraping is, by definition, copying content, and you practically never obtain the author's explicit permission, other legal authorization is your best option.

In the US, the fair use doctrine may permit the scraping of copyrighted content. Before relying on fair use for your scraping, we advise checking whether you meet the following conditions:

  • The original material is meaningfully transformed. For instance, a list of product names and prices is created from the HTML of a webpage. Reprinting the original content does not qualify.
  • The result does not compete with the original. Scraping real estate bids for quantitative analysis is mostly OK, but scraping the same content and publishing it on your own website is not.
  • No more of the original work is copied than necessary. Don't scrape information that you don't need.
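
As a rough illustration of the first and third points, the sketch below turns a page fragment into a plain list of product names and prices and ignores everything else. The HTML snippet, the `name` and `price` class names, and the `extract_product_facts` helper are all hypothetical; a real page will be structured differently, and extracting only facts does not by itself settle the fair use question.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the markup and class names are assumptions.
SAMPLE_HTML = """
<div class="product"><span class="name">Desk lamp</span><span class="price">19.99</span>
<p class="description">Long marketing copy we do not need.</p></div>
<div class="product"><span class="name">Bookshelf</span><span class="price">74.50</span>
<p class="description">More original prose.</p></div>
"""


class ProductFactsParser(HTMLParser):
    """Collects only product names and prices; all other content is ignored."""

    WANTED = ("name", "price")

    def __init__(self):
        super().__init__()
        self.products = []   # list of {"name": str, "price": float}
        self._field = None   # wanted field currently being read, if any
        self._current = {}   # fields collected for the product in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.WANTED:
            self._field = css_class

    def handle_endtag(self, tag):
        # Stop capturing once the element closes, even if it was empty.
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        self._current[self._field] = text
        if all(field in self._current for field in self.WANTED):
            self.products.append(
                {"name": self._current["name"], "price": float(self._current["price"])}
            )
            self._current = {}


def extract_product_facts(html):
    parser = ProductFactsParser()
    parser.feed(html)
    return parser.products


products = extract_product_facts(SAMPLE_HTML)
print(products)
```

The descriptive text in the `description` paragraphs never leaves the parser, so the stored result contains only the factual fields the analysis needs.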

It is important to note that facts are not copyrightable. Because facts such as product names, prices, and features aren't protected by copyright law, you can argue that the data you intend to scrape is factual in nature.

Database rights, however, are a trickier part of copyright law.

A database is a structured collection of resources that allows users to access and search for specific pieces of information stored within it. This means that it may be unlawful to copy a complete database from a website and reproduce it exactly for your own use.

Once more, the definition of a database and the legal protections provided to the database owner vary between US and EU regulations. Therefore, it is crucial to understand the laws of the jurisdictions you are scraping in. By adjusting the way the data is scraped and used, you can minimize the risk of violating someone's database rights.

Even though there are many ways to scrape legally in the EU or the US, the most important thing of all is to respect the original author's work and their business strategy. As an ethical scraper, you shouldn't copy the organizational structure of the original database, and you should scrape only the parts of the available data that you need.

Personal Data

The definition of personal data in the GDPR (General Data Protection Regulation) is as follows: “Personal data means any information relating to an identified or identifiable natural person; an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;”

These are some examples of personal data:

  • Official identity information: name, surname, date of birth, residence, social security number, passport number, and national identification number.
  • Contact information: telephone number, email address, IP address, and accounts on Facebook, Twitter, and other social networks.
  • Data often collected by applications: location (address or GPS), shopping preferences, and behavioral data.
  • Special types of personal data: racial or ethnic origin, sex, gender, sexual orientation, medical records, religious beliefs, and political opinions.
  • Biometric data.

When you scrape personal data from a website, you almost never have the consent of the data subject (the person whose data you are scraping), and it is very difficult to claim any of the following legal justifications for processing it:

  1. Consent: the data subject gave permission to use their information.
  2. Contract: the personal information is needed to carry out a contract with the data subject.
  3. Legal obligation: processing is required in order to comply with a legal requirement.
  4. Legitimate interest: processing is needed for your own legitimate, reasonable interests.
  5. Vital interest, public interest, or official authority: these are usually only pertinent to state-run organizations, when access to personal data is in the public interest.

Once you're sure that your scraping is not harming anyone, you should determine which laws apply to you. If your business is based in the EU, GDPR applies to you even if you intend to scrape personal data from people in other parts of the world, so an EU company must do its research. Occasionally a legitimate interest will make it acceptable to proceed, but more often than not you will have to leave the project of scraping personal information to your non-EU partners or competitors. On the other hand, you may be in the clear if you aren't an EU corporation, don't conduct business there, and don't have any EU residents as your target market.

In addition, you should program your scrapers to gather as little personal information as possible and to store it only for a short time.
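
One way to approach this is sketched below: keep an explicit allowlist of fields, discard everything else before storing, and purge entries once they are older than a fixed retention window. The field names, the 30-day window, and the in-memory list used as storage are all assumptions for illustration; how little and how briefly you should store data depends on your own legal assessment.

```python
from datetime import datetime, timedelta, timezone

# Assumptions for illustration: the field names and the 30-day window are
# hypothetical, not values prescribed by any regulation.
ALLOWED_FIELDS = {"product", "rating", "review_text"}
RETENTION = timedelta(days=30)


def minimize(record):
    """Keep only the fields the analysis needs; drop everything else."""
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}


def store(storage, record, now):
    """Store the minimized record together with the time it was collected."""
    storage.append({"data": minimize(record), "stored_at": now})


def purge_expired(storage, now):
    """Return only the entries that are still inside the retention window."""
    return [entry for entry in storage if now - entry["stored_at"] <= RETENTION]


scraped = {
    "product": "Desk lamp",
    "rating": 4,
    "review_text": "Bright and sturdy.",
    "reviewer_name": "Jane Doe",           # personal data, not needed
    "reviewer_email": "jane@example.com",  # personal data, not needed
}

storage = []
collected_at = datetime(2022, 9, 1, tzinfo=timezone.utc)
store(storage, scraped, collected_at)

# 22 days after collection the entry is kept; 61 days after, it is purged.
kept = purge_expired(storage, datetime(2022, 9, 23, tzinfo=timezone.utc))
purged = purge_expired(storage, datetime(2022, 11, 1, tzinfo=timezone.utc))
```

Using an allowlist rather than a blocklist means that any new field appearing on the page is dropped by default until you decide you actually need it.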

Terms of Use & Scraping

Because there are laws that clearly define what is legal and what is illegal, it is quite simple to determine whether scraping personal or copyrighted data will make your web scraping illegal. Since no government has passed any legislation explicitly legalizing or criminalizing web scraping, things become much trickier when it comes to the practice itself.

The Terms of Use published on many websites may prohibit or allow web scraping (or automated access). As a matter of common sense, you should always presume that accessing a website and scraping its content is prohibited until you have carefully read its Terms of Use.

Also pay attention to these points: 

  • Is the website's data publicly available? If the data isn't kept behind a login, the website's Terms of Use are generally much harder to enforce against you, so scraping the public data is usually lawful.
  • Is it necessary to create an account and log in to view the data? If so, you have to carefully read the Terms of Use you accepted when you created the account because, by accepting them, you gave them legal force.
