The EU AI Act’s Ban on Facial Image Scraping 

If your organization builds, deploys, or integrates AI systems that process facial images — whether for identity verification, security, content moderation, or model training — one provision of the EU AI Act demands your immediate and sustained attention.

Article 5(1)(e) of the AI Act places an outright prohibition on AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage. This is not a high-risk classification requiring additional documentation or conformity assessments. It is a hard ban — one of the AI Act’s so-called “red lines” — that carries no compliance pathway, no risk mitigation option, and no exception based on commercial justification.

For AI product and engineering teams, understanding precisely where this prohibition begins, where it ends, and how it interacts with the GDPR and other EU legal frameworks is not a task that can be deferred to legal counsel at the point of deployment. It must be embedded in system design decisions made long before a product reaches market.

Why This Prohibition Exists: The Regulatory Context

To understand Article 5(1)(e), it helps to understand what prompted it. The provision did not emerge from abstract concern about theoretical misuse of facial recognition technology. It emerged from documented, large-scale regulatory enforcement against companies that had already built exactly the kind of systems the AI Act now prohibits.

Clearview AI — a US-based company that built a facial recognition database containing billions of images scraped from the internet without individuals’ knowledge or consent — became the central reference point for European regulators’ understanding of what untargeted facial image scraping looks like in practice and why it is categorically unacceptable under EU fundamental rights standards.

Between 2022 and 2024, multiple EU data protection authorities (DPAs) imposed significant sanctions on Clearview AI under the GDPR:

  • The Italian Garante fined Clearview AI €20 million in February 2022, imposed a ban on further data collection, and ordered erasure of data relating to Italian residents.
  • The French CNIL imposed a €20 million fine in October 2022, citing the serious risk to fundamental rights posed by the company’s facial recognition practices.
  • The Dutch AP fined Clearview AI €30 million in September 2024 for illegal data collection for facial recognition purposes.

In each case, DPAs found breaches of multiple GDPR provisions: unlawful processing under Article 6, unlawful handling of special category biometric data under Article 9, and failures to honour data subject rights including the right of access (Article 15) and the right to erasure (Article 17). The Garante additionally found violations of the core data protection principles — lawfulness, fairness, and transparency; purpose limitation; and storage limitation — under Article 5 GDPR.

The AI Act’s Article 5(1)(e) prohibition codifies and extends what DPAs had already established through enforcement: untargeted scraping of facial images to build recognition databases is not a practice that can be made lawful through careful compliance management. It is fundamentally incompatible with EU privacy and fundamental rights frameworks, and the AI Act treats it accordingly.

What Article 5(1)(e) Actually Prohibits: The Four Cumulative Conditions

The prohibition in Article 5(1)(e) is precisely drafted. It does not ban all facial image collection, all scraping, or all use of facial recognition technology. It targets a specific combination of conduct, and all four of the following conditions must be satisfied simultaneously for the prohibition to apply:

1. The practice must involve placing on the market, putting into service for this specific purpose, or using an AI system. The prohibition applies to providers who bring systems to market, deployers who put them into operation for this purpose, and end users who operate them. The full value chain — from the team building the scraping infrastructure to the organization operating the resulting database — is within scope.

2. The AI system must aim to create or expand a facial recognition database. The database need not have facial recognition as its sole purpose. According to the European Commission’s Guidelines on Prohibited AI Practices, it is sufficient that the database can be used for facial recognition. A broadly scoped biometric dataset that is capable of supporting facial recognition functions — even if that is one use among several — satisfies this condition. Engineering teams should not assume that labelling a dataset as a “general biometric research dataset” takes it outside the prohibition’s scope if the data it contains could support facial recognition applications.

3. The AI system must employ untargeted scraping methods. This is the most consequential definitional boundary in the provision. Untargeted scraping is characterised as a technique that absorbs as much data and information as possible from different sources without a specific focus on a given individual or a defined group of individuals. Web crawlers, bots, and automated extraction tools that harvest facial images across social media platforms, public websites, or CCTV networks without a predefined, specific target fall squarely within this definition. The key question is whether the scraping operation is defined by its breadth — collect everything available — rather than by a specific, pre-identified target.

4. The images must be sourced from the internet or CCTV footage. The prohibition covers both open-web scraping and the automated extraction of facial images from CCTV or surveillance video. Offline datasets, purpose-built research datasets created through informed consent processes, and images obtained through other lawful channels are not covered by this specific provision — though they remain subject to the GDPR and other applicable law.

All four conditions must be met for the Article 5(1)(e) prohibition to apply. The European Commission’s Guidelines are explicit that this four-part cumulative structure is intentional and consistent across the full set of AI Act prohibited practices — designed to ensure that the bans are precisely targeted at specific, identifiable harm patterns rather than broadly suppressing legitimate AI development.
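The four-part cumulative structure lends itself to a simple screening checklist. The sketch below is illustrative only (the legal test belongs to the Act and the Commission's Guidelines, not to any code), and every name in it is a hypothetical chosen for this example:

```python
from dataclasses import dataclass

@dataclass
class ScrapingPractice:
    """Hypothetical description of a data-collection practice (illustrative only)."""
    is_ai_system: bool     # condition 1: an AI system is placed on the market, put into service, or used
    builds_face_db: bool   # condition 2: creates or expands a database usable for facial recognition
    untargeted: bool       # condition 3: harvests broadly, with no specific pre-identified target
    internet_or_cctv: bool # condition 4: images sourced from the internet or CCTV footage

def article_5_1_e_applies(practice: ScrapingPractice) -> bool:
    """The prohibition applies only when ALL four conditions hold simultaneously."""
    return (practice.is_ai_system
            and practice.builds_face_db
            and practice.untargeted
            and practice.internet_or_cctv)

# A targeted collection fails condition 3 and so falls outside this specific ban
# (though GDPR and other law still apply):
targeted = ScrapingPractice(is_ai_system=True, builds_face_db=True,
                            untargeted=False, internet_or_cctv=True)
assert article_5_1_e_applies(targeted) is False
```

The point of the conjunction is the one the Guidelines make: removing any single condition takes the practice outside Article 5(1)(e), but never outside the rest of EU law.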

The Targeted vs. Untargeted Distinction: Where the Line Falls

The most practically significant boundary in Article 5(1)(e) — and the one most likely to generate compliance uncertainty for engineering teams — is the distinction between untargeted and targeted scraping.

The prohibition does not apply to targeted scraping. The Guidelines define targeted scraping as the collection of images or videos of specific individuals or pre-defined groups of persons for law enforcement purposes. This carve-out reflects a deliberate policy choice: if there is a defined, specific subject — a known suspect, a defined group under active investigation — the mass surveillance character of untargeted scraping is absent, and different legal frameworks (including the Law Enforcement Directive, discussed below) govern the practice.

For organizations building commercial AI systems rather than law enforcement tools, this carve-out has limited direct relevance. What matters more is the practical implication of the targeted/untargeted boundary for system architecture decisions:

A system designed to collect images of a pre-defined, bounded set of individuals — using a defined query, specific search parameters, or a constrained source list — operates differently from one configured to harvest as broadly as possible. The former may fall outside the prohibition; the latter will not. In complex systems that combine both targeted and untargeted search functions, the Guidelines are explicit: only the untargeted component is prohibited. Engineering teams building hybrid systems must be able to demonstrate, architecturally, that untargeted scraping functions cannot create or contribute to facial recognition databases.
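One way to make that architectural demonstration concrete is to separate targeted and untargeted collection paths at the type level, so that untargeted crawl output simply cannot reach the recognition-database ingest function. This is a minimal sketch under assumed names, not a prescribed design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetedImage:
    """An image collected for a specific, pre-identified subject."""
    subject_id: str
    url: str

@dataclass(frozen=True)
class UntargetedImage:
    """An image gathered by broad crawling, with no pre-defined subject."""
    url: str

def ingest_into_recognition_db(image: TargetedImage) -> None:
    """Hypothetical ingest point: accepts only targeted material.

    The type annotation documents the boundary; the runtime check enforces it
    even if a caller bypasses static type checking.
    """
    if not isinstance(image, TargetedImage):
        raise TypeError("untargeted material cannot enter a facial recognition database")
    # ... write the image to the recognition database here
```

The design choice worth noting: the prohibited combination is made unrepresentable in the architecture, rather than being guarded by a configuration flag that could be flipped.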

It is also worth noting that the Guidelines explicitly address a question that frequently arises in the context of social media data: the fact that an individual has published their image publicly — on Instagram, LinkedIn, or any other platform — does not constitute consent for that image to be included in a facial recognition database. This is entirely consistent with the GDPR’s requirements for valid consent as a legal basis for biometric data processing. Public availability is not consent. Engineering teams whose data pipeline rationale rests on “it’s publicly available” should treat that reasoning as legally insufficient.

What Falls Outside the Prohibition — And Why That Still Creates Compliance Risk

Several practices fall outside the scope of Article 5(1)(e), and understanding these boundaries is important for teams assessing what their systems can and cannot do under the AI Act. However, falling outside the Article 5(1)(e) prohibition does not mean falling outside EU law entirely — and in several cases, the compliance risk that remains is substantial.

Non-AI scraping methods are not covered by the prohibition, which is specifically directed at AI systems. Manual or rule-based scraping tools that do not qualify as AI systems under the Act’s definitions fall outside Article 5(1)(e)’s scope. They remain fully subject to the GDPR, however, and the GDPR analysis for biometric data collection through any scraping method is likely to reach the same conclusion that DPAs reached in the Clearview AI cases: there is no lawful basis for it.

Biometric data other than facial images — voice samples, gait data, iris patterns — is not covered by Article 5(1)(e), which is specifically focused on facial images. Again, this does not mean such data can be collected without restriction: the GDPR’s Article 9 special category provisions apply fully, and the absence of a specific AI Act prohibition does not create a compliance-free zone.

AI systems that scrape facial images to train generative AI models producing images of fictitious persons are explicitly noted in the Guidelines as falling outside the prohibition’s scope. The policy rationale is to permit effective training of generative image models. However, the Guidelines themselves flag that this use case triggers complex compliance obligations under both EU copyright law and the GDPR — facial images are personal data even when used as training inputs, and where they are processed for the purpose of uniquely identifying a person, they constitute special category biometric data under Article 9. Engineering teams building generative image models should not treat the Article 5(1)(e) carve-out as a clearance for unconstrained facial image collection. It is an absence of one specific prohibition, not an affirmative permission.

This use case also raises deepfake-related concerns that the current AI Act framework does not fully address. At the time of writing, the European Parliament has reportedly reached a political agreement on an AI Act Omnibus that would add a new prohibition on non-consensual sexual deepfakes to the Article 5 list of banned practices. Even with that addition, not all categories of harmful AI-generated deepfakes would be covered, a signal that this is an area of active and ongoing legislative development that engineering teams should monitor closely.

The GDPR and Law Enforcement Directive: Overlapping Obligations

Article 5(1)(e) AI Act does not operate in isolation. For privacy and engineering teams assessing their organization’s obligations, two additional legal frameworks bear directly on facial image scraping and facial recognition database practices.

The GDPR remains the primary and most comprehensive source of protection for individuals whose facial images are scraped or processed. Facial images that are processed for the purpose of uniquely identifying a person constitute special category biometric data under Article 9, subject to the GDPR’s most stringent processing conditions. As the Clearview AI enforcement record demonstrates, there is currently no recognised lawful basis under the GDPR for the untargeted scraping of facial images to build recognition databases. The AI Act’s prohibition and the GDPR’s requirements are mutually reinforcing: conduct that breaches Article 5(1)(e) AI Act will, in the overwhelming majority of cases, also breach the GDPR.

The Law Enforcement Directive (LED) governs data protection in law enforcement contexts and is directly relevant to how law enforcement authorities (LEAs) may use facial recognition databases. The LED takes a more nuanced approach than the AI Act’s outright ban: it may permit particularly intrusive practices where they are strictly necessary, sufficiently targeted, proportionate, and grounded in Union or Member State law. However, untargeted scraping is unlikely to satisfy the LED’s Article 10 conditions for processing special category data, which require strict necessity and appropriate safeguards.

One unresolved legal question that the current framework does not definitively answer concerns databases created from untargeted scraping before the AI Act prohibition took effect — and specifically whether LEAs in the EU may continue to use such databases under the LED even though creating new ones is now prohibited. Engineering teams and legal counsel working with law enforcement clients or government agencies should flag this gap and seek specific legal advice on the status of pre-existing database assets.

What AI Product and Engineering Teams Must Do Now

The Article 5(1)(e) prohibition is not a compliance obligation that sits comfortably at the legal team’s desk. Its practical implications run directly to system architecture, data pipeline design, dataset provenance, and product feature decisions. The following actions belong in the engineering organisation’s compliance workflow, not just in legal review:

Audit your data collection pipelines for untargeted scraping components. If any part of your data collection infrastructure uses web crawlers, bots, or automated extraction tools that harvest facial images broadly from the internet or video sources, assess whether those pipelines satisfy all four conditions of the Article 5(1)(e) prohibition. If they do, they must be suspended or redesigned — there is no remediation pathway that makes untargeted facial image scraping for recognition database purposes lawful under the AI Act.

Assess your datasets for provenance. For existing facial image datasets used in training, validation, or recognition functions, document how each dataset was assembled. Datasets whose provenance cannot be traced to lawful collection methods — including datasets obtained from third parties without clear chain-of-custody documentation — carry both AI Act and GDPR liability risk. “We didn’t scrape it ourselves” does not eliminate exposure if the dataset was built through prohibited methods by its original creator.
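One lightweight way to make provenance auditable is to require a machine-readable manifest for every facial image dataset entering a training pipeline, and to screen manifests automatically. The sketch below uses assumed field names and method labels; it illustrates the documentation discipline, not any mandated format:

```python
REQUIRED_FIELDS = {"source", "collection_method", "legal_basis", "chain_of_custody"}
# Collection methods matching the Article 5(1)(e) pattern (hypothetical labels):
PROHIBITED_METHODS = {"untargeted_web_scrape", "untargeted_cctv_extraction"}

def audit_manifest(manifest: dict) -> list[str]:
    """Return a list of provenance findings; an empty list means no red flags found."""
    findings = []
    missing = REQUIRED_FIELDS - manifest.keys()
    if missing:
        findings.append(f"missing provenance fields: {sorted(missing)}")
    if manifest.get("collection_method") in PROHIBITED_METHODS:
        findings.append("collection method matches an Article 5(1)(e) pattern")
    if manifest.get("legal_basis") == "publicly_available":
        findings.append("'publicly available' is not a valid legal basis for biometric data")
    return findings

# A third-party dataset with thin documentation surfaces multiple findings:
third_party = {"source": "vendor_x", "collection_method": "untargeted_web_scrape"}
print(audit_manifest(third_party))
```

A check like this cannot establish lawfulness, but it can reliably flag the datasets that need legal review before they reach training infrastructure.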

Apply the targeted/untargeted distinction as a design principle. Where your system has a legitimate need to collect facial images — through targeted, consent-based, or otherwise lawful methods — architect the data collection function in a way that enforces the targeted boundary by design. Systems that can be configured to expand from targeted to untargeted collection must treat that expansion as a prohibited state, not a configurable option.
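Treating untargeted expansion as a prohibited state, rather than a configuration option, can be expressed as a collection entry point that refuses any request without a bounded, pre-defined target. All names below are hypothetical; this is a sketch of the design principle, not an implementation:

```python
class TargetedCollectionError(Exception):
    """Raised when a collection request has no bounded, pre-defined target."""

def collect_face_images(subject_ids: list[str], sources: list[str]) -> None:
    """Hypothetical entry point that enforces the targeted boundary by design.

    Untargeted collection is not a mode this function can be configured into;
    an unbounded request is rejected outright.
    """
    if not subject_ids:
        raise TargetedCollectionError("no pre-defined subjects: refusing untargeted collection")
    if not sources:
        raise TargetedCollectionError("no constrained source list: refusing open-ended crawl")
    for subject in subject_ids:
        pass  # fetch only material matching this specific, pre-identified subject
```

Because the guard sits at the only entry point, widening collection from targeted to untargeted requires a code change subject to review, not a runtime setting.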

Do not rely on public availability as a legal basis. Remove “publicly available” from your list of justifications for facial image collection. For GDPR purposes, public availability of biometric data does not constitute consent. For AI Act purposes, the source of the scraped images is irrelevant to whether the prohibition applies — it applies to internet sources explicitly.

Monitor the AI Act Omnibus process. The anticipated addition of a prohibition on non-consensual sexual deepfakes to the Article 5 list signals that the AI Act’s prohibited practices framework is not static. Engineering teams building generative image, video, or avatar systems should treat the current Article 5 list as a floor, not a ceiling, and monitor legislative developments accordingly.

Conclusion: A Red Line With Regulatory Precedent

The EU AI Act’s prohibition on untargeted facial image scraping under Article 5(1)(e) is one of the clearest red lines in the entire regulation. It was not drafted in anticipation of a theoretical harm — it was drafted in direct response to a documented, enforced, large-scale privacy violation pattern that multiple European regulators had already found to be incompatible with fundamental rights. The Clearview AI enforcement record is both the precedent and the warning.

For AI product and engineering teams, the compliance imperative is architectural. Understanding the four cumulative conditions of the prohibition, the targeted/untargeted boundary, the scope of lawful exceptions, and the interaction with GDPR and LED obligations must inform system design from the earliest stages of development — not surface as a legal issue at the point of deployment.

The AI Act Article 5 prohibited practices are called red lines for a reason. This is one of them.

Captain Compliance makes it easy to develop, oversee, and expand your privacy program. Book a demo or start a trial now.