How the Legal Basis for AI Training Is Framed in Data Protection Guidelines: A Multi-Jurisdictional Doctrinal Analysis

I. Introduction

The rapid expansion of artificial intelligence (AI) systems trained on vast quantities of personal data has placed unprecedented strain on the legal foundations of data protection law. While much scholarly and regulatory attention has focused on issues of transparency, accountability, and downstream harms, a more basic question remains unresolved: on what lawful basis may personal data be processed for the purpose of AI training?

Across jurisdictions, data protection authorities have responded to this question with a mixture of guidance, interpretive statements, and selective enforcement actions. At first glance, these interventions suggest an emerging convergence. Regulators increasingly acknowledge that consent is ill-suited to the realities of large-scale AI model development and implicitly or explicitly endorse legitimate interest as the most viable legal basis. Yet this apparent alignment masks a deeper instability. The operational meaning of legitimate interest in AI training contexts remains thin, fragmented, and underdeveloped. Procedural expectations—necessity, proportionality, and balancing—are frequently gestured at but rarely specified. Enforcement actions, meanwhile, function more as regulatory signals than as vehicles for doctrinal clarification.

This paper argues that the current framing of lawful bases for AI training reflects a form of regulatory accommodation rather than doctrinal resolution. Legitimate interest is conceptually appropriate for many AI training activities, but it has not been meaningfully operationalized. Consent, although formally preserved, is increasingly relegated to a rhetorical role. The result is a system in which legality risks becoming formalistic: lawful bases are invoked without supplying the substantive risk governance that data protection law promises.

Grounded in a multi-jurisdictional doctrinal analysis, this paper advances a normative critique rooted in principles of risk-based regulation and proportionality. It contends that attempts to standardize legitimate-interest assessments for AI training are unlikely to succeed and may further entrench formalism. Instead, the paper proposes a shift toward sector-specific AI training regimes that embed contextual proportionality and differentiated safeguards.

II. Methodology and Scope

This study adopts a doctrinal legal methodology, analyzing how lawful bases for AI training are articulated across regulatory guidance, supervisory communications, and enforcement actions. The focus is not on empirical outcomes or technical model behavior, but on legal reasoning and regulatory framing.

The analysis is multi-jurisdictional in scope. While the European Union’s General Data Protection Regulation (GDPR) serves as an analytical anchor due to its global influence, the paper draws comparatively on guidance and interventions from multiple jurisdictions, including common-law and civil-law systems with functionally similar data protection regimes. Rather than cataloging every authority exhaustively, the study identifies recurring patterns in regulatory reasoning.

Enforcement actions are treated as illustrative signals rather than definitive adjudications. Their value lies not in their precedential force, but in what they reveal about regulatory priorities, tolerances, and interpretive gaps.

III. Lawful Bases for AI Training: Doctrinal Foundations

A. Legitimate Interest as a Legal Concept

Legitimate interest occupies a distinctive position within modern data protection law. Unlike consent, it does not require an affirmative expression of individual will. Unlike legal obligation or public task, it is not tethered to statutory mandates. Instead, legitimate interest functions as a flexible legal basis that permits processing where a controller’s interests are balanced against the rights and freedoms of data subjects.

Doctrinally, legitimate interest is structured around three core elements:

  1. Purpose legitimacy: the interest pursued must be lawful and sufficiently articulated.
  2. Necessity: the processing must be necessary to achieve that interest.
  3. Balancing: the interest must not be overridden by data subjects’ rights and expectations.

In theory, this structure aligns well with risk-based regulation. In practice, its application to AI training raises acute difficulties.
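To make this abstract structure more concrete, the following sketch shows one way a controller might document the three elements as a structured record. It is a minimal, hypothetical illustration in Python; the field names and the pass/fail logic are assumptions for exposition and are not drawn from any regulator's assessment template.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a legitimate-interest assessment (LIA) record for an
# AI training activity. Structure and field names are illustrative only.

@dataclass
class LegitimateInterestAssessment:
    purpose: str                       # the interest pursued (purpose legitimacy)
    purpose_is_lawful: bool            # is the interest lawful and clearly articulated?
    necessity_rationale: str           # why this processing, rather than a less intrusive alternative
    alternatives_considered: list[str] = field(default_factory=list)
    balancing_factors: dict[str, str] = field(default_factory=dict)  # factor -> evaluation
    overridden_by_data_subject_rights: bool = False

    def passes(self) -> bool:
        """All three doctrinal elements must be satisfied to rely on legitimate interest."""
        return (
            self.purpose_is_lawful
            and bool(self.necessity_rationale)
            and not self.overridden_by_data_subject_rights
        )


# Illustrative use with invented facts
lia = LegitimateInterestAssessment(
    purpose="Improve fraud-detection model accuracy",
    purpose_is_lawful=True,
    necessity_rationale="Synthetic data alone did not reproduce rare fraud patterns",
    alternatives_considered=["synthetic data", "narrower feature set"],
    balancing_factors={"reidentification risk": "low after pseudonymisation"},
)
print(lia.passes())  # True
```

The point of the sketch is simply that each element requires its own documented answer; as the next sections argue, regulatory guidance rarely specifies what would count as an adequate answer.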

B. Consent and Its Structural Limitations

Consent has long been treated as the paradigmatic lawful basis in data protection law. Yet its doctrinal requirements—freely given, specific, informed, and revocable—sit uneasily with AI training practices characterized by scale, opacity, and downstream reuse.

Regulators increasingly acknowledge these tensions. However, rather than formally excluding consent from AI training contexts, authorities tend to preserve it rhetorically while implicitly discouraging reliance on it. This ambivalence has significant implications for legal certainty and accountability.

IV. Comparative Regulatory Framing Across Jurisdictions

Despite jurisdictional variation, a consistent pattern emerges: legitimate interest is tacitly elevated as the default legal basis for AI training, while consent is retained as a formal but impractical option.

Comparative Table: Regulatory Treatment of Lawful Bases for AI Training

| Jurisdiction | Legitimate Interest | Consent | Operational Guidance | Enforcement Practice |
|---|---|---|---|---|
| European Union | Conceptually endorsed; widely assumed | Formally available but discouraged | High-level, abstract | Selective, limited scrutiny |
| United Kingdom | Pragmatic acceptance | Retained rhetorically | Risk-based language without specificity | Signaling actions |
| Canada | Contextual reliance | Narrow applicability | Case-by-case framing | Minimal AI-specific enforcement |
| Australia | Implied acceptance | Theoretical option | General necessity tests | Sparse intervention |
| Asia-Pacific (various) | Emerging acceptance | Rarely operational | Fragmented | Developmental |

The table illustrates not harmonization, but parallel ambiguity. Across systems, regulators converge on legitimate interest in principle while avoiding detailed prescriptions.

V. Legitimate Interest: Conceptual Acceptance, Operational Failure

A. The Absence of Meaningful Necessity Analysis

Necessity is frequently invoked as a doctrinal requirement, yet rarely interrogated. Regulatory guidance seldom addresses what necessity means in the context of AI training, where alternative architectures, synthetic data, or narrower datasets may exist. Controllers are often left to self-assess necessity without benchmarks or comparators.

B. The Hollowing of the Balancing Test

Balancing tests are routinely described as central to legitimate interest assessments. In practice, however, regulators rarely articulate how competing interests should be weighed in AI contexts. Factors such as scale, data provenance, reidentification risk, and model persistence are mentioned inconsistently, if at all.
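For illustration only, the sketch below records the factors named above (scale, data provenance, reidentification risk, model persistence) and weighs them numerically. The factor list, weights, scores, and threshold are invented assumptions; no regulator has endorsed a numerical balancing methodology, which is precisely the gap this section identifies.

```python
# Purely illustrative weighing of balancing factors. All numbers are hypothetical.
BALANCING_FACTORS = {
    # factor: (weight, intrusion score in [0, 1]; higher means greater intrusion)
    "scale_of_collection":   (0.25, 0.8),   # e.g. web-scale scraping
    "data_provenance":       (0.25, 0.6),   # e.g. publicly posted but context-sensitive
    "reidentification_risk": (0.30, 0.4),   # after pseudonymisation measures
    "model_persistence":     (0.20, 0.7),   # memorisation / extraction risk
}

def weighted_intrusion(factors: dict[str, tuple[float, float]]) -> float:
    """Weighted average of intrusion scores; higher values weigh against the controller."""
    return sum(weight * score for weight, score in factors.values())

if __name__ == "__main__":
    score = weighted_intrusion(BALANCING_FACTORS)
    print(f"Aggregate intrusion score: {score:.2f}")
    # A controller might treat anything above a documented threshold as requiring
    # additional safeguards or a different lawful basis (threshold is assumed here).
    print("Interest likely overridden" if score > 0.6 else "Balancing may favour processing")
```

The sketch is not a proposal for quantification; it simply makes visible how many judgment calls a genuine balancing test would require, none of which current guidance resolves.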

C. Safeguards as Substitutes for Analysis

Safeguards—such as anonymization, security measures, or access controls—are often treated as compensatory mechanisms. Yet safeguards are not substitutes for necessity or balancing; they presuppose that processing is justified in the first place. Their elevation reflects a shift from substantive legality to procedural reassurance.

VI. Consent as a Rhetorical Residual

Although consent remains embedded in statutory texts and regulatory discourse, its practical relevance to AI training has diminished significantly. Regulators frequently acknowledge that consent is difficult or impossible to obtain at scale, particularly for secondary or unforeseen uses.

Rather than confronting this incompatibility directly, authorities tend to preserve consent as a theoretical option while tacitly signaling that legitimate interest is preferable. This duality allows regulators to maintain doctrinal continuity while accommodating industrial realities—but at the cost of transparency and accountability.

VII. Enforcement as Signaling Rather Than Substantiation

Enforcement actions related to AI training often focus on peripheral issues—transparency failures, security lapses, or governance deficiencies—rather than directly interrogating the asserted lawful basis. When legitimate interest is addressed, analysis tends to be cursory.

This pattern suggests that enforcement serves a signaling function: reminding controllers of general obligations without clarifying doctrinal standards. While such signaling may deter egregious behavior, it does little to resolve systemic ambiguity.

VIII. Risk-Based Regulation and Proportionality

Risk-based regulation emphasizes contextual assessment, proportionality, and differentiated obligations. In theory, legitimate interest aligns with this paradigm. In practice, however, its standardized application to AI training undermines proportionality.

AI training activities vary dramatically across sectors—healthcare, finance, consumer analytics, language modeling—and across risk profiles. Attempting to impose a uniform legitimate-interest framework obscures these differences and incentivizes lowest-common-denominator compliance.

IX. Why Standardized Legitimate-Interest Assessments for AI Fail

Efforts to standardize legitimate-interest assessments for AI training encounter structural limitations:

  1. Heterogeneity of data sources
  2. Variability of downstream uses
  3. Divergent risk profiles
  4. Dynamic model evolution

Standardization risks transforming legitimate interest into a checkbox exercise rather than a substantive inquiry. The more regulators attempt to generalize, the less meaningful proportionality becomes.

X. Toward Sector-Specific AI Training Regimes

This paper proposes a shift toward sector-specific AI training regimes as a normative response. Rather than relying on generic lawful-basis frameworks, regulators should articulate context-dependent standards tailored to specific domains.

Sector-specific regimes could:

  • Define acceptable data sources
  • Specify proportionality benchmarks
  • Mandate tailored safeguards
  • Enable clearer enforcement expectations

Such an approach aligns more closely with risk-based regulation and preserves the legitimacy of lawful-basis doctrines by grounding them in concrete contexts.
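To indicate how such regimes might be made operational, the following sketch expresses the elements listed above as machine-readable regime parameters. The sector names, data source categories, benchmarks, and safeguard labels are invented for illustration and do not reflect any existing regulatory instrument.

```python
# Hypothetical sector-specific regime parameters: acceptable data sources,
# proportionality benchmarks, and mandatory safeguards per sector.
SECTOR_REGIMES = {
    "healthcare": {
        "acceptable_sources": ["consented clinical registries", "de-identified EHR extracts"],
        "proportionality_benchmarks": {"max_retention_years": 5, "minimum_aggregation": "cohort-level"},
        "mandatory_safeguards": ["pseudonymisation", "access logging", "ethics review"],
    },
    "consumer_analytics": {
        "acceptable_sources": ["first-party behavioural data"],
        "proportionality_benchmarks": {"max_retention_years": 2, "minimum_aggregation": "segment-level"},
        "mandatory_safeguards": ["opt-out mechanism", "purpose limitation audit"],
    },
}

def permitted_sources(sector: str) -> list[str]:
    """Return the data sources a hypothetical sector regime treats as acceptable."""
    return SECTOR_REGIMES.get(sector, {}).get("acceptable_sources", [])

print(permitted_sources("healthcare"))
```

Expressing regime parameters in this differentiated form, rather than through a single generic legitimate-interest template, is the structural point of the proposal: proportionality is defined within a sector's context instead of being averaged across all of them.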

XI. Conclusion

The framing of lawful bases for AI training across data protection regimes reflects a regulatory compromise rather than a settled doctrinal position. Legitimate interest is widely accepted in concept but poorly operationalized. Consent is retained rhetorically but abandoned in practice. Enforcement actions signal concern without supplying clarity.

Without reform, lawful bases risk becoming formalities divorced from meaningful risk governance. A shift toward sector-specific AI training regimes offers a more credible and proportionate path forward—one that respects both the realities of AI development and the normative commitments of data protection law.
