Artificial intelligence runs on data. That proposition is familiar, almost banal. But its legal consequences are not. Modern AI development depends on mass acquisition, processing, and reuse of data across contexts that were never designed to be interoperable: consumer-facing platforms, business-to-business pipelines, scraping at internet scale, data brokerage, enterprise licensing, and internal “data lakes” that blend personal information with expressive works. The resulting data flows place acute stress on the two legal regimes that most directly govern data in the United States: information privacy law and copyright law.
Formally, these regimes ask different questions and impose different obligations. Privacy law concerns who may collect, use, share, and secure personal information, under what conditions, and with what consumer-facing controls. Copyright law concerns who may reproduce, distribute, prepare derivative works from, or publicly display original expression fixed in a tangible medium, subject to limitations and defenses. The two bodies of law reflect different histories, institutional actors, and normative commitments. Privacy law, especially in U.S. form, often tolerates extensive collection conditioned on disclosure and consumer choice; copyright law often tolerates extensive use conditioned on licensing or defenses that operate on different conceptual axes.
Functionally, however, the boundary between privacy and copyright is blurring in ways that make each regime harder to read, harder to apply, and easier to exploit. When the same datasets simultaneously contain protected expression, personal information, and quasi-public platform data; when training corpora are assembled through opaque chains of custody; and when AI developers defend their data practices by strategically toggling between “public” and “private” characterizations depending on the legal and procedural posture, the distinct logics of privacy and copyright begin to collapse into one another. The regimes remain separate on paper, yet become illegible in operation. This phenomenon can be described as inter-regime doctrinal collapse.
Doctrinal collapse is not an abstract worry. It produces concrete litigation behavior, predictable discovery disputes, and durable contracting patterns. It also generates distributional effects: it tends to benefit well-resourced incumbents capable of buying data access at scale or extracting broad consent through ubiquitous consumer-facing contracts, while disadvantaging smaller entrants, individual creators, and the people whose personal information is implicated in the data supply chain. In this environment, the operative question is not simply whether privacy law or copyright law “wins.” It is whether either regime can continue to constrain private power when powerful actors can route around each regime by strategically exploiting the fault lines between them.
This article identifies the core mechanics of inter-regime doctrinal collapse, explains how it facilitates two dominant exploitation tactics—buying and asking—and argues that collapse poses a rule-of-law problem: when leading AI developers can claim that data is public enough to scrape yet private enough to conceal, the law’s capacity to supervise power is compromised. The article concludes by sketching institutional responses—drawn from conflict of laws and legal pluralism—to supply what can be called a law of collapse: a set of methods for managing overlaps and contradictions between regimes in a principled, transparent, and administrable way.
What “Inter-Regime Doctrinal Collapse” Is
Doctrinal collapse is not the same as doctrinal conflict. Conflicts between privacy and copyright have long existed. They arise whenever personal information is embedded in expressive works, whenever copyrighted content is distributed through systems that track user behavior, or whenever the enforcement of one regime requires disclosures that implicate the other. Those conflicts can be managed through familiar techniques: preemption analysis, statutory interpretation, balancing tests, protective orders, and the compartmentalization of remedies.
Inter-regime doctrinal collapse is different. It describes a situation in which boundaries between regimes blur so thoroughly that the operative principles of each regime become hard to distinguish, and the ability to reason from either regime’s internal logic to a stable legal outcome begins to erode. The collapse is “inter-regime” because it arises from the interaction of multiple legal systems rather than from doctrinal confusion inside a single field. The collapse is “doctrinal” because it concerns the intelligibility of legal rules, tests, and categories. And it is a “collapse” because it enables a functional substitution: actors can selectively invoke whichever regime characterization best suits their immediate objectives, even if those characterizations are mutually inconsistent over time.
The key dynamic is strategic plasticity. In one posture, the data at issue is framed as publicly accessible, lawfully scrapeable, or sufficiently non-confidential that restrictions would frustrate innovation and free expression. In another posture—often in discovery, oversight, or public accountability—the same data is framed as proprietary, trade secret, confidential, or otherwise shielded from disclosure. Privacy and copyright law each supply pieces of these arguments. Privacy law can be invoked to assert that training data contains sensitive or identifiable personal information that should not be revealed. Copyright law and trade secrecy can be invoked to argue that revealing datasets would disclose protected expression or confidential business information. Conversely, privacy objections may be muted when the goal is to justify mass acquisition, and copyright arguments may be emphasized or minimized depending on which posture creates the least friction.
Collapse therefore functions as an exploitation layer above substantive doctrine. It is not that privacy and copyright no longer exist as separate bodies of law. It is that their distinctions become tools for maneuver rather than constraints on action.
Why AI Accelerates Collapse
AI development is a catalyst for collapse because it compresses the distance between acquisition and use and because it treats heterogeneous data types as functionally substitutable inputs. Modern training pipelines ingest text, images, audio, code, metadata, and user interaction logs. They also ingest data about people, which may be directly identifying, indirectly identifying, or inferentially identifying. Once ingested, these materials are transformed into model parameters and embeddings that are difficult to map back to source items without specialized analysis. The result is a system in which upstream legal questions (Was this collected lawfully? Was it licensed? Was it consented to?) become entangled with downstream legal questions (Does the model reproduce protected expression? Does it process personal information? Who controls the system’s outputs?).
This entanglement is not merely technical; it is institutional. Copyright enforcement and privacy enforcement proceed through different procedural routes, different agencies, different remedies, and different conceptions of harm. Copyright disputes often gravitate toward licensing, damages, and injunctive relief; privacy disputes often gravitate toward regulatory compliance, consent validity, and restrictions on processing or sharing. AI litigation and regulation increasingly require tribunals to adjudicate both at once—often with incomplete facts about training datasets, data lineage, and model behavior.
The opacity of training corpora amplifies collapse further. Because AI developers often refuse to disclose full training datasets, opponents may have to argue from inference: outputs that resemble protected works, system behavior consistent with personal-data processing, or third-party reports about data acquisition. That uncertainty invites procedural tactics. It also invites doctrinal opportunism: actors can plausibly assert multiple, inconsistent characterizations of the underlying facts because the facts are not fully visible.
Collapse as a Rule-of-Law Problem
The rule of law depends on the ability to predict, explain, and contest the application of legal rules. It also depends on the ability of institutions—courts, regulators, and the public—to supervise powerful actors. Doctrinal collapse threatens both.
When legal categories become illegible, the law’s constraining function weakens. Inter-regime collapse allows firms to make arguments that are individually plausible within each regime’s vocabulary but collectively incoherent when viewed across contexts. The result is not merely doctrinal confusion; it is asymmetry. Well-resourced actors can exploit ambiguity and opacity to sustain operations while disputes grind through procedural obstacles. Less-resourced actors cannot. Individual creators and individuals whose personal information is implicated in training data typically lack the leverage to force disclosure, enforce tailored terms, or negotiate licensing arrangements on equal footing.
Collapse also distorts public accountability. A system cannot be both “public enough” to justify mass appropriation and “private enough” to avoid oversight without undermining the basic premise that law can mediate competing interests transparently. When oversight is blocked by claims of confidentiality and acquisition is justified by claims of publicness, the practical effect is to free powerful actors from both regimes’ constraints.
The Two Dominant Exploitation Tactics: Buying and Asking
Doctrinal collapse has enabled two acquisition strategies that now dominate AI development and deployment. Both exploit gaps between privacy and copyright. Both are structurally friendly to incumbents. And both test the capacity of law to constrain private power.
Buying: Business-to-Business Data Deals That Route Around Individual Interests
Under the buying strategy, companies acquire large datasets through business-to-business arrangements: licenses, partnerships, platform access deals, data broker contracts, enterprise API agreements, or vendor relationships that include data-sharing clauses. These arrangements can be framed as lawful acquisition, often with contractual warranties and indemnities that shift risk downstream or upstream. But the critical feature is that they commonly bypass the individuals whose personal information is included and bypass creators whose works may be embedded in the data.
From a privacy perspective, buying can route around meaningful individual control. Even where privacy laws provide rights to access, delete, or opt out, those rights are often difficult to exercise when the individual does not know their data has been included in a training corpus or transferred through multiple intermediaries. The transaction takes place between firms; the individual is relegated to a disclosure regime in which notice may be generalized, buried, or functionally non-operative. The formal structure of notice and choice remains, but practical agency is thin.
From a copyright perspective, buying can route around creator control by treating datasets as commodities rather than as collections of expressive works. Licensing may occur at the dataset level, with terms that do not reflect the interests of individual authors, photographers, or code contributors. The market power of platforms and aggregators can produce licensing arrangements that resemble compulsory access, particularly when creators cannot feasibly opt out or negotiate individually.
Buying also interacts with collapse by producing a convenient evidentiary posture: a firm can argue that its acquisition was “licensed” and therefore legitimate without disclosing details about the chain of title, the scope of rights conveyed, or whether the licensed dataset included materials that the vendor was not authorized to license. If challenged, the firm can invoke confidentiality and trade secrecy to resist discovery, while simultaneously relying on the existence of the deal as a legitimacy signal.
Asking: Broad Consent Through Notice-and-Choice Frameworks
Under the asking strategy, companies obtain broad permission from users through terms of service and privacy policies that purport to authorize sweeping collection, use, and sharing of data. The request may be framed as necessary to provide the service, to improve products, to develop new features, or to train models. In practice, the permission is often generalized, open-ended, and difficult to refuse without abandoning the platform.
Asking exploits the weaknesses of notice-and-choice privacy models. Consent is formally obtained, but it may be neither meaningfully informed nor freely given. Users face contract-of-adhesion conditions, information overload, dark patterns, and take-it-or-leave-it access structures. Even sophisticated users may not understand how their data will be used in training pipelines or how it will be combined with data from other sources.
Asking also interacts with copyright norms by transforming data acquisition into a contract question rather than an infringement question. Where users upload content that may be protected expression, terms of service often claim broad licenses to use that content. Those licenses may be drafted with platform operations in mind, yet sweep broadly enough to cover training and derivative uses. The legal questions become whether the user had authority to grant the license, what its scope is, and whether downstream uses exceed reasonable expectations.
Collapse intensifies here because privacy and copyright arguments blur. A firm can frame user-shared content as licensed and therefore permissible under copyright, while framing the same content as personal data covered by privacy policies and therefore subject to proprietary confidentiality protections in litigation. The user is simultaneously positioned as a licensor (to justify use) and as a privacy subject (to justify secrecy). The doctrinal categories still exist, but they are rearranged to serve institutional advantage.
Litigation Pressure Points: The Public/Private Toggle
Inter-regime collapse becomes most visible in litigation. Even without reproducing the specifics of any one case, the pattern can be described: plaintiffs or regulators seek to challenge training data acquisition or model behavior; defendants argue that acquisition relied on public availability, implied permission, licensing, or legal defenses; and then, when adversaries seek disclosure of training datasets, defendants claim confidentiality, trade secrecy, privacy sensitivity, or security risk.
This is not merely aggressive lawyering. It is a structural symptom of collapse. The same legal system that tolerates large-scale data flows conditioned on disclosure also empowers firms to assert proprietary control over the evidentiary record needed to test those disclosures. The same legal system that defines copyrighted expression as protectable property also tolerates defenses that can treat that expression as free input when collected at scale—while still treating training corpora and pipelines as proprietary when challenged.
The result is a recurring public/private toggle:
Public Enough to Take
In the acquisition posture, data is characterized as public: posted openly on the internet, accessible without authentication, available through common browsing tools, or otherwise exposed through platform design. The argument often emphasizes innovation, research, competition, and the impracticality of individualized licensing or consent at scale. This posture has rhetorical power: it frames restrictions as threats to progress and openness.
Private Enough to Hide
In the oversight posture—discovery, audits, regulatory demands, or public inquiries—the same data is characterized as private: the training set is a trade secret; disclosure would enable model theft; the dataset contains sensitive personal information; publication would jeopardize security; or transparency would chill innovation. This posture also has rhetorical power: it frames disclosure as dangerous, intrusive, or unfair.
Toggling between these characterizations allows a powerful actor to gain the benefits of publicness while retaining the protections of privateness. That combination is precisely what generates rule-of-law concern: it insulates the actor from accountability while enabling expansive acquisition.
Discovery Disputes and the Politics of Proof
Doctrinal collapse reshapes discovery by changing what counts as relevant, what counts as proportional, and what counts as protected. In disputes over AI training, plaintiffs and regulators often need access to training datasets, data lineage records, and internal documentation about selection and filtering. Without such access, the case may turn on inference and speculation, which raises barriers to proving harm or illegality.
Defendants, in turn, can rely on a dense thicket of protective claims. Some claims sound in classic confidentiality and trade secrecy. Others sound in privacy: training data contains personal information, including sensitive attributes; disclosure would violate privacy obligations; the risk of reidentification is non-trivial; or the dataset includes third-party information that cannot be revealed. Still others sound in security and safety: releasing data enables misuse. Each claim may be plausible in isolation. Together, they can produce an environment where the most critical facts are systematically withheld.
Collapse thus creates a politics of proof. The party with superior control over information also has superior ability to shape which doctrine governs access to that information. A privacy framing that casts disclosure as wrongful can be deployed to block discovery; a framing that casts training data as public enough to scrape can be deployed to justify acquisition. The law’s internal checks weaken when the doctrines can be arranged opportunistically.
Licensing Agreements as Private Ordering in the Shadow of Collapse
One might hope that licensing markets resolve these tensions: copyright can be licensed; data rights can be negotiated; contracts can allocate risk; and private ordering can scale where public law struggles. But collapse changes the meaning of licensing.
When licensing occurs in a world of doctrinal collapse, it often reflects the market power of platforms and incumbents rather than a balanced reconciliation of interests. The licensing party may not represent the individuals whose personal data is bundled into the dataset. It may not represent the creators whose works are included. And it may not know the downstream uses of the data, particularly in environments where models are fine-tuned and repurposed.
Licenses can therefore become tools of insulation rather than tools of legitimacy. A firm can point to a license as proof of lawful acquisition, while the license itself may be overbroad, under-specified, or misaligned with the interests at stake. Meanwhile, the existence of the license can be invoked to keep details confidential, increasing opacity rather than reducing it.
Distributional Consequences: Why Collapse Favors Incumbents
Inter-regime doctrinal collapse is not neutral. It favors established corporate players in at least three ways.
First, incumbents can buy. They can afford large-scale business-to-business data deals, compliance teams, indemnities, and the litigation capacity to defend their practices. New entrants often cannot. If the primary path to lawful training data is expensive licensing or closed partnerships, the market tilts toward concentration.
Second, incumbents can ask at scale. They operate platforms with millions of users and can embed expansive consent terms into ubiquitous services. When access to social participation, work tools, or cultural distribution is conditioned on agreeing to broad terms, consent becomes a competitive moat.
Third, incumbents can litigate through opacity. They can invest in secrecy infrastructure, procedural defenses, and discovery battles. Less-resourced opponents struggle to obtain the evidence required to challenge acquisition and use practices, which further entrenches the status quo.
These dynamics matter to the rule of law because they relocate governance from public institutions to private contracts and litigation tactics. Collapse thus impedes law’s ability to constrain arbitrary private power, not by eliminating doctrine, but by making doctrine functionally optional for those with sufficient leverage.
Doctrinal Diagnostics
If inter-regime doctrinal collapse is a real phenomenon rather than a rhetorical label, it should be identifiable through observable symptoms. The following diagnostics describe common markers of collapse in AI-related disputes:
- Characterization instability: The same dataset is described as public in acquisition and private in oversight, with minimal factual change (a sketch operationalizing this marker appears below).
- Cross-regime substitution: Arguments from one regime (privacy, copyright, trade secrecy) are used to secure advantages in disputes governed by the other.
- Opacity as leverage: The inability to see training data becomes a durable strategic asset rather than an incidental feature.
- Consent overreach: Broad consumer terms are treated as a general-purpose authorization for downstream uses beyond reasonable user expectations.
- Supply-chain diffusion: Responsibility for acquisition and use is dispersed across intermediaries in ways that defeat meaningful accountability.
These diagnostics do not by themselves determine legality. They do, however, indicate a governance environment in which doctrinal boundaries are being exploited rather than applied.
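The first of these diagnostics is concrete enough to be stated almost mechanically. The following sketch, written in Python with entirely hypothetical identifiers and postures, illustrates how characterization instability might be flagged in a structured record of the positions a firm has taken about a dataset. It is an illustration of the concept, not a description of any existing compliance or docketing tool.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Characterization:
    dataset_id: str   # the dataset at issue (hypothetical identifier)
    posture: str      # e.g., "acquisition", "discovery", "regulatory_audit"
    claim: str        # "public" or "private"

def flag_instability(positions: list[Characterization]) -> dict[str, set[str]]:
    """Return dataset_ids that carry inconsistent claims across postures."""
    claims = defaultdict(set)
    for c in positions:
        claims[c.dataset_id].add(c.claim)
    # A dataset with more than one distinct claim exhibits the marker.
    return {ds: cl for ds, cl in claims.items() if len(cl) > 1}

positions = [
    Characterization("corpus-A", "acquisition", "public"),
    Characterization("corpus-A", "discovery", "private"),
    Characterization("corpus-B", "acquisition", "public"),
]
print(flag_instability(positions))
# {'corpus-A': {'public', 'private'}}  (set ordering may vary)
```

The point of the sketch is that the diagnostic demands only that positions be recorded against a stable dataset identifier. The difficulty in practice is institutional rather than computational: no actor currently has the incentive or the obligation to keep such a record.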
The Need for a “Law of Collapse”
If doctrinal collapse is an institutional problem, it cannot be solved solely by sharpening individual rules inside privacy or copyright. The problem is not simply that privacy doctrine is weak or that copyright doctrine is uncertain. The problem is that the interaction between regimes creates a space where the strongest constraints of each regime can be avoided while the most convenient protections of each regime can be retained.
What is needed, therefore, is a law of collapse: an institutional method for managing overlaps between regimes, structuring decision-making when doctrines point in competing directions, and preventing strategic toggling from undermining accountability. The goal is not to merge privacy and copyright into one field. It is to supply a second-order set of principles that govern how conflicts and overlaps are handled, especially in contexts of high opacity and high power asymmetry.
Two traditions offer useful tools for constructing such a law of collapse: conflict of laws and legal pluralism.
Conflict-of-Laws Tools for Inter-Regime Overlap
Conflict of laws is often associated with jurisdictional questions—choice-of-law clauses, cross-border disputes, and multi-state torts. But its deeper function is institutional: it provides techniques for managing overlapping legal authorities and competing norms without collapsing them into one. Those techniques can be adapted to inter-regime overlap within a single legal system.
One conflict-of-laws impulse is to specify which regime governs which question. In AI contexts, this might mean formally separating questions of acquisition from questions of oversight, or distinguishing rights allocation questions (copyright) from governance questions (privacy). But simplistic partitioning is not enough because AI practices often implicate both simultaneously.
A more useful conflict-of-laws impulse is to require explicit articulation of the regime choice and its consequences. If a party characterizes data as public to justify acquisition, an institutional rule might treat that characterization as relevant to later attempts to shield the same data from oversight. The aim is not estoppel as a blunt instrument, but coherence as a governance constraint: actors should not receive the benefits of one characterization while escaping its costs.
Conflict-of-laws thinking also invites attention to institutional competence. Courts and agencies may be better suited to resolve certain questions than others. For example, privacy regulators may be better positioned to evaluate consumer harm and consent validity, while courts may be better positioned to adjudicate infringement claims. A law of collapse can structure coordination rather than forcing each forum to reinvent the full analysis.
Legal Pluralism and the Reality of Multiple Normative Orders
Legal pluralism begins with a simple observation: law is not the only normative order that shapes behavior. In the AI data ecosystem, platform rules, private contracts, industry norms, technical standards, and corporate governance practices often function as de facto law. Doctrinal collapse is, in part, what happens when private ordering exploits the seams of public ordering.
A pluralist response does not romanticize private ordering; it takes it seriously as an institutional fact. If licensing agreements and platform policies are doing governance work, a law of collapse should examine how those private instruments interact with public constraints. It should also recognize that private instruments may produce distributional harms and accountability gaps that public law must correct.
Pluralism also suggests that a single legal test may be insufficient. Managing collapse may require layered interventions: procedural transparency rules, substantive limits on consent overreach, and oversight mechanisms that do not depend entirely on adversarial discovery.
Institutional Responses: Building a Law of Collapse
Institutional responses to collapse should aim to reduce the profitability of strategic toggling, increase the visibility of training data practices, and prevent private ordering from bypassing individual and creator interests. The following responses illustrate what a law of collapse could include:
- Coherence constraints on characterization: Require parties to justify shifts between “public” and “private” characterizations of the same data across acquisition, liability, and oversight contexts, with doctrinal consequences for unexplained toggling.
- Procedural transparency pathways: Develop mechanisms for vetted, secure disclosure of training data lineage and composition that do not rely on full public release but also do not permit total secrecy.
- Limits on consent overbreadth: Treat excessively generalized “train on anything for any purpose” terms as presumptively insufficient for sensitive or unexpected downstream uses, especially where refusal is not realistically available.
- Supply-chain accountability mapping: Require documentation of data provenance and downstream transfer sufficient to attribute responsibility to specific actors rather than diffusing it across intermediaries (a minimal sketch of such a record follows this list).
- Inter-regime coordination norms: Encourage structured coordination between privacy regulators and copyright adjudication so that remedies and disclosures do not cancel each other out.
- Public oversight with protective design: Create oversight models that protect legitimate security and privacy concerns while still enabling evaluation of legality and harm, such as supervised audits or confidentiality-protected disclosures to regulators.
These responses do not require a single sweeping statute. They can be implemented through agency guidance, judicial management of discovery, targeted legislative amendments, and standard-setting in procurement and enterprise contracting. The shared objective is to make doctrine legible again by reducing the strategic advantage of crossing regimes opportunistically.
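To make the provenance item concrete, the sketch below, again in Python with hypothetical field names, shows the minimal shape such a record might take if it traveled with a dataset through each transfer. No current statute or standard mandates this schema; it is offered only to show that the documentation burden is modest relative to the accountability it would enable.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProvenanceRecord:
    dataset_id: str
    source: str                       # original point of acquisition
    acquisition_basis: str            # e.g., "scraped", "licensed", "user_terms"
    rights_claimed: str               # what the acquirer says it obtained
    consent_reference: Optional[str]  # pointer to a consent instrument, if any
    transfer_chain: list[str] = field(default_factory=list)  # intermediaries, in order

    def responsible_parties(self) -> list[str]:
        """Every actor to whom responsibility could attach, in transfer order."""
        return [self.source, *self.transfer_chain]

record = ProvenanceRecord(
    dataset_id="corpus-A",
    source="platform-P",
    acquisition_basis="licensed",
    rights_claimed="dataset-level license from vendor-V",
    consent_reference=None,  # the gap that dataset-level licensing can conceal
    transfer_chain=["broker-B", "developer-D"],
)
print(record.responsible_parties())
# ['platform-P', 'broker-B', 'developer-D']
```

A record of this shape would let an adjudicator ask, for any dataset, who acquired it, on what claimed basis, and through whose hands it passed, rather than reconstructing that chain through contested discovery.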
Preserving Space for Innovation Without Licensing a Power Grab
One reason doctrinal collapse is politically difficult to address is that AI innovation is real, and data access is central to it. A law of collapse should therefore avoid treating all large-scale data acquisition as illegitimate or all secrecy as suspicious. Instead, it should distinguish between legitimate needs—such as protecting trade secrets from competitors or protecting personal information from public exposure—and illegitimate uses of those needs as shields against accountability.
The aim is to preserve space for salutary innovation while preventing law from becoming a set of optional vocabularies for powerful actors. Innovation does not require the ability to argue “public” when taking and “private” when hiding. It requires clear rules, predictable constraints, and institutional processes that can evaluate contested practices without being structurally blinded.
Rule of Law in AI
Inter-regime doctrinal collapse is a warning sign. It indicates that the legal system’s existing categories—privacy, copyright, confidentiality—are being recombined in ways that undermine their constraining function. The collapse enables two dominant data acquisition tactics: buying through business-to-business deals that sidestep individual and creator interests, and asking through broad consumer contracts that exploit the weaknesses of notice-and-choice governance. It also enables a troubling form of strategic toggling: data is framed as public enough to scrape yet private enough to conceal.
Left unchecked, collapse favors incumbents, entrenches opacity, and impedes law’s ability to constrain arbitrary private power. Addressing it requires more than doctrinal refinement within privacy or copyright. It requires a law of collapse: institutional responses—drawn from conflict of laws and legal pluralism—that manage overlaps, impose coherence, and restore legibility to the governance of data-intensive AI.
If AI is built on data, then the rule of law in AI depends on whether law can still see that data, categorize it coherently, and constrain those who control it. Doctrinal collapse is what happens when law loses that capacity. A law of collapse is how law gets it back.