In a recent JOLT Digest contribution, an internationally renowned team of researchers examines how European competition law, specifically the Essential Facilities Doctrine, could apply to so-called uncontaminated datasets in the field of AI. Their insights warrant further reflection, and the article is worth reading in full. A brief summary and analysis of its central ideas follows.
What is Model Collapse?
The article starts from the premise that early large language models (LLMs) were built by scraping a significant portion of the internet as it existed at the time. Their providers then began offering various AI-powered services, and users in turn created new internet content, often with the help of those AI-based offerings. This new content is scraped again and fed back into LLMs. The article argues that this leads to “contamination”: AI-generated data is distorted, for example because statistically rare content is gradually excluded. As a result, the present-day internet, and any subsequent scraping of it, carries the imprint of earlier AI errors.
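To make this mechanism concrete, here is a minimal Python sketch, not taken from the article; all distributions, sample sizes and thresholds are illustrative assumptions. A simple model is repeatedly re-fitted on its own output, and statistically rare content disappears from one generation to the next.

```python
# Minimal, purely illustrative sketch of recursive "contamination":
# a model is re-fitted on its own output and the rare tail events vanish.
import numpy as np

rng = np.random.default_rng(seed=0)

# "Pre-AI" corpus: heavy-tailed, human-generated signal (Student-t, df=3).
data = rng.standard_t(df=3, size=10_000)
print(f"original data: share of samples beyond ±4 = {np.mean(np.abs(data) > 4):.4%}")

for generation in range(1, 6):
    # Each generation fits a simple Gaussian "model" to the current corpus ...
    mu, sigma = data.mean(), data.std()
    # ... and the next corpus consists only of that model's output, standing in
    # for AI-generated content that is scraped back into the training data.
    data = rng.normal(loc=mu, scale=sigma, size=10_000)
    # Rare ("statistically minor") events disappear as the tails collapse.
    print(f"generation {generation}: fitted std = {sigma:.3f}, "
          f"share beyond ±4 = {np.mean(np.abs(data) > 4):.4%}")
```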
A potential problem arises for new LLMs entering the market: they can only train on contaminated data, because the original, uncontaminated datasets are not accessible to them. This puts newer models at a competitive disadvantage against incumbents that can still draw on pristine data from the pre-AI era, and the article suggests their performance might gradually decline as a result.
Market Entry Barriers?
The head start enjoyed by established players with access to uncontaminated data is amplified by other competitive factors. Incumbents may not only hold original data but can also refine it through human feedback, an advantage newer providers cannot easily replicate. At the same time, users may struggle to distinguish human-generated from AI-generated content, which could erode the variety of available content. This, in turn, could hollow out the informational value of the marketplace, as synthetic data would only produce more synthetic content rather than useful information.
Holders of pre-2022 datasets could thus find themselves in a privileged competitive position, potentially monopolizing the data market by offering “untainted” material.
Potential Solutions
If data access becomes a competition issue, competition (antitrust) law could intervene, alongside regulatory measures. One well-known instrument would be the Essential Facilities Doctrine.
From an antitrust perspective, the article assumes that access to uncontaminated historical datasets is crucial for training new models. Control over this access could further entrench the competitive position of established players, possibly leading to a market controlled by only a few companies holding the original dataset.
This raises concerns about exclusivity agreements that might violate Article 101 TFEU, especially if they prevent licensing to third parties or restrict data collection. Antitrust concerns also arise in the context of mergers, where access to crucial datasets must be carefully considered.
The article notes that an abuse of market dominance under Article 102 TFEU could also arise if a dominant company refuses access to crucial datasets, potentially foreclosing the market. However, the authors highlight the significant legal hurdles in proving such cases, including the difficulty of establishing clear conditions for access.
Relevance of Existing Regulatory Tools
The authors point to the increasing use of fair, reasonable and non-discriminatory (FRAND) principles in the context of data access, as reflected in Article 8 of the Data Act and in voluntary commitments related to standards. Access obligations for data holders are seen as an important strand of the ongoing debate.
From a regulatory standpoint, one suggestion is to “freeze” the supposedly uncontaminated dataset, with the EU’s existing regulations on AI and data (e.g., the Data Governance Act) serving as a potential model. The authors also speculate about imposing direct obligations on data holders under the AI Regulation (AI Act). A new data space or the use of data trustees might be helpful in this context.
Which Companies Hold Market Power?
A critical point is identifying which companies actually hold market power in relation to data access. It is unlikely that a single company controls all relevant data, and even collective dominance by several companies seems improbable, given that competition within the sector remains robust.
The article considers whether search engine indexing might play a role in designating certain companies as gatekeepers, potentially triggering the application of Article 6(11) of the Digital Markets Act (DMA). However, this would only apply if the company requesting access is itself an online search engine, which not all AI services are.
Could Markets Self-Regulate?
The article concludes with an exploration of whether the market could self-regulate. It suggests that by assigning specific responsibilities to particular companies, a market for the provision of uncontaminated data could emerge. Furthermore, labeling such data as “uncontaminated” might help formalize access and incentivize the creation of new markets, although this raises the issue of qualitative censorship: who would decide what qualifies as uncontaminated data?
A dynamic market could also develop for data correction services, in which existing datasets are monitored and corrected in real time. This might counteract model collapse by enabling continuous improvements to the datasets used by AI systems.
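What such a correction service might look like in practice is left open. The following minimal Python sketch is purely illustrative and not drawn from the article: it assumes hypothetical provenance metadata (a collection year) and a hypothetical AI-content detector score, and simply filters a corpus on those fields.

```python
# Hypothetical sketch of a dataset "correction" filter. The field names,
# cut-off year and detector threshold are illustrative assumptions, not
# anything proposed in the article.
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    year: int               # year the content was collected
    synthetic_score: float  # score from a hypothetical AI-content detector (0..1)

def curate(records: list[Record],
           cutoff_year: int = 2022,
           max_synthetic_score: float = 0.5) -> list[Record]:
    """Keep records that predate the assumed contamination cut-off or that a
    detector rates as unlikely to be AI-generated."""
    return [
        r for r in records
        if r.year <= cutoff_year or r.synthetic_score < max_synthetic_score
    ]

corpus = [
    Record("human essay from the archive", 2021, 0.05),
    Record("likely model output", 2024, 0.92),
    Record("recent human forum post", 2024, 0.20),
]
print([r.text for r in curate(corpus)])  # the high-scoring 2024 record is dropped
```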
How Long Will the 2022 Datasets Matter?
The article poses a critical question: how long will datasets from 2022 remain relevant? If we follow the authors’ reasoning, the original dataset from the pre-AI era would serve as a benchmark for data integrity for decades. However, newer AI services might have less interest in outdated data that no longer reflects the latest developments.
This also raises the possibility that established companies might be required to retain the 2022 dataset indefinitely to comply with competition and regulatory requirements. The feasibility of this retention obligation remains questionable.
What Was Ever Truly Uncontaminated?
Finally, the article raises two fundamental issues:
- Competition Law’s Scope: Competition law primarily protects the competitive process itself, not the free flow of information on the internet. Antitrust intervention is only warranted if a genuine competition problem arises. User demand for “uncontaminated” information, however, is not necessarily a driving competitive force: AI services might still function commercially without it, which would make this a question for regulation rather than for antitrust.
- Defining Uncontaminated Data: Who decides what qualifies as uncontaminated data? The notion that data from 2022 is uncontaminated is debatable, especially given the prevalence of misinformation in recent years. The assumption of a perfect, uncontaminated dataset is increasingly unrealistic.
Conclusion and Critique
The article rests on several critical assumptions, including the belief that uncontaminated datasets ever existed or can be preserved. It also suggests that the very technology that caused the contamination could correct it, thereby resolving potential competition issues through market-driven solutions. Moreover, a competition law hook seems unlikely in the absence of clear market dominance. While regulatory measures to protect informational freedom are sensible, the precise targets for such regulation remain unclear.
tl;dr:
- The concept of “model collapse” resulting from AI contamination of datasets could become a significant competition issue.
- Competition law (including EU antitrust provisions) and regulatory approaches could play a key role in ensuring fair access to historical datasets.
- However, many of the assumptions about uncontaminated datasets and market dominance remain questionable.
For more information on how we can assist with data access requests and navigate these legal challenges, feel free to contact us.