Copyright risks in AI training: DeepSeek AI and beyond

As AI continues to evolve and new players emerge, the legal and ethical questions surrounding the training methods of AI models are gaining prominence. Recently, allegations have surfaced that newcomer DeepSeek AI may have used outputs from other AI models, such as OpenAI’s, to train its system. If legal action is taken, it could be the first case of its kind: we are not aware of any precedent involving two AI companies in this context.

A key question arising from these allegations is whether AI-generated output is eligible for copyright protection. If so, does training another AI model using such AI-generated content constitute infringement?

There is no straightforward answer. One complication is that some countries, often at a policy level, are hesitant to grant copyright protection to AI-generated content. In jurisdictions where copyright requires human authorship, computer-generated works are generally not protected. The prevailing view in these regions is that the level of control exerted by the person prompting the AI is insufficient for copyright to apply. However, AI-assisted works are not automatically excluded from copyright protection. The US Copyright Office recently determined that an image created by an artist who selectively modified or regenerated parts of an AI-generated image through multiple prompts could be copyrighted.

In contrast, the UK recognises copyright in computer-generated works, but ongoing consultations are considering whether fully AI-generated content should continue to receive protection. Even in jurisdictions where such protection exists, AI model providers may not have a claim unless they own the rights to the output, whether through user license agreements, assignment, or other means.

Breach of licensing terms: A contractual battleground

Beyond the question of copyright protection, there is also the issue of whether using outputs from OpenAI’s ChatGPT to train another AI model breaches OpenAI’s user license agreements. Many AI companies, including OpenAI, impose terms of service that purport to restrict the use of their AI services for training competing models.

However, proving infringement or a breach of licensing terms is unlikely to be straightforward. On infringement, since AI models synthesise outputs from vast data sets, they are unlikely to generate identical reproductions of any single piece of source material. Additionally, a “child” model may be trained on the outputs of a “parent” model by repeatedly querying it and scraping the responses, creating a further degree of separation from the original source material. Unless an AI model consistently produces output identical to that of another AI model, it may be difficult to prove, based on output alone, that it was trained using the other model’s output.

Proving similarity alone is unlikely to be sufficient. Unless unique quirks are embedded within an AI model’s results, and those quirks can be detected in the outputs of a competing model, it may be arguable that AI models trained on similar data are simply likely to generate similar responses to a given prompt. Given the sheer scale of AI-generated content, a potential claimant faces an uphill task in proving substantial copying.

The risks of litigation and the likelihood of settlements

Given the complexity of proving AI copyright infringement, as well as the business risks associated with prolonged litigation (including negative publicity and regulatory scrutiny), many such disputes are likely to be resolved through confidential settlements rather than going to trial. However, any first-mover case in this area, where an AI company is accused of copying another AI company’s output, could set a legal precedent for future disputes if it proceeds to judgment. If the courts were to rule definitively on these issues, the decision could shape how AI companies approach training methodologies and copyright compliance going forward.

The allegations against DeepSeek AI highlight the evolving legal challenges surrounding AI training and copyright law, underscoring how the law is struggling to keep pace with rapid technological advancements in this area. Beyond copyright protection, contractual agreements and third-party claims add further layers of complexity to this issue.

As AI continues to reshape industries, legal and regulatory frameworks will need to evolve accordingly to balance innovation with intellectual property rights. How the allegations involving DeepSeek AI and OpenAI, and others like them, are resolved will likely influence AI development and intellectual property law in the years to come.