In the rapidly evolving landscape of artificial intelligence (AI), the training data that fuels AI models is of paramount importance. However, it is unclear whether existing copyright law gives copyright holders adequate control over whether and how their data is used to train AI models. In this article, I outline a potential solution to this problem in the form of a new right for copyright holders, referred to here as a “training data” right.
AI models typically rely on vast amounts of data to learn and to make accurate predictions. This data comes from a variety of sources and includes text, images and audio, much of it gathered from the internet. However, the use of such data raises significant legal and ethical issues, particularly when copyrighted material is used against the wishes of the copyright holder. Although case law is beginning to develop in this area¹, there are early signs that existing copyright law will ultimately fall short in protecting copyright holders in this context. For example, copyright infringement currently requires copying of a “substantial part” of a work. Whilst copying may take place during training of an AI model, it seems unlikely that the AI model itself (typically a neural network, which is in essence a mathematical construct) could be viewed as containing a “substantial part” of the material on which it has been trained. Similarly, there are jurisdictional issues associated with existing copyright law that could frustrate an infringement claim when, for example, an AI model is trained in a different jurisdiction from the one in which it is ultimately used.
It is possible to envisage a right, referred to here as a “training data” right, that would enable copyright holders to decide whether, when and how their data is used to train AI models.
Here is an outline of how a “training data” right might be formulated:
This new “training data” right would be intended to create a new layer of protection that would supplement, rather than replace, existing copyright law.
Whilst a new “training data” right as outlined above seems feasible, it is reasonable to ask whether protecting copyright holders in this way would be in the interest of society as a whole. For example, it could be argued that a new “training data” right could stymie the development of AI models by making it much more difficult for AI model developers to obtain the large volumes of training data their models need to work well. On the other hand, is it fair for AI developers to build an AI model for profit without any of that profit going to the copyright holders whose material was used to build it?
One approach that might strike a reasonable balance between the needs of copyright holders and AI model developers would be to implement the “training data” right on an “opt-out” basis. That is, AI model developers would be allowed to use copyrighted data to train their models except where the copyright holder has taken some step to opt their data out of being used in this way. This would allow AI model developers to use the vast majority of available data, whilst giving copyright holders the ability to prevent their data from being used to train AI models where they do not wish it to be.
Clearly, there are logistical and technical issues associated with an “opt-out” approach:
Interestingly, it appears that the EU is moving towards the “opt-out” approach outlined above via its AI Act, enforcement of which is due to commence on 2 August 2026². Since enforcement of the AI Act has not yet begun, it is not yet clear whether the EU has resolved all of the logistical and technical issues associated with such an approach, as noted above.
In contrast to the EU, Japan appears to be moving towards a much more permissive regime, under which AI developers are free to use copyrighted material whether or not they have permission³.
Clearly, regulation in this area is at an early stage, so we will need to wait and see whether a consistent approach is adopted across different jurisdictions.
¹ Getty Images Inc. v Stability AI Ltd. [2023] EWHC 3090 (Ch); New York Times v. OpenAI (US); Tremblay v. OpenAI, Inc. (US); Millette v. OpenAI (US)
James is a Partner and Patent Attorney at Mewburn Ellis. He has a wide range of experience in patent drafting and prosecution at both the European Patent Office (EPO) and UK Intellectual Property Office (UKIPO) across a variety of industry sectors. James has particular expertise in the patentability of software and business-related inventions in Europe.
Email: james.leach@mewburn.com