Does AI Copy?



Q. Do AI image creation tools copy existing images sourced from the internet?


Generative AI image creation tools like DALL·E, Midjourney, and Stable Diffusion do not ‘copy’ images from the internet in the way that traditional copying and pasting works. Instead, these models are trained on large datasets of images sourced from publicly available online imagery or licensed collections. Training involves learning patterns, styles, and relationships between visual elements, which the model then uses to generate new, original images based on user prompts.

However, the way these models are trained has raised ethical and legal debates, especially regarding copyright and data sourcing. Some concerns include whether artists’ works are used without consent and whether the AI can generate images that closely resemble copyrighted content. While these tools do not store or retrieve exact copies of training images, there have been cases where AI-generated outputs resemble existing works, particularly when highly specific prompts are used.


......................................................................................................

Q. How does AI generate new images based on learned concepts?


How Generative AI Image Models Work


1. Training on Large Datasets
AI models are trained on massive datasets containing millions (or even billions) of images and their associated text descriptions. The images come from publicly available sources such as websites, archives, collections and licensed datasets. The AI doesn’t store images but rather ‘learns’ statistical relationships between pixels, styles, and subjects.


2. Latent Space Representation
Instead of memorizing images, the model builds a ‘latent space’: a mathematical representation of features like color, shape, texture, and composition. This allows it to understand how different concepts (e.g., ‘a cat sitting on a chair’) relate to each other without directly recalling any specific image.
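The idea of a latent space can be sketched in a few lines of code. This is purely illustrative: the vectors below are made up, and real models learn embeddings with hundreds or thousands of dimensions. The point is only that related concepts end up close together in the space.

```python
import numpy as np

# Hypothetical 4-dimensional "latent space" for illustration only --
# real models learn far larger embeddings from training data.
# Each concept is a point (vector); related concepts lie close together.
concepts = {
    "cat":    np.array([0.9, 0.1, 0.2, 0.0]),
    "dog":    np.array([0.8, 0.2, 0.3, 0.1]),
    "sunset": np.array([0.1, 0.9, 0.0, 0.8]),
}

def similarity(a, b):
    """Cosine similarity: close to 1.0 means the concepts point the
    same way in the space; close to 0 means they are unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'cat' and 'dog' sit in the same region of the space...
print(similarity(concepts["cat"], concepts["dog"]))     # high
# ...while 'cat' and 'sunset' sit far apart.
print(similarity(concepts["cat"], concepts["sunset"]))  # low
```

No image is stored anywhere in this sketch, only the positions of concepts relative to one another, which is the sense in which the model ‘understands’ without recalling.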


3. Diffusion Process (for Some Models Like Stable Diffusion)

Many modern generative models use a technique called diffusion:

  • The model starts with pure noise (random pixels).
  • It gradually refines the image by removing noise step by step based on the user’s prompt.
  • Over many steps, it forms a completely new image that fits the given description.
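The three steps above can be sketched as a toy loop. In a real diffusion model, a large neural network conditioned on the text prompt predicts the noise to remove at each step; here a fixed ‘target’ pattern stands in for that prediction, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the prompt-conditioned prediction: the values in
# 'target' are made up and play the role of "what the prompt describes".
target = np.array([0.2, 0.8, 0.5, 0.9])
image = rng.normal(size=4)                 # start with pure random noise

for step in range(50):                     # refine step by step
    predicted_noise = image - target       # (toy noise estimate)
    image = image - 0.1 * predicted_noise  # remove a fraction each step

# After many steps, the result fits the 'description' -- it was never
# retrieved from anywhere; it emerged gradually out of the noise.
print(np.round(image, 2))
```

Each pass removes only a fraction of the estimated noise, which is why diffusion models take many steps rather than jumping straight to the final image.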

4. Text-to-Image Generation
When you input a prompt like ‘a cyberpunk cityscape at sunset’, the AI:

  • Interprets the words using its learned knowledge from training.
  • Synthesizes an image that aligns with what it has learned about cyberpunk aesthetics, cityscapes, and sunsets.
  • Produces an original image, not a copy of an existing one.

Why AI-Generated Images Aren't Exact Copies
AI models don't have a database of saved images to retrieve from. Instead, they generate new images based on learned characteristics. If the same prompt is given multiple times, different results will be generated - the AI isn’t recalling an exact image but creating a new one each time. However, in rare cases, if a dataset is biased toward certain highly recognizable styles or famous images, the AI might generate something similar to an existing piece.


......................................................................................................

Ethical & Legal Debates


1. Copyright Issues
If an artist’s work was included in the training data without permission, some argue that AI-generated images based on learned styles are a form of unauthorized derivative work. This has led to legal challenges, with artists and companies questioning whether AI models should be allowed to train on copyrighted material.


2. Mimicking Art Styles
AI can generate images in the style of well-known artists, raising concerns about intellectual property rights and fair use. Some artists believe this devalues their work, while AI developers argue that humans also learn by studying and replicating styles.


3. Data Transparency & Consent
Some AI models (like those from OpenAI) restrict training data sources to avoid potential copyright issues. Others, like open-source models, may use broader datasets that include copyrighted works, leading to legal uncertainties.


......................................................................................................

Summary
AI-generated images are not simple copies but rather novel creations based on learned patterns. However, the use of copyrighted works in training data remains a gray area, with ongoing legal and ethical discussions.



Storage



Q. How is AI training data stored or referenced?


The way AI models store and reference training data is fundamentally different from how traditional databases or image libraries work. Instead of keeping copies of images, the model learns patterns and relationships between visual elements through mathematical representations. Here's how it happens:


1. Training Data Encoding: No Direct Image Storage
AI models don't store actual images or pixel-by-pixel copies. Instead, they analyze images and text descriptions, extracting patterns such as shapes, textures, colors, and spatial relationships. This information is stored as a latent space representation, which is a compressed mathematical abstraction of the learned concepts.


2. Latent Space Representation: How AI 'Remembers'
Neural networks, particularly deep learning models, encode images into a high-dimensional space. In this space, similar concepts are grouped together. For example: all images of 'cats' might be located in one region while all images of 'sunsets' might be in another. The model can mix concepts, allowing prompts like ‘a cat at sunset’ to generate a realistic combination. These representations don’t contain full images but rather statistical patterns derived from training.
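The ‘mixing’ described above can be sketched as interpolation between concept regions. The vectors here are hypothetical; the takeaway is only that a combined prompt corresponds to a point between its component concepts, not to any stored picture.

```python
import numpy as np

# Illustrative only: two made-up concept vectors in a tiny latent space
# (real embeddings are learned and far higher-dimensional).
cat    = np.array([0.9, 0.1, 0.2, 0.0])
sunset = np.array([0.1, 0.9, 0.0, 0.8])

# 'Mixing' concepts = moving between their regions of the space:
mix = 0.5 * cat + 0.5 * sunset   # a point for "a cat at sunset"

def dist(a, b):
    """Euclidean distance between two points in the latent space."""
    return float(np.linalg.norm(a - b))

# The mixed point sits between the two source concepts -- nearer to
# each of them than they are to each other.
print(dist(mix, cat), dist(mix, sunset), dist(cat, sunset))
```

A generator decoding that in-between point produces an image combining both concepts, which is why no specific training image needs to be recalled.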


3. How the Model Generates New Images
When given a text prompt, the AI doesn’t retrieve an image - it constructs one from scratch based on its learned understanding. This process is entirely generative, meaning no pre-existing image is being pulled from a database.


4. Why AI Doesn't 'Recall' Specific Images
Since data is stored as mathematical relationships rather than raw images, the AI cannot perfectly reproduce any specific training image. However, if a dataset contains a very famous or frequently occurring image (e.g., the Mona Lisa), the AI might generate something similar - but still not a pixel-for-pixel copy. The more unique or specific an image is, the less likely the AI will generate anything close to it.


5. Ethical Concerns: Data Transparency & Storage
Some AI models (like OpenAI's DALL·E) are trained on curated datasets intended to limit copyrighted material. Others (like some versions of Stable Diffusion) use broader datasets that may include copyrighted images, leading to legal debates. No AI model can 'recall' or retrieve specific images on demand - it can only generate new ones based on what it has learned statistically.


......................................................................................................

Summary
AI models do not store or retrieve training images in the traditional sense. They learn abstract patterns and generate new content based on those patterns. However, the use of copyrighted data in training remains controversial, leading to discussions about consent, ownership, and ethical AI practices.



Image Creation



Q. Is it possible to input your own imagery into AI and combine them?


Yes. When you upload two (or more) original images and use AI to blend or merge them, the model follows a process involving feature extraction, latent space integration, and generative synthesis. It sounds technical, but here's a breakdown:



1. Feature Extraction
The AI first analyzes the inputted images, identifying key features such as colors, textures, shapes, and compositions. These features are mapped into a latent space representation, where each image is encoded as a set of mathematical values rather than raw pixels.


2. Latent Space Integration
The model then finds a middle ground between the images within the latent space. The features are integrated by:
- Merging textures (e.g., a rough texture from one image blending with a smooth texture from the other).
- Morphing shapes (e.g., a cat’s face gradually shifting into a dog’s).
- Mixing colors (e.g., the blue sky in one image blending with the sunset hues of another).

The amount of blending can be adjusted, with some tools allowing you to control which image has more influence.
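The integration step, including the adjustable influence mentioned above, can be sketched as linear interpolation between two feature vectors. The feature values below are invented; in a real tool an encoder network produces them from the uploaded images.

```python
import numpy as np

# Toy sketch: two uploaded "images" reduced to latent feature vectors
# (made-up values standing in for an encoder's output).
features_a = np.array([0.8, 0.2, 0.6])  # e.g. rough texture, cool colors
features_b = np.array([0.1, 0.9, 0.3])  # e.g. smooth texture, warm colors

def blend(a, b, influence_a=0.5):
    """Linear interpolation in latent space. 'influence_a' controls how
    strongly image A shapes the result (0.0 = all B, 1.0 = all A)."""
    return influence_a * a + (1.0 - influence_a) * b

balanced = blend(features_a, features_b)        # 50/50 blend
mostly_a = blend(features_a, features_b, 0.8)   # image A dominates

print(np.round(balanced, 2))  # midway between the two feature sets
print(np.round(mostly_a, 2))  # much closer to features_a
```

A generator then decodes the blended vector back into pixels, which is why the result is a genuinely new image rather than an overlay of the two inputs.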


3. Image Generation (Synthesis)
Once the integration is complete, the AI synthesizes a new image that combines elements of the inputs. If using a diffusion-based model (like Stable Diffusion), it starts with noise and reconstructs an image step by step, incorporating the blended features. If using a GAN-based model (like StyleGAN), it generates a new image by combining learned attributes from each source.


4. Fine-Tuning & Style Transfer (Optional)
Some AI tools allow for additional tweaking, such as:
- Adjusting which elements are more dominant (e.g., keeping one image’s structure but another’s color scheme).
- Applying artistic styles from one image onto another.
- Generating multiple variations to choose from.


......................................................................................................

Examples of AI Blending Tools

- Midjourney ‘Blend’ Tool: Upload two images, and Midjourney finds a smooth way to merge their features.
- Stable Diffusion Image Mixing: Uses ControlNet or interpolation in latent space to merge images creatively.
- Photoshop AI ‘Generative Fill’: Allows AI-assisted blending between different elements.
- Runway ML & Artbreeder: Offer more manual control over how two images blend in AI-generated content.


......................................................................................................

Q. Can blending uploaded images include patterns extracted from other works?


Blending images using AI does not inherently introduce patterns from other works, but the way AI models are trained means that their generative process is influenced by the broader dataset they were trained on. Here’s how it works and where external patterns might come into play:



1. What Happens When AI Blends Images?
The AI extracts features (such as colors, textures, and shapes) from both uploaded images and maps them into a mathematical latent space. It then interpolates between the feature sets, generating a new image that merges elements from all the inputted imagery. The new image is based solely on the uploaded imagery unless additional AI model influences are applied (like style filters or prompts).


2. When Might External Patterns Appear?
Even though the AI is blending user-provided images, external influences can still play a role:

- Latent Space Knowledge from Training Data
AI models learn generalized styles, textures, and structures from their training data. If the AI was trained on a large dataset of art, photography, or digital design, it may use similar learned patterns to fill in gaps or enhance details. For example, if you blend a rough sketch with a photo, the AI may generate details that resemble known artistic textures, even if they weren’t present in the original images.

- Implicit Style Transfer
Some AI models inherently apply style adjustments even without explicit instructions. If a model has been optimized to produce photo-realistic results or artistic styles, it might unintentionally introduce elements that resemble images from the training data.

- AI Model Choice Matters
Some models, like DALL·E or Midjourney, have strong built-in artistic tendencies. Even when blending images, their outputs might look more ‘painterly’ or ‘illustrative’ because of how the AI was trained. More controlled tools, like Stable Diffusion with ControlNet, aim to minimize external stylistic influence during blending.



3. Can Blending Lead to Copyright Issues?
If you only use your original images, the AI should primarily generate results based on those. However, if the AI applies learned artistic styles from its training data, it could potentially incorporate elements that resemble copyrighted works. This is a key reason why artists and legal experts are debating AI training ethics - because even when generating ‘new’ content, the AI’s training dataset can influence the results.


4. How to Minimize External Influence When Blending
To keep a blended image as close as possible to the inputted originals, here are some strategies:
- Use open-source models (like Stable Diffusion) where you control the dataset and model weights.
- Avoid using AI with built-in styles (Midjourney, for example, tends to add stylistic flourishes).
- Use AI models that allow direct feature integration rather than generating from latent space diffusion.
- Manually fine-tune the results using Photoshop or other tools to remove unexpected elements.


......................................................................................................

Summary
- Existing works can be input into AI to be merged or blended in different ways.
- AI does not overlay images like a basic Photoshop blend; it creates a new image by analyzing and merging features at a deep level.
- Image integration happens in a mathematical latent space, not at the pixel level.
- The results depend on the AI model used, the settings applied, and the complexity of the images.
- AI blending should primarily rely on the provided images, but because AI models are trained on large datasets, some patterns, textures, or stylistic influences from the training data might subtly appear.
- The extent to which external data influences blending depends on the specific AI tool used and how much freedom the model has to ‘fill in’ missing details.
- If strict originality is important, using AI tools with transparent training datasets or manual refinement is the safest approach.



Ethics



Q. Can use of generative AI be considered 'theft'?


It depends on the context and legal interpretation. The term ‘theft’ typically implies the unlawful taking of someone’s property without permission and with the intent to deprive them of it. When discussing generative AI, the situation is more complex because:

1. AI Training Data & Copyright Law
AI models are trained on massive datasets, often including copyrighted images, text, and other media. These models do not store or directly copy entire works but learn patterns, styles, and structures from the data. Some argue that using copyrighted material for training without consent is a violation of intellectual property rights, but current laws do not explicitly classify this as theft.

2. The Difference Between ‘Copying’ and ‘Learning’
Generative AI does not pull existing works and paste them into new content; it creates outputs based on learned representations. However, in some cases, AI-generated content can resemble specific copyrighted works, raising concerns about potential derivative work violations rather than outright theft.

3. Legal & Ethical Debates
Some artists and content creators believe that AI training on their work without permission is unethical and should be considered a form of exploitation, if not outright theft. Others argue that AI training is similar to how humans learn from existing art and create new work, making it more akin to influence than direct theft. Lawsuits (like the ones against OpenAI, Stability AI, and Midjourney) are testing whether AI training constitutes infringement, but the legal system has not definitively ruled it as theft.

Conclusion
‘Theft’ may not be the most accurate term. Legally? Current copyright laws do not define AI training as theft, but lawsuits are challenging this. Ethically? Some creators view it as an unfair use of their work without compensation, making the term ‘theft’ understandable but not legally precise. More accurate terms? ‘Unlicensed use’, ‘copyright infringement’, or ‘unauthorized training’ might be better descriptions depending on the case.




Resources



Responsible AI UK
Responsible AI UK (RAi UK) will connect UK research into responsible AI to leading research centres and institutions around the world. This will allow RAi UK to deliver world-leading best practices for how to design, evaluate, regulate, and operate AI systems in ways that benefit people, society and the nation.

ARTificial Intelligence (ART-I)
Mainstream narratives in the media about AI tend toward extremes, whether dystopian or utopian. This project has two aims. First, to move beyond these extreme narratives by exploring and understanding the stories that artists themselves are telling about AI, and what values underpin those narratives. Second, to use these stories and values to create works of art that engage the general public in diverse reflections about AI and its impact, both on creative communities and on artistic notions such as authenticity and creativity.

Dave McKean on the Impact of AI for Artists
This interview with Dave McKean is from 2022, following the publication of his book 'Prompt', an examination of image-making with AI in practice.