From: redpointai
Adobe places significant emphasis on trust and data security in AI, especially concerning the data used to train its generative models and the content those models produce [00:26:26]. This focus addresses regulatory and safety issues in AI model development, as well as broader concerns around AI safety and regulation.
Data Sourcing and Privacy
A key part of Adobe’s Firefly strategy is training its models on content from Adobe’s own Adobe Stock database [00:26:41]. Firefly models are therefore not trained on data scraped from the internet [00:26:50], an approach intended to address artists’ concerns about consent, control, and compensation for their work [00:26:52].
Training on Adobe Stock data provides several advantages:
- High-quality content: Access to professionally curated content [00:27:01].
- Clear usage rights: Adobe has the right to train on data stored in its marketplace [00:27:06].
- Reduced IP infringement risk: Minimizes the potential for generated content to infringe on existing intellectual property [00:27:10].
- Minimized harmful content: Adobe Stock’s content moderation process, combining automated and manual curation, rejects harmful content, reducing its presence in the training data [00:27:21] (a rough sketch of such a gate follows this list).
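The talk does not describe how this curation works internally. As a rough illustration, an automated gate might score each submission with a harmful-content classifier and route ambiguous cases to human reviewers. Everything in this sketch (the `harm_score` classifier, the thresholds, the bucket names) is a hypothetical stand-in, not Adobe’s actual pipeline:

```python
# Hypothetical automated-plus-manual curation gate for a stock ingestion
# pipeline. Classifier and thresholds are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Submission:
    asset_id: str
    image_bytes: bytes

def curate(submissions: list[Submission],
           harm_score: Callable[[bytes], float],
           reject_above: float = 0.9,
           review_above: float = 0.5):
    """Split submissions into accepted, manual-review, and rejected buckets."""
    accepted, review, rejected = [], [], []
    for s in submissions:
        score = harm_score(s.image_bytes)  # automated harmful-content classifier
        if score >= reject_above:
            rejected.append(s)             # clearly harmful: never enters training data
        elif score >= review_above:
            review.append(s)               # ambiguous: route to human curators
        else:
            accepted.append(s)             # safe: eligible for the training corpus
    return accepted, review, rejected
```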
Crucially, Adobe explicitly states that it does not train its models on customer data stored in Creative Cloud [00:33:30].
Bias Mitigation
Recognizing that every dataset has inherent bias, Adobe actively works to mitigate bias in its models [00:28:10].
- Internal Testing: An internal test with tens of thousands of Adobe employees provided valuable feedback on areas where bias was evident in early versions of Firefly [00:28:29]. This feedback led to significant improvements in a short period [00:28:51].
- Person Detector Model: Adobe developed a “person detector” model to identify references to people or jobs in prompts [00:29:19]. This model debiases generated content by introducing a fair distribution of skin tones, genders, and age groups, often keyed to the demographic distribution of the user’s country of origin [00:29:55] (illustrated in the sketch below).
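As an illustration of the idea (not Adobe’s actual system), a prompt-level debiaser could detect person or occupation terms and sample demographic attributes from a per-country target distribution before generation. The keyword detector, attribute lists, and weights below are all invented for the sketch:

```python
import random

# Illustrative stand-in for a learned "person detector": a keyword check here.
PERSON_TERMS = {"person", "people", "doctor", "teacher", "engineer", "nurse"}

# Hypothetical per-country demographic targets: attribute -> (values, weights).
DEMOGRAPHICS = {
    "US": {
        "skin tone": (["light-skinned", "medium-skinned", "dark-skinned"], [0.6, 0.2, 0.2]),
        "age":       (["young", "middle-aged", "elderly"], [0.4, 0.4, 0.2]),
        "gender":    (["male", "female"], [0.5, 0.5]),
    },
}

def mentions_person(prompt: str) -> bool:
    """Return True if the prompt references a person or occupation."""
    return any(tok in PERSON_TERMS for tok in prompt.lower().split())

def debias_prompt(prompt: str, country: str, rng: random.Random) -> str:
    """Fold sampled demographic attributes into prompts that reference people."""
    if not mentions_person(prompt):
        return prompt
    dist = DEMOGRAPHICS[country]
    attrs = [rng.choices(values, weights)[0] for values, weights in dist.values()]
    return f"{prompt}, depicted as a {' '.join(attrs)} person"

print(debias_prompt("a photo of a doctor", "US", random.Random(0)))
```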
Content Moderation and Safety
To ensure content is safe and commercially usable, Adobe has implemented several measures:
- Toxicity Detectors: Various toxicity detector models have been trained to differentiate between harmful and benign uses of terms and to prevent the generation of Not Safe For Work (NSFW) content [00:31:01].
- Deny and Block Lists: These lists further restrict unwanted content generation [00:31:06].
- NSFW Filters: Additional filters are applied to outputs at the end of the generation process [00:31:13].
- Child Protection Systems: Specific systems detect prompts referencing children and minimize the chance of generating inappropriate content involving them [00:31:24]. A sketch of how these safeguards might be chained appears below.
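The talk names these safeguards but not their implementation. A minimal sketch of how they might be chained around a generation call, where every function, list, and threshold is a hypothetical placeholder:

```python
from typing import Callable, Optional

DENY_LIST = {"some_blocked_term"}          # placeholder deny/block list
CHILD_TERMS = {"child", "kid", "minor"}    # placeholder child-reference terms

def generate_safely(prompt: str,
                    toxicity_score: Callable[[str], float],
                    nsfw_score: Callable[[bytes], float],
                    generate: Callable[[str], bytes]) -> Optional[bytes]:
    """Run placeholder versions of the safeguards around a generation call."""
    tokens = set(prompt.lower().split())
    # 1. Deny/block lists: refuse prompts containing blocked terms outright.
    if tokens & DENY_LIST:
        return None
    # 2. Toxicity detectors score the prompt text before generation.
    if toxicity_score(prompt) > 0.8:
        return None
    # 3. Child-protection check: note child references to tighten output filtering.
    child_referenced = bool(tokens & CHILD_TERMS)
    image = generate(prompt)
    # 4. NSFW filter on the output, stricter when children are referenced.
    limit = 0.1 if child_referenced else 0.5
    if nsfw_score(image) > limit:
        return None
    return image
```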
Customer Feedback and Model Refinement
Adobe values customer feedback as a crucial component of its safety and refinement process [00:32:02].
- Feedback Mechanisms: Firefly.com and Photoshop include mechanisms for beta customers to report what they like and dislike, including bias and harm issues [00:19:08], [00:32:03]. This feedback provides new training data points for rules and models [00:32:11].
- Reinforcement Learning from Human Feedback (RLHF): On Firefly.com, Adobe’s terms of use allow prompts and generated images to be stored for training [00:33:59]; this is Adobe’s way of doing RLHF [00:34:07]. Explicit signals (like/dislike, report) and implicit signals (download, save, share) are collected [00:34:18] and integrated into future Firefly models to teach the generation process to create content users prefer and avoid content they dislike [00:34:38]. A sketch of how such signals might be aggregated follows.
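Adobe does not disclose how these signals feed into training. One plausible shape, sketched here with invented event names and weights, is to aggregate per-image feedback into preferred and rejected sets for preference-based fine-tuning:

```python
from dataclasses import dataclass

# Hypothetical signal weights: positive values mark preferred outputs,
# negative values mark outputs the model should learn to avoid.
SIGNAL_WEIGHTS = {
    "like": 1.0, "download": 0.8, "save": 0.6, "share": 0.7,  # positives
    "dislike": -1.0, "report": -2.0,                          # negatives
}

@dataclass
class FeedbackEvent:
    prompt: str
    image_id: str
    signal: str  # one of SIGNAL_WEIGHTS

def build_preference_data(events: list[FeedbackEvent]):
    """Aggregate per-(prompt, image) scores into preferred/rejected examples."""
    scores: dict[tuple[str, str], float] = {}
    for e in events:
        key = (e.prompt, e.image_id)
        scores[key] = scores.get(key, 0.0) + SIGNAL_WEIGHTS.get(e.signal, 0.0)
    preferred = [k for k, v in scores.items() if v > 0]
    rejected = [k for k, v in scores.items() if v < 0]
    return preferred, rejected
```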