From: redpointai

Adobe places significant emphasis on trust and data security in AI, especially concerning the data used to train its generative models and the content those models produce [00:26:26]. This focus spans both the practical safety issues of model development and the broader debate around AI safety and regulation.

Data Sourcing and Privacy

A key part of Adobe’s Firefly strategy is training its models on content from its own Adobe Stock marketplace [00:26:41]. This means Firefly models are not trained on data scraped from the internet [00:26:50], an approach intended to reduce artists’ concerns about consent, control, and compensation for their work [00:26:52].

Training on Adobe Stock data provides several advantages:

  • High-quality content: Access to professionally curated content [00:27:01].
  • Clear usage rights: Adobe has the right to train on data stored in its marketplace [00:27:06].
  • Reduced IP infringement risk: Minimizes the potential for generated content to infringe on existing intellectual property [00:27:10].
  • Minimized harmful content: Adobe Stock has a content moderation process, including automated and manual curation, that rejects harmful content, thus reducing its presence in the training data [00:27:21].
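
A minimal sketch of what such a two-stage curation step might look like is below. The `Asset` type, the `harm_score` field, and both thresholds are hypothetical illustrations rather than Adobe’s actual pipeline; they only show the pattern described above: automated screening that rejects clear violations outright and routes edge cases to human review.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    asset_id: str
    caption: str
    harm_score: float  # hypothetical 0..1 score from an automated classifier

# Hypothetical thresholds: auto-reject above 0.9, human review for 0.3..0.9.
AUTO_REJECT = 0.9
NEEDS_REVIEW = 0.3

def curate(assets, manual_review_queue):
    """Two-stage curation: automated screening, then manual review for edge cases."""
    accepted = []
    for asset in assets:
        if asset.harm_score >= AUTO_REJECT:
            continue  # rejected outright; never enters the training set
        if asset.harm_score >= NEEDS_REVIEW:
            manual_review_queue.append(asset)  # a human curator decides
        else:
            accepted.append(asset)
    return accepted
```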

Crucially, Adobe explicitly states that it does not train its models on customer data stored in Creative Cloud [00:33:30].

Bias Mitigation

Recognizing that every dataset has inherent bias, Adobe actively works to mitigate bias in its models [00:28:10].

  • Internal Testing: An internal test with tens of thousands of Adobe employees provided valuable feedback on areas where bias was evident in early versions of Firefly [00:28:29]. This feedback led to significant improvements in a short period [00:28:51].
  • Person Detector Model: Adobe developed a “person detector” model to identify references to people or jobs in prompts [00:29:19]. This model debiases the generated content to introduce a fair distribution of skin tones, genders, and age groups, often referencing the demographic distribution of the user’s country of origin [00:29:55].
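
One common technique for this kind of debiasing is to detect person or job references in the prompt and then sample demographic attributes from a target distribution, appending them to the prompt so that outputs follow that distribution over many generations. The sketch below assumes that technique; `PERSON_TERMS`, the keyword-based detector, and the example distribution are hypothetical stand-ins (Adobe’s actual person detector is a trained model, and real distributions would come from locale-specific demographic data).

```python
import random

# Hypothetical demographic distribution keyed by country code; a real system
# would source these values from demographic data for the user's locale.
SKIN_TONE_DIST = {"US": {"light": 0.6, "medium": 0.25, "dark": 0.15}}
PERSON_TERMS = {"person", "doctor", "teacher", "engineer", "nurse"}

def mentions_person(prompt: str) -> bool:
    """Stand-in for the person/job detector; a real one would be a trained model."""
    return any(term in prompt.lower() for term in PERSON_TERMS)

def sample_attribute(dist: dict) -> str:
    values, weights = zip(*dist.items())
    return random.choices(values, weights=weights, k=1)[0]

def debias_prompt(prompt: str, country: str = "US") -> str:
    """If the prompt references a person, append a sampled attribute so that,
    across many generations, outputs match the target distribution."""
    if not mentions_person(prompt):
        return prompt
    tone = sample_attribute(SKIN_TONE_DIST[country])
    return f"{prompt}, {tone} skin tone"
```

The same sampling step generalizes to genders and age groups; the key design choice is that fairness is enforced statistically across generations rather than within any single image.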

Content Moderation and Safety

To ensure content is safe and commercially usable, Adobe has implemented several layered measures; a combined sketch of how such layers might compose follows the list:

  • Toxicity Detectors: Multiple toxicity-detection models have been trained to distinguish problematic terms and prevent the generation of Not Safe For Work (NSFW) content [00:31:01].
  • Deny and Block Lists: These lists further restrict unwanted content generation [00:31:06].
  • NSFW Filters: Dedicated filters are applied to outputs at the end of the generation process [00:31:13].
  • Child Protection Systems: Specific systems detect prompts referencing children and minimize the chance of generating inappropriate content in relation to them [00:31:24].
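
The sketch below shows how these four layers might be composed into one pipeline: lexical deny lists and a toxicity check on the prompt, a child-reference check that tightens downstream thresholds, and an NSFW filter on the generated output. Every function here (`toxicity_score`, `references_child`, `nsfw_probability`) is a hypothetical stub standing in for a trained model, and the thresholds are illustrative, not Adobe’s.

```python
DENY_LIST = {"some_blocked_term"}  # placeholder entries

def toxicity_score(prompt: str) -> float:
    """Stand-in for a trained toxicity classifier returning a 0..1 score."""
    return 0.0

def references_child(prompt: str) -> bool:
    """Stand-in for a trained child-reference detector."""
    return any(w in prompt.lower() for w in ("child", "kid", "minor"))

def nsfw_probability(image_bytes: bytes) -> float:
    """Stand-in for an NSFW image classifier applied after generation."""
    return 0.0

def moderated_generate(prompt: str, generate):
    # 1) Deny/block lists: a cheap lexical screen before anything else runs.
    if any(term in prompt.lower() for term in DENY_LIST):
        return None
    # 2) Toxicity detector on the prompt.
    if toxicity_score(prompt) > 0.5:
        return None
    # 3) Child-protection check: tighten output thresholds when children
    #    are referenced in the prompt.
    strict = references_child(prompt)
    image = generate(prompt)
    # 4) NSFW filter at the end of the generation process.
    limit = 0.1 if strict else 0.5
    if nsfw_probability(image) > limit:
        return None
    return image
```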

Customer Feedback and Model Refinement

Adobe values customer feedback as a crucial component of its safety and refinement process [00:32:02].

  • Feedback Mechanisms: firefly.com and Photoshop include mechanisms for beta customers to report what they like and dislike, including bias and harm issues [00:19:08], [00:32:03]. This feedback provides new training data points for rules and models [00:32:11].
  • Reinforcement Learning from Human Feedback (RLHF): On firefly.com, Adobe’s terms of use allow the storage of prompts and generated images for training [00:33:59]. This acts as Adobe’s way of doing RLHF [00:34:07]. Explicit signals (like/dislike, report) and implicit signals (download, save, share) are collected [00:34:18]. These signals are integrated into future Firefly models to teach the generation process to create content users prefer and avoid content they dislike [00:34:38].
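
A hedged sketch of how such explicit and implicit signals could be turned into preference pairs for reward-model training is below. The `GenerationEvent` schema and the signal weights are assumptions for illustration, not Adobe’s actual data format; the sketch only shows the general pattern of ranking generations per prompt and pairing preferred outputs against dispreferred ones.

```python
from dataclasses import dataclass

# Hypothetical preference weights: explicit signals count more than implicit ones.
SIGNAL_WEIGHTS = {
    "like": 1.0, "dislike": -1.0, "report": -2.0,  # explicit signals
    "download": 0.5, "save": 0.5, "share": 0.5,    # implicit signals
}

@dataclass
class GenerationEvent:
    prompt: str
    image_id: str
    signals: list  # e.g. ["like", "download"]

def preference_score(event: GenerationEvent) -> float:
    return sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in event.signals)

def to_preference_pairs(events):
    """Group events by prompt and pair higher-scored images against lower-scored
    ones, the (prompt, preferred, dispreferred) format a reward model for
    RLHF-style fine-tuning typically consumes."""
    by_prompt = {}
    for ev in events:
        by_prompt.setdefault(ev.prompt, []).append(ev)
    pairs = []
    for prompt, evs in by_prompt.items():
        ranked = sorted(evs, key=preference_score, reverse=True)
        for better, worse in zip(ranked, ranked[1:]):
            if preference_score(better) > preference_score(worse):
                pairs.append((prompt, better.image_id, worse.image_id))
    return pairs
```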