From: redpointai
Adobe places significant emphasis on trust and data security in AI, especially concerning the data used to train its generative models and the content those models produce [00:26:26]. This focus addresses regulatory and safety issues in AI model development, as well as broader concerns around AI safety and regulation.
Data Sourcing and Privacy
A key part of Adobe’s Firefly strategy is training its models on content from Adobe’s own Adobe Stock database [00:26:41]. Firefly models are therefore not trained on data scraped from the internet [00:26:50], an approach intended to address artists’ concerns about consent, control, and compensation for their work [00:26:52].
Training on Adobe Stock data provides several advantages:
- High-quality content: Access to professionally curated content [00:27:01].
- Clear usage rights: Adobe has the right to train on data stored in its marketplace [00:27:06].
- Reduced IP infringement risk: Minimizes the potential for generated content to infringe on existing intellectual property [00:27:10].
- Minimized harmful content: Adobe Stock’s content moderation process, combining automated and manual curation, rejects harmful content, reducing its presence in the training data [00:27:21] (a rough sketch of such a gate follows this list).
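The talk does not describe how this curation works internally. As a rough illustration, an automated gate might score each submission with a harmful-content classifier and route ambiguous cases to human reviewers. Everything in this sketch (the `harm_score` classifier, the thresholds, the bucket names) is a hypothetical stand-in, not Adobe’s actual pipeline:

```python
# Hypothetical automated-plus-manual curation gate for a stock ingestion
# pipeline. Classifier and thresholds are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Submission:
    asset_id: str
    image_bytes: bytes

def curate(submissions: list[Submission],
           harm_score: Callable[[bytes], float],
           reject_above: float = 0.9,
           review_above: float = 0.5):
    """Split submissions into accepted, manual-review, and rejected buckets."""
    accepted, review, rejected = [], [], []
    for s in submissions:
        score = harm_score(s.image_bytes)  # automated harmful-content classifier
        if score >= reject_above:
            rejected.append(s)             # clearly harmful: never enters training data
        elif score >= review_above:
            review.append(s)               # ambiguous: route to human curators
        else:
            accepted.append(s)             # safe: eligible for the training corpus
    return accepted, review, rejected
```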
Crucially, Adobe explicitly states that it does not train its models on customer data stored in Creative Cloud [00:33:30].
Bias Mitigation
Recognizing that every dataset has inherent bias, Adobe actively works to mitigate bias in its models [00:28:10].
- Internal Testing: An internal test with tens of thousands of Adobe employees provided valuable feedback on areas where bias was evident in early versions of Firefly [00:28:29]. This feedback led to significant improvements in a short period [00:28:51].
- Person Detector Model: Adobe developed a “person detector” model to identify references to people or jobs in prompts [00:29:19]. This model debiases generated content by introducing a fair distribution of skin tones, genders, and age groups, often keyed to the demographic distribution of the user’s country of origin [00:29:55] (illustrated in the sketch below).
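As an illustration of the idea (not Adobe’s actual system), a prompt-level debiaser could detect person or occupation terms and sample demographic attributes from a per-country target distribution before generation. The keyword detector, attribute lists, and weights below are all invented for the sketch:

```python
import random

# Illustrative stand-in for a learned "person detector": a keyword check here.
PERSON_TERMS = {"person", "people", "doctor", "teacher", "engineer", "nurse"}

# Hypothetical per-country demographic targets: attribute -> (values, weights).
DEMOGRAPHICS = {
    "US": {
        "skin tone": (["light-skinned", "medium-skinned", "dark-skinned"], [0.6, 0.2, 0.2]),
        "age":       (["young", "middle-aged", "elderly"], [0.4, 0.4, 0.2]),
        "gender":    (["male", "female"], [0.5, 0.5]),
    },
}

def mentions_person(prompt: str) -> bool:
    """Return True if the prompt references a person or occupation."""
    return any(tok in PERSON_TERMS for tok in prompt.lower().split())

def debias_prompt(prompt: str, country: str, rng: random.Random) -> str:
    """Fold sampled demographic attributes into prompts that reference people."""
    if not mentions_person(prompt):
        return prompt
    dist = DEMOGRAPHICS[country]
    attrs = [rng.choices(values, weights)[0] for values, weights in dist.values()]
    return f"{prompt}, depicted as a {' '.join(attrs)} person"

print(debias_prompt("a photo of a doctor", "US", random.Random(0)))
```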
Content Moderation and Safety
To ensure content is safe and commercially usable, Adobe has implemented several measures:
- Toxicity Detectors: Various toxicity detector models have been trained to differentiate between harmful and benign uses of terms and to prevent the generation of Not Safe For Work (NSFW) content [00:31:01].
- Deny and Block Lists: These lists further restrict unwanted content generation [00:31:06].
- NSFW Filters: Additional filters are applied to outputs at the end of the generation process [00:31:13].
- Child Protection Systems: Specific systems detect prompts referencing children and minimize the chance of generating inappropriate content involving them [00:31:24]. A sketch of how these safeguards might be chained appears below.
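The talk names these safeguards but not their implementation. A minimal sketch of how they might be chained around a generation call, where every function, list, and threshold is a hypothetical placeholder:

```python
from typing import Callable, Optional

DENY_LIST = {"some_blocked_term"}          # placeholder deny/block list
CHILD_TERMS = {"child", "kid", "minor"}    # placeholder child-reference terms

def generate_safely(prompt: str,
                    toxicity_score: Callable[[str], float],
                    nsfw_score: Callable[[bytes], float],
                    generate: Callable[[str], bytes]) -> Optional[bytes]:
    """Run placeholder versions of the safeguards around a generation call."""
    tokens = set(prompt.lower().split())
    # 1. Deny/block lists: refuse prompts containing blocked terms outright.
    if tokens & DENY_LIST:
        return None
    # 2. Toxicity detectors score the prompt text before generation.
    if toxicity_score(prompt) > 0.8:
        return None
    # 3. Child-protection check: note child references to tighten output filtering.
    child_referenced = bool(tokens & CHILD_TERMS)
    image = generate(prompt)
    # 4. NSFW filter on the output, stricter when children are referenced.
    limit = 0.1 if child_referenced else 0.5
    if nsfw_score(image) > limit:
        return None
    return image
```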
Customer Feedback and Model Refinement
Adobe values customer feedback as a crucial component of its safety and refinement process [00:32:02].
- Feedback Mechanisms: Firefly.com and Photoshop include mechanisms for beta customers to report what they like and dislike, including bias and harm issues [00:19:08], [00:32:03]. This feedback provides new training data points for rules and models [00:32:11].
- Reinforcement Learning from Human Feedback (RLHF): On Firefly.com, Adobe’s terms of use allow prompts and generated images to be stored for training [00:33:59]; this is Adobe’s way of doing RLHF [00:34:07]. Explicit signals (like/dislike, report) and implicit signals (download, save, share) are collected [00:34:18] and integrated into future Firefly models to teach the generation process to create content users prefer and avoid content they dislike [00:34:38]. A sketch of how such signals might be aggregated follows.
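Adobe does not disclose how these signals feed into training. One plausible shape, sketched here with invented event names and weights, is to aggregate per-image feedback into preferred and rejected sets for preference-based fine-tuning:

```python
from dataclasses import dataclass

# Hypothetical signal weights: positive values mark preferred outputs,
# negative values mark outputs the model should learn to avoid.
SIGNAL_WEIGHTS = {
    "like": 1.0, "download": 0.8, "save": 0.6, "share": 0.7,  # positives
    "dislike": -1.0, "report": -2.0,                          # negatives
}

@dataclass
class FeedbackEvent:
    prompt: str
    image_id: str
    signal: str  # one of SIGNAL_WEIGHTS

def build_preference_data(events: list[FeedbackEvent]):
    """Aggregate per-(prompt, image) scores into preferred/rejected examples."""
    scores: dict[tuple[str, str], float] = {}
    for e in events:
        key = (e.prompt, e.image_id)
        scores[key] = scores.get(key, 0.0) + SIGNAL_WEIGHTS.get(e.signal, 0.0)
    preferred = [k for k, v in scores.items() if v > 0]
    rejected = [k for k, v in scores.items() if v < 0]
    return preferred, rejected
```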