From: redpointai

Trust and safety, including removing bias and minimizing the chance that harmful content is generated, are top priorities for Adobe, which maintains a very high bar in this area [00:26:19].

Training Data Strategy

A core part of Adobe’s differentiated Firefly strategy is training models exclusively on data from its own stock database, Adobe Stock [00:26:32], [00:26:41]. Adobe explicitly avoids training Firefly models on data scraped from the internet [00:26:50], which addresses artists’ concerns about consent, control, and compensation for their work [00:26:52].

Beyond provenance, training on Adobe Stock data reduces the likelihood of harmful content entering the training set and helps ensure that generated output is commercially usable.
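Adobe has not published its ingestion pipeline; as a minimal sketch, assuming a catalog of stock records carrying a license flag (all names here, such as `StockAsset` and `license_cleared`, are hypothetical), a loader enforcing this policy might look like:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class StockAsset:
    """Hypothetical record for a single Adobe Stock item."""
    asset_id: str
    uri: str
    license_cleared: bool  # contributor agreement permits model training

def select_training_assets(catalog: Iterable[StockAsset]) -> Iterator[StockAsset]:
    """Yield only assets whose licensing permits training.

    Sketch only: the actual criteria Adobe applies are not public. The key
    property is that nothing scraped from the open web enters the set.
    """
    for asset in catalog:
        if asset.license_cleared:
            yield asset
```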

Addressing Bias

While Adobe Stock helps reduce harmful content, training data sets inherently contain some bias [00:28:07], [00:28:10]. Adobe conducts extensive internal testing with employees to identify areas where the models are not performing well due to bias [00:28:29].

Key measures taken to reduce bias:

  • Person Detector: A model that detects references to people, often through job titles (e.g., “lawyer”) [00:29:21], [00:29:55].
  • Debiasing Content: When a person or job is referenced, the system debiases the output, introducing a distribution of skin tones, genders, and age groups appropriate to the request’s country of origin [00:30:02], [00:30:09]. This produces a fair representation that counters the biases learned from the initial data set [00:30:18], [00:30:33] (see the sketch after this list).
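Adobe has not described the implementation, so the following is only a sketch of the mechanism outlined above: a stand-in person detector plus per-country target distributions. The term list, distributions, and identifiers are all placeholder assumptions, not Adobe’s actual values.

```python
import random

# Placeholder target distributions keyed by the request's country of
# origin; real values would come from demographic data (assumption).
TARGETS = {
    "US": {
        "gender": {"woman": 0.5, "man": 0.5},
        "age": {"young adult": 0.4, "middle-aged": 0.4, "older adult": 0.2},
    },
}

JOB_TERMS = {"lawyer", "doctor", "nurse", "engineer"}  # toy stand-in

def references_person(prompt: str) -> bool:
    """Stand-in for the person-detector model: a keyword check."""
    return any(term in prompt.lower() for term in JOB_TERMS)

def sample_attributes(country: str, rng: random.Random) -> list[str]:
    """Draw one descriptor per attribute from the country's target mix."""
    dists = TARGETS.get(country, TARGETS["US"])
    picks = []
    for dist in dists.values():
        values, weights = zip(*dist.items())
        picks.append(rng.choices(values, weights=weights, k=1)[0])
    return picks

def debias_prompt(prompt: str, country: str, rng: random.Random) -> str:
    """Append sampled descriptors to steer generation toward the target mix."""
    if not references_person(prompt):
        return prompt
    return prompt + ", " + ", ".join(sample_attributes(country, rng))
```

Sampling per request, rather than fixing a single depiction, is what reproduces the target distribution in aggregate across many generations.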

Preventing Harmful Content

Adobe has invested significantly in mechanisms to prevent harmful content, especially for its Firefly models [00:31:35]. This includes:

  • Toxicity Detector Models: Trained to differentiate various terms and ensure content is safe for all age ranges, specifically avoiding “not safe for work” (NSFW) content [00:31:01], [00:31:05].
  • Deny Lists and Block Lists: Used to filter out undesirable content [00:31:06].
  • NSFW Filters: Applied at the end of the generation process [00:31:13].
  • Child-Specific Safeguards: Systems detect prompts that reference children and either avoid generation entirely or apply negative prompting to minimize the chance of producing inappropriate content involving children (e.g., children with tobacco) [00:31:21], [00:31:41]. A sketch of how these layers might compose follows this list.
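None of these components is public; the sketch below only illustrates how the layers described above might compose around a generic `generate(prompt, negative_prompt)` callable. The term lists, thresholds, and function names are all placeholder assumptions.

```python
# Toy stand-ins for the production classifiers; the term lists and
# thresholds are illustrative placeholders, not Adobe's.
DENY_LIST = {"bannedterm"}
TOXIC_TERMS = {"gore", "slur"}
CHILD_TERMS = {"child", "kid", "toddler"}

def toxicity_score(prompt: str) -> float:
    """Stand-in for the toxicity-detector model (0.0 = clean)."""
    words = prompt.lower().split()
    return sum(w in TOXIC_TERMS for w in words) / len(words) if words else 0.0

def mentions_child(prompt: str) -> bool:
    """Stand-in for the child-reference detector."""
    return any(t in prompt.lower().split() for t in CHILD_TERMS)

def nsfw_probability(image) -> float:
    """Stand-in for the post-generation NSFW image classifier."""
    return 0.0

def safe_generate(prompt: str, generate):
    """Wrap a `generate(prompt, negative_prompt)` callable in the layers
    described above: deny list -> toxicity check -> negative prompting
    for child references -> NSFW filter on the output."""
    lowered = prompt.lower()

    # 1. Deny/block lists filter undesirable prompts outright.
    if any(term in lowered for term in DENY_LIST):
        raise ValueError("prompt is on the deny list")

    # 2. Toxicity detector keeps content safe for all age ranges.
    if toxicity_score(prompt) > 0.2:
        raise ValueError("prompt failed the toxicity check")

    # 3. Child safeguard: negative prompting steers generation away
    #    from inappropriate pairings (e.g., children with tobacco).
    negative = "tobacco, alcohol, weapons" if mentions_child(prompt) else ""

    image = generate(prompt, negative_prompt=negative)

    # 4. NSFW filter applied at the end of the generation process.
    if nsfw_probability(image) > 0.5:
        raise ValueError("generated image failed the NSFW filter")
    return image
```

Checking both the prompt before generation and the image after means no single classifier has to catch everything, which matches the layered approach described above.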

Customer Feedback for Refinement

Adobe recognizes that these systems are not perfect and has built robust feedback mechanisms into firefly.adobe.com and Photoshop [00:32:02].

  • Explicit Signals: Users can provide direct feedback through “like/dislike” or “report” functions [00:34:18]. This feedback provides new training data points for refining rules and models [00:32:11].
  • Implicit Signals: Actions like downloading, saving, or sharing generated content are also strong indicators of user preference [00:34:24].

This feedback loop is crucial for reinforcing desired outcomes and suppressing disliked generations; it is effectively Adobe’s way of doing Reinforcement Learning from Human Feedback (RLHF) [00:34:05], [00:34:38], and it helps ensure that generated content is commercially usable [00:32:41].
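Adobe has not said how these signals are weighted or consumed. As a minimal sketch, assuming hypothetical signal weights and identifiers, explicit and implicit events could be aggregated into per-generation scores and then into the (preferred, rejected) pairs that RLHF-style reward models train on:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

# Hypothetical weights: explicit signals dominate, implicit actions
# (download/save/share) count as weaker positive evidence.
SIGNAL_WEIGHTS = {
    "like": 1.0, "dislike": -1.0, "report": -3.0,
    "download": 0.5, "save": 0.5, "share": 0.5,
}

@dataclass
class FeedbackEvent:
    prompt_id: str
    generation_id: str
    signal: str  # one of SIGNAL_WEIGHTS

def preference_pairs(events: Iterable[FeedbackEvent]):
    """Aggregate feedback into scores, then emit (preferred, rejected)
    generation pairs per prompt for reward-model training."""
    scores: dict[str, float] = defaultdict(float)
    by_prompt: dict[str, set[str]] = defaultdict(set)
    for e in events:
        scores[e.generation_id] += SIGNAL_WEIGHTS.get(e.signal, 0.0)
        by_prompt[e.prompt_id].add(e.generation_id)
    for prompt_id, gen_ids in by_prompt.items():
        ranked = sorted(gen_ids, key=lambda g: scores[g], reverse=True)
        for better, worse in zip(ranked, ranked[1:]):
            if scores[better] > scores[worse]:
                yield prompt_id, better, worse
```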

[!IMPORTANT] Adobe does not train models on customer data stored in Creative Cloud to avoid issues with brand guidelines, next-generation campaigns, or other sensitive information [00:33:30], [00:33:40].

Brand Guidelines and Compliance

For enterprise customers, brand guidelines and compliance are paramount [00:24:47], [00:24:52]. Adobe is innovating to let marketers and creative departments within enterprises automatically create content that complies with company or campaign brand guidelines [00:25:08]. This helps solve the content velocity problem by delivering beautiful, compliant content out of the gate, reducing the need for extensive quality assurance [00:25:37].
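Adobe has not detailed how compliance is enforced. As a sketch, a brand guide could be expressed as machine-checkable constraints that generated assets are validated against before delivery; the schema below, including `BrandGuidelines` and its fields, is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class BrandGuidelines:
    """Hypothetical machine-readable subset of a brand guide."""
    allowed_hex_colors: set[str] = field(default_factory=set)
    allowed_fonts: set[str] = field(default_factory=set)
    banned_words: set[str] = field(default_factory=set)

@dataclass
class GeneratedAsset:
    dominant_hex_colors: list[str]
    fonts_used: list[str]
    copy_text: str

def violations(asset: GeneratedAsset, brand: BrandGuidelines) -> list[str]:
    """Return reasons the asset breaks the brand; empty means compliant."""
    problems = []
    for color in asset.dominant_hex_colors:
        if color not in brand.allowed_hex_colors:
            problems.append(f"off-brand color {color}")
    for font in asset.fonts_used:
        if font not in brand.allowed_fonts:
            problems.append(f"off-brand font {font}")
    for word in brand.banned_words:
        if word in asset.copy_text.lower():
            problems.append(f"banned term {word!r}")
    return problems
```

Running checks like these automatically at generation time, rather than in manual review, is what would reduce the downstream quality-assurance burden the passage describes.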