From: hu-po
Rectified flow is a generative model formulation that connects the data and noise distributions along a straight line during image generation [00:08:22]. It is highlighted as the specific type of flow used in Stable Diffusion 3 because it outperformed the other formulations it was compared against [00:34:35].
Principle and Advantages
Traditional diffusion models often follow a “curved path” when transforming noise into an image in a high-dimensional space [00:15:02]. This means the process involves multiple intermediate steps, where the generated image might temporarily deviate in semantic attributes before returning to the desired final state [00:16:06], [00:17:30]. Each step in this process requires an evaluation of the neural network, directly impacting sampling speed [00:14:17], [00:16:29].
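The cost of a curved path can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the model's actual sampler: it uses a cosine-shaped path z_t = cos(pi t / 2) x0 + sin(pi t / 2) eps between one fixed data point x0 and one fixed noise sample eps, replaces the neural network with the exact velocity of that path, and integrates from noise (t = 1) to data (t = 0) with Euler steps. The function names and the example vectors are hypothetical.

```python
import numpy as np

def curved_velocity(x0, eps, t):
    """Time derivative of the toy curved path z_t = cos(a t) x0 + sin(a t) eps, a = pi/2."""
    a = np.pi / 2
    return a * (-np.sin(a * t) * x0 + np.cos(a * t) * eps)

def euler_error(x0, eps, num_steps):
    """Integrate from t=1 (noise) to t=0 (data) and return the distance to x0.

    Each loop iteration stands in for one neural network evaluation.
    """
    z = eps.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        z = z - dt * curved_velocity(x0, eps, t)
    return float(np.linalg.norm(z - x0))

# Hypothetical 4-dimensional "data" and "noise" vectors, chosen only for illustration.
x0 = np.array([1.0, -2.0, 0.5, 3.0])
eps = np.array([0.3, 0.8, -1.2, 0.1])

errors = {n: euler_error(x0, eps, n) for n in (1, 4, 16, 64)}
for n, err in errors.items():
    print(f"{n:3d} steps -> error {err:.4f}")
```

In this toy setting the error shrinks only as the number of steps grows, which is why a curved trajectory translates into more network evaluations per generated image.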
Rectified flow aims to eliminate this curvature by taking a “straight path” from pure noise to the target data distribution [00:16:18], [00:35:56]. Because the direction of travel does not change along a straight path, the process can ideally be simulated with a single step, which makes it less prone to error accumulation and significantly improves sampling speed [00:14:10], [00:16:21]. It is also noted as the mathematically simplest formulation of this process [00:36:40].
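The straight path can be written down in a few lines. The following is a minimal sketch, assuming the common convention in which t = 0 is data, t = 1 is noise, and the point at time t is the linear interpolation z_t = (1 - t) x0 + t eps. The function names are hypothetical, and the "oracle" velocity stands in for a trained network in an idealized single-pair setting.

```python
import numpy as np

def interpolate(x0, eps, t):
    """Point on the straight line between data x0 (t=0) and noise eps (t=1)."""
    return (1.0 - t) * x0 + t * eps

def velocity_target(x0, eps):
    """Velocity along the straight path: d z_t / dt = eps - x0, constant in t."""
    return eps - x0

def euler_sample(velocity_fn, eps, num_steps):
    """Integrate from t=1 (noise) to t=0 (data) with num_steps Euler steps."""
    z = eps.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        z = z - dt * velocity_fn(z, t)
    return z

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)   # stand-in for a data sample
eps = rng.normal(size=4)  # stand-in for a noise sample

# Idealized velocity for this one (x0, eps) pair; a trained network would only approximate it.
oracle = lambda z, t: velocity_target(x0, eps)

one_step = euler_sample(oracle, eps, num_steps=1)
print(np.allclose(one_step, x0))
```

Since the velocity is constant along a straight line, a single Euler step lands on the data point in this idealized case. A real model learns an average velocity over many data and noise pairs, so its paths are only approximately straight and a few steps are still typically used.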
Experimental Validation and Optimal Sampling
In Stable Diffusion 3, various flow trajectories were compared, including EDM, Cosine, and LDM Linear [00:34:50], [00:35:03], [00:35:12]. Experiments demonstrated that rectified flow consistently outperformed these alternatives [00:35:54], [00:49:09], [00:51:17].
The performance of rectified flow improves further when it is combined with a specific time-step sampling strategy called logit-normal sampling [00:40:40], [00:49:52]. Unlike the common uniform sampling, which draws training time steps with equal probability anywhere between noise and data [00:39:00], logit-normal sampling biases training towards intermediate steps [00:41:57]. The intuition is that these intermediate stages (often between 0.4 and 0.6) are where the prediction task is hardest, and therefore where the model has the most to learn about removing or predicting noise [00:39:47], [00:42:22], [00:44:31].
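The difference between the two sampling strategies can be sketched as follows. This is a minimal illustration, assuming a logit-normal distribution is obtained by drawing a normal variable and passing it through the sigmoid function; the location and scale values (0 and 1) and the function names are assumptions chosen for the example, not settings taken from the video.

```python
import numpy as np

def sample_uniform(rng, n):
    """Time steps drawn with equal probability anywhere in [0, 1)."""
    return rng.uniform(0.0, 1.0, size=n)

def sample_logit_normal(rng, n, mean=0.0, std=1.0):
    """Time steps whose logit is normally distributed (sigmoid of a normal draw)."""
    u = rng.normal(loc=mean, scale=std, size=n)
    return 1.0 / (1.0 + np.exp(-u))

def fraction_in_middle(t, low=0.4, high=0.6):
    """Share of sampled time steps that fall in the intermediate range."""
    return float(np.mean((t > low) & (t < high)))

rng = np.random.default_rng(0)
t_uniform = sample_uniform(rng, 100_000)
t_logit = sample_logit_normal(rng, 100_000)

print("uniform      :", fraction_in_middle(t_uniform))
print("logit-normal :", fraction_in_middle(t_logit))
```

With these assumed parameters, roughly 31% of logit-normal samples fall between 0.4 and 0.6, compared with about 20% for uniform sampling, so more of the training signal is spent on the intermediate time steps.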
The combination of rectified flow with logit-normal sampling proved to be the most efficient and robust approach among the 24 tested combinations of sampler settings [00:44:02], [00:51:17]. This combination also gives better results when fewer sampling steps are used during inference [00:51:20], [00:51:54].
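How the two pieces fit together during training can be sketched as a single loss computation. This is a simplified illustration under stated assumptions: the path is z_t = (1 - t) x0 + t eps with t = 0 as data, the regression target is the velocity eps - x0, time steps come from a logit-normal distribution with assumed parameters, and `model_fn` is a hypothetical placeholder for the network. It omits the conditioning, loss weighting, and latent encoding that a real system would include.

```python
import numpy as np

def rectified_flow_loss(model_fn, x0, rng, mean=0.0, std=1.0):
    """Mean squared error between predicted and true straight-path velocity."""
    batch = x0.shape[0]
    # Logit-normal time steps, one per example, shaped for broadcasting.
    u = rng.normal(mean, std, size=(batch, 1))
    t = 1.0 / (1.0 + np.exp(-u))
    eps = rng.normal(size=x0.shape)       # noise sample per example
    z_t = (1.0 - t) * x0 + t * eps        # point on the straight path
    target = eps - x0                     # constant velocity of that path
    pred = model_fn(z_t, t)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 4))              # stand-in batch of 8 data vectors

# Hypothetical untrained "model" that always predicts zero velocity.
zero_model = lambda z, t: np.zeros_like(z)
loss = rectified_flow_loss(zero_model, x0, rng)
print(loss)
```

A training loop would minimize this quantity with respect to the network parameters; the placeholder model here only demonstrates the shapes and the order of operations.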
Practical Implications
Stability AI's explicit comparison and validation of different flow trajectories and sampling methods saves other researchers and developers significant computational resources: they can adopt the most efficient and effective combination directly, without running extensive experiments of their own [01:59:02], [01:59:22]. This contributes to the broader effort to scale and optimize diffusion models [01:53:06].