Gaussian Surfels in Computer Vision

From: hu-po

Gaussian Surfels are a novel point-based representation in computer vision and a potential contender among 3D representations [00:02:50]. This technique, released on April 27, 2024, by Chinese research groups and Style3D Research, aims to combine the flexible optimization of 3D Gaussian points with the surface alignment property of surfels [00:03:35].

What are Gaussian Surfels?

Gaussian Surfels are a set of unstructured Gaussian kernels, effectively a collection of 3D points [00:26:25]. They are created by setting the z-scale of 3D Gaussian points to zero, transforming the original 3D ellipsoid into a 2D ellipse [00:06:36] [00:21:42]. This design provides clear guidance to the optimizer by treating the local z-axis as the normal direction [00:06:43].

Each Gaussian surfel possesses several properties:

Position: The XYZ coordinates of the kernel’s center [00:26:44].
Orientation/Rotation: Represented by a quaternion (four numbers) [00:26:56].
Opacity: A single number indicating how transparent the surfel is [00:27:13].
Spherical Harmonic Coefficients: Used to encode view-dependent appearance [00:27:20].
Scaling Factors: Adjust the size of the 2D ellipse [00:27:27].
Covariance Matrix: Represents the shape and spread of the surfel [00:27:41].
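
The relationship between these properties can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a (w, x, y, z) quaternion order and builds the covariance as R diag(sx^2, sy^2, 0) R^T, so the surfel has no extent along its local z-axis and that axis can be read off as the normal. The function names are hypothetical.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix (list of rows)."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def surfel_covariance(q, sx, sy):
    """Covariance of a flattened Gaussian: R * diag(sx^2, sy^2, 0) * R^T."""
    R = quat_to_rotmat(q)
    s2 = [sx * sx, sy * sy, 0.0]  # the z-scale is fixed to zero (assumed layout)
    return [[sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def surfel_normal(q):
    """The local z-axis (third column of R) serves as the surfel normal."""
    R = quat_to_rotmat(q)
    return [R[0][2], R[1][2], R[2][2]]
```

Because the third scale is zero, the covariance has no spread along the normal: multiplying the covariance by the normal gives the zero vector for any rotation.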

The concept of “surfels” is not new, dating back to a 2000 paper that introduced surface elements as rendering primitives [00:23:09].

Advantages of Gaussian Surfels

Gaussian Surfels offer several key advantages:

Surface Alignment: By flattening the Gaussians, the local z-axis can be directly interpreted as the normal direction, significantly improving surface alignment [00:06:47] [00:24:45]. This allows for an additional learning signal to align surfels with the actual surface [00:25:51].
Optimization Stability: The clear guidance provided by the flattened design enhances optimization stability [00:06:48].
Training Speed: Training is fast, and the resulting reconstructions are high quality, with a low Chamfer distance [00:06:08] [00:12:30].
Random Initialization: Unlike many Gaussian Splatting papers that require initialization via structure from motion (SfM), Gaussian Surfels can be initialized with random positions and rotations due to the strength of their loss functions [00:54:20] [00:56:26]. This removes a significant pain point in the pipeline [00:56:52].
Open Surface Reconstruction: They do not assume closed surfaces, making them suitable for reconstructing open surfaces, unlike signed distance functions (SDFs) often used in implicit representations [01:06:44].
Noise-Free & Intricate Details: The method excels in reconstructing noise-free surfaces and capturing intricate details [01:07:50] [01:17:26].
Potential Efficiency: In theory, fewer surfels might be needed compared to Gaussian Splats to represent a surface, as they can cover it with a very thin layer without requiring thickness [01:29:17].
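
As an illustration of the random initialization mentioned above, here is a minimal sketch. The source only states that positions and rotations can be random; the bounding cube, the surfel count, and the sampling scheme (normalizing four Gaussian samples to get a uniformly distributed unit quaternion) are assumptions made for this example.

```python
import math
import random

def random_unit_quaternion(rng):
    """Sample a random rotation by normalizing four Gaussian samples."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-8:  # guard against a degenerate near-zero draw
            return [c / norm for c in q]

def random_init(num_surfels, half_extent=1.0, seed=0):
    """Create surfels with random positions inside a cube and random rotations."""
    rng = random.Random(seed)
    surfels = []
    for _ in range(num_surfels):
        surfels.append({
            "position": [rng.uniform(-half_extent, half_extent) for _ in range(3)],
            "rotation": random_unit_quaternion(rng),
        })
    return surfels
```

A call such as random_init(1000) would stand in for the point cloud that an SfM step normally provides.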

Comparisons to Other 3D Representations

Gaussian Splatting (3DGS)

3DGS represents appearance and geometry using explicit, topology-free Gaussian points [00:08:23]. It uses CUDA-based rasterization on the GPU for real-time rendering, which makes it faster than NeRFs [00:11:32] [00:16:21].

However, 3DGS presents challenges:

Non-zero Thickness: 3D Gaussian points resemble ellipsoids with non-zero thickness, hindering close alignment with actual surfaces. This results in “bubbly” or “fuzzy” textures [00:17:35] [01:33:42].
Ambiguity in Normal Direction: The normal direction for each 3D Gaussian is ambiguous and can change during optimization, lacking a consistent relationship with the surface [00:18:46].
Sharp Surface Edges: Alpha blending in 3DGS can introduce bias, making it difficult to model sharp surface edges due to Gaussian points extending beyond the surface [00:19:27].

Gaussian Surfels address these issues by being inherently flat, which gives each point a meaningful normal vector and leads to cleaner, crisper surfaces [01:33:35].
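
To make the alpha blending issue concrete, the sketch below implements standard front-to-back compositing for a single pixel. It is a generic illustration rather than code from either method; the function name and the plain-list inputs are assumptions.

```python
def composite_front_to_back(colors, alphas):
    """Blend depth-sorted points (nearest first) into one pixel color.

    colors: list of (r, g, b) tuples; alphas: per-point alpha at this pixel.
    Returns the blended color and the remaining transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for color, alpha in zip(colors, alphas):
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

With two half-transparent points, the one behind still contributes a quarter of the pixel color, which is how points extending past a surface can bias an edge. A fully opaque front point reduces that contribution to zero.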

NeRFs (Neural Radiance Fields)

NeRFs are implicit representations: they store the weights of a neural network (a multi-layer perceptron) that defines the scene’s appearance and geometry [00:09:24].

NeRFs have distinct characteristics:

Implicit Representation: Points are not explicitly stored; instead, a neural network is used to generate color and density [00:09:24].
Slow Rendering: They use ray-based point sampling and volume rendering, which is computationally expensive due to multiple inferences per ray for each pixel [00:11:40] [00:12:28].
Overly Smooth Surfaces: NeRFs, especially those using signed distance functions (SDFs), tend to produce overly smooth surfaces, sometimes losing intricate details or sharp angles [00:05:03] [01:17:09] [01:34:16].
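
The rendering cost described above can be seen in a minimal sketch of ray-based volume rendering. The field argument is a hypothetical stand-in for the trained network, and evenly spaced midpoint sampling is an assumption made to keep the example short.

```python
import math

def render_ray(field, origin, direction, near, far, num_samples):
    """Render one pixel by sampling a radiance field along a ray.

    field(point, direction) -> ((r, g, b), density) is a hypothetical stand-in
    for the neural network; it is queried once per sample.
    """
    delta = (far - near) / num_samples
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for i in range(num_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th interval
        point = [o + t * d for o, d in zip(origin, direction)]
        rgb, density = field(point, direction)
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return pixel
```

Every pixel needs num_samples network evaluations, so an image of H by W pixels costs H * W * num_samples queries. Rasterizing explicit points avoids this per-ray sampling.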

Optimization and Losses

The optimization pipeline for Gaussian Surfels is complex, incorporating five different loss functions that contribute to the final gradients:

Photometric Loss (Lp): Combines an L1 term (80%) and a DSSIM term (20%) to minimize the visual difference between the rendered image and the input image [00:36:19] [00:37:20].
Normal Prior Loss (Ln): Utilizes normals derived from a pre-trained monocular normal estimator (e.g., from OmniData) to guide the surfel normals towards the actual surface normal [00:39:51] [00:40:04] [00:44:17]. An L1 term also penalizes the gradient of the rendered normal, which regularizes surface curvature [00:41:14] [00:41:40] [00:42:04].
Opacity Loss (Lo): Promotes non-transparent surfaces by encouraging each surfel’s opacity to be near zero or near one, preventing “see-through” points common in Gaussian Splats [00:45:38] [00:47:00].
Depth Normal Consistency Loss (Lc): Enforces consistency between the rendered depth and the rendered normal [00:39:14] [00:43:09]. This loss helps correct both depth and normal directions if they are inaccurate, leading to improved reconstruction quality [00:42:49].
Mask Loss (Lm): Used to enhance quality with foreground masks [00:07:00].
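
Two of the terms above are simple enough to sketch directly. The 80/20 weighting of the photometric loss comes from the description above; the DSSIM value is taken as a precomputed input here. The opacity penalty is one possible function with the described behavior (largest at 0.5, small near 0 and 1); its exact form and the width constant are assumptions, not confirmed details of the method.

```python
import math

def l1_loss(rendered, target):
    """Mean absolute difference between two flat lists of pixel values."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)

def photometric_loss(rendered, target, dssim_value, l1_weight=0.8, dssim_weight=0.2):
    """Weighted sum of an L1 term and a precomputed DSSIM term."""
    return l1_weight * l1_loss(rendered, target) + dssim_weight * dssim_value

def opacity_loss(opacities, width=0.05):
    """Penalize opacities near 0.5 so they move towards 0 or 1.

    The Gaussian-bump shape and the width value are assumptions for this sketch.
    """
    return sum(math.exp(-((o - 0.5) ** 2) / width) for o in opacities) / len(opacities)
```

Minimizing the opacity penalty pushes each value away from 0.5, so a surfel ends up either contributing fully or being a candidate for removal.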

The combination of these losses, especially those leveraging normal information, allows Gaussian Surfels to achieve superior surface reconstruction [00:37:37] [00:44:12].
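
The depth-normal consistency idea can be sketched as follows. This is a simplified illustration, assuming an orthographic camera with unit pixel spacing and normals whose z component points towards the camera; a real pipeline would account for the camera intrinsics. Normals are estimated from the rendered depth by central differences and compared with the rendered normals using one minus their cosine.

```python
import math

def normals_from_depth(depth):
    """Estimate unit normals at interior pixels of a 2D depth map (list of rows)."""
    height, width = len(depth), len(depth[0])
    normals = {}
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            n = (-dzdx, -dzdy, 1.0)  # sign convention is an assumption
            norm = math.sqrt(sum(c * c for c in n))
            normals[(y, x)] = tuple(c / norm for c in n)
    return normals

def depth_normal_consistency(depth, rendered_normals):
    """Mean of (1 - cosine) between depth-derived and rendered normals.

    rendered_normals: dict mapping (y, x) to a unit normal for interior pixels.
    """
    from_depth = normals_from_depth(depth)
    total = 0.0
    for key, n_depth in from_depth.items():
        n_rendered = rendered_normals[key]
        total += 1.0 - sum(a * b for a, b in zip(n_depth, n_rendered))
    return total / len(from_depth)
```

The loss is zero when both estimates agree, so gradients flowing through it can correct whichever of the two is off.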

Volumetric Cutting

Similar to the densification and pruning strategies in Gaussian Splatting pipelines, Gaussian Surfels utilize “volumetric cutting” to remove erroneous outlier points [00:49:37]. This involves constructing a voxel grid and pruning voxels (and the 3D points within them) that have low accumulated opacity, indicating a large distance from the foreground or background [00:50:52].
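
The pruning step can be illustrated with a simplified sketch. Here each point adds its opacity only to the voxel that contains it, which is an assumption made for brevity; the voxel size and threshold are hypothetical parameters.

```python
import math

def volumetric_cut(points, opacities, voxel_size, min_accumulated_opacity):
    """Return indices of points that lie in voxels with enough accumulated opacity."""
    def voxel_of(point):
        return tuple(math.floor(c / voxel_size) for c in point)

    # Accumulate opacity per occupied voxel (sparse grid stored as a dict).
    accumulated = {}
    for point, opacity in zip(points, opacities):
        key = voxel_of(point)
        accumulated[key] = accumulated.get(key, 0.0) + opacity

    # Keep only points whose voxel passes the threshold.
    return [i for i, point in enumerate(points)
            if accumulated[voxel_of(point)] >= min_accumulated_opacity]
```

An isolated, faint point sits alone in a voxel with low accumulated opacity and is dropped, while points clustered on a surface reinforce each other and survive.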

Surface Reconstruction

Once the surfels are optimized, a surface mesh is extracted using screened Poisson reconstruction [00:07:09] [00:48:03]. This technique creates watertight surfaces from oriented point sets, which suits the surfels’ explicit normal directions [00:48:15].

Challenges and Limitations

Despite their advantages, Gaussian Surfels face certain challenges:

Complexity: The pipeline is intricate, involving numerous losses with individual weights and custom learning rates for different parameters (position, radius, opacity, view-dependent colors) [00:35:12] [00:57:48]. Gradient scaling for the normal is also applied [00:58:38]. This makes the system potentially fragile due to many “magic numbers” [00:59:17].
Specular Reflections: The method struggles with accurate surface reconstruction in areas with strong specular reflections, as these can cause conflicts in photometric consistency across different views [01:12:51] [01:13:00].
Dynamic Scenes: While not explicitly stated as a limitation, the high opacity and perfect alignment requirements for surfels might make it harder to represent dynamic scenes compared to the more deformable, semi-transparent blobs of Gaussian Splats [01:03:30].

Applications and Future Prospects

Gaussian Surfels demonstrate superior performance in surface reconstruction compared to state-of-the-art neural volume rendering and point-based rendering methods [00:07:37] [00:13:31]. They have been benchmarked on datasets like DTU (laboratory-captured scenes) and BlendedMVS (varying number of images), consistently showing strong geometric quality [01:04:56] [01:08:37].

Future work could explore incorporating more sophisticated rendering capabilities beyond spherical harmonics to better handle complex lighting effects like reflections and subsurface scattering, potentially by using dedicated neural networks for view-dependent color [01:18:11] [01:20:07] [01:21:26]. Additionally, extending Gaussian Surfels to dynamic scenes by adding time dependence to their properties remains an area for research [01:03:07]. The integration of physics properties, as seen in other works, could also enable their use in interactive applications like video games [01:16:11].
