Summary

(local method) 3D grid (TSDF, ESDF) + marching cubes (especially voxblox).
(global method) point cloud + Poisson reconstruction.
(currently used in pipeline) Delaunay triangulation (especially Robust and efficient surface reconstruction from range data).
Deep learning method.

Deep Learning

More work is now done with deep learning.

TUM AI Lecture Series - Image-based Rendering.

LLFF: Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines, github

1. Neural Rendering

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis 2020. An MLP takes in a 5D coordinate and outputs density and color. It trains a map $F_{\Theta}(x, d) \to (c, \sigma)$ from a 3D sample position x and viewing direction d (each pixel ray is defined by the optical center and the ray direction) to color c and volume density $\sigma$. Each pixel ray is sampled at 'N_sample' points, the network is run on each point, and the outputs are integrated along the ray to get the final pixel value.

        graph LR
        A[Position of point] --> B[MLP encoder]
        B --> C[FCxN]
        C --> D[FC]
        B --> D
        D --> E[FCxN]
        E --> F
        X[Direction of ray] --> Y[MLP encoder] --> F[FC]
        F --> G[RGB & sigma]
        style A fill:#f9f,stroke:#333,stroke-width:4px
        style X fill:#f9f,stroke:#333,stroke-width:4px
        style G fill:#bbf,stroke:#333,stroke-width:4px

Needs time to train for each data session.
Train LLFF dataset (“forward-facing” scenes) in “normalized device coordinates” (NDC) space; train scenes with large rotations in conventional 3D world coordinates.
google jaxnerf implementation, see here with my tests.
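
The integration along each pixel ray mentioned above can be sketched with the usual discrete quadrature: each sample contributes its color weighted by its own opacity and by the transmittance accumulated in front of it. This is a minimal pure-Python sketch, not the jaxnerf code; the function name `composite_ray` and the toy ray are assumptions made for illustration.

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Discrete volume rendering along one ray.

    sigmas: density at each sample, colors: (r, g, b) at each sample,
    deltas: distance between adjacent samples.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Toy ray (hypothetical values): empty space, then a dense red sample,
# then a green sample hidden behind it.
pixel, remaining = composite_ray(
    sigmas=[0.0, 50.0, 50.0],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[0.1, 0.1, 0.1],
)
```

The first sample has zero density and contributes nothing, and the red sample absorbs almost all of the light, so the green sample behind it is barely visible.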

NERF Extension:

Add Depth Loss: Depth-supervised NeRF: Fewer Views and Faster Training for Free 2021 with probabilistic COLMAP depth supervision. github loss. (I made this update with NERF PL; not much improvement found.)
Enable Localization: LENS: Localization enhanced by NeRF synthesis 2021 uses NeRF in the Wild to perform data augmentation for training a pose regressor.
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections 2020 to address ubiquitous, real-world phenomena : moving objects or variable illumination.
- step 1. model per-image appearance variations in a learned low-dimensional latent space. -> control of the appearance of output.
- step 2. model the scene as the union of shared and image-dependent elements.
- see here for a wonderful implementation using pytorch-lightning, which also fits input from colmap. see here with my tests.

DIVeR: Real-time and Accurate Neural Radiance Fields with Deterministic Integration for Volume Rendering. DIVeR uses a voxel-based representation to guide a deterministic volume rendering scheme, allowing it to render thin structures and other subtleties missed by traditional NeRF rendering. (Best Paper Finalist 2022).
Large Scene: Block-NeRF: Scalable Large Scene Neural View Synthesis, Waymo and Google. Scales NeRF to render city-scale scenes, decomposing the scene into individually trained NeRFs that are then combined to render the entire scene. Results are shown for 2.8M images.

NERF Acceleration:

Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields 2021, paper, github.
- Nerf : can cause excessive blurring and aliasing.
- Mip-NeRF: casting a cone from each pixel. integrated positional encoding (IPE) by each conical frustum (instead of position in Nerf).
Baking Neural Radiance Fields for Real-Time View Synthesis 2021, github. Sparse Neural Radiance Grid (SNeRG, sparse 3D voxel grid data structure storing a pre-trained NeRF model), accelerates rendering procedure.
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs 2021. replaces a single large NeRF-MLP with thousands of tiny MLPs, accelerating rendering by 3 orders of magnitude.
(Voxel representation) Plenoxels: Radiance Fields without Neural Networks, github. foregoes MLPs altogether and optimizes opacity and view-dependent color (using spherical harmonics) directly on a 3D voxel grid.
- key features : Trilinear Interpolation, Total Variation Regularization.
(Mesh representation) MobileNeRF 2023: textured triangle mesh representation that can be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism. Offers a demo that runs on a phone.
- shader code.
- The current training is slow due to NeRF’s MLP backbone.
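
The trilinear interpolation listed as a key feature of Plenoxels can be illustrated on a plain scalar grid. This is only a sketch under simple assumptions (a dense nested-list grid with values at integer voxel corners, and a hypothetical `trilinear` helper); the real system interpolates opacity and SH coefficients on a sparse grid.

```python
def trilinear(grid, x, y, z):
    """Interpolate a scalar stored at integer voxel corners grid[i][j][k]."""
    i, j, k = int(x), int(y), int(z)
    # Clamp the base corner so that i+1, j+1, k+1 stay inside the grid.
    i = min(max(i, 0), len(grid) - 2)
    j = min(max(j, 0), len(grid[0]) - 2)
    k = min(max(k, 0), len(grid[0][0]) - 2)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Weight of a corner is the product of per-axis linear weights.
                w = ((fx if di else 1.0 - fx)
                     * (fy if dj else 1.0 - fy)
                     * (fz if dk else 1.0 - fz))
                value += w * grid[i + di][j + dj][k + dk]
    return value

# 2x2x2 grid holding f(i, j, k) = i + 2j + 4k, which is linear along each
# axis and is therefore reproduced exactly by trilinear interpolation.
grid = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)] for i in range(2)]
center = trilinear(grid, 0.5, 0.5, 0.5)
```

Because the eight weights vary smoothly with the query position, the interpolated value is continuous across voxels, which is what allows gradient-based optimization directly on the grid.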

A generalization of the problem:

Instant Neural Graphics Primitives 2022 - An object represented by queries to a neural network. git page. Trains & renders NeRF in real time, and provides various GUI tools to interact, visualize & edit.

Examples :
- GigaPixel Image : 2d position X (in image) -> RGB color.
- SDF : 3d position X -> distance to surface.
- Nerf : 3d position X + view direction d -> RGB color & density.
- Radiance Caching : 3d position X + Extra parameters -> RGB color global illumination.
Acceleration Design :
- Nerf render process: cut empty space, and cut ray after object.
- Smaller MLP: memory traffic dominates -> Fully Fused Neural Network : the entire neural network is implemented as a single CUDA kernel.
- Input encoding (see understanding of input encoding): Multiresolution hash encoding - a pyramid structure for deep features (uses a hash table to avoid the memory growing exponentially with dimension, as in the dense hyper-cubical voxel case).
here for my test results, run with an outdoor general data session.
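
To make the multiresolution hash encoding more concrete, here is a small sketch of its two ingredients: a geometric progression of grid resolutions, and a spatial hash that maps an integer grid corner to a slot in a fixed-size feature table. The primes are the ones given in the Instant NGP paper; the function names, the rounding guard, and the example sizes are assumptions for illustration, and the feature lookup and interpolation are left out.

```python
import math

# Per-dimension multipliers of the spatial hash (values from the paper).
PRIMES = (1, 2654435761, 805459861)

def level_resolutions(n_min, n_max, num_levels):
    """Grid resolution per level, growing geometrically from n_min to n_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    # The small epsilon only guards against floating-point round-off.
    return [int(math.floor(n_min * b ** level + 1e-6)) for level in range(num_levels)]

def hash_index(ix, iy, iz, table_size):
    """XOR of (coordinate * prime) per dimension, modulo the table size."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size

resolutions = level_resolutions(16, 512, 6)   # hypothetical settings
table_size = 2 ** 19                          # hypothetical table size
dense_corners = (resolutions[-1] + 1) ** 3    # corners of a dense finest grid
slot = hash_index(100, 200, 300, table_size)
```

At the finest level a dense grid would need far more entries than the hash table holds, so several corners share a slot; the design relies on the training gradients to sort out which of the colliding corners matters.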

2. Neural Rendering

Pointcloud representation:

3D Gaussian Splatting for Real-Time Radiance Field Rendering 2023, uses 3d Gaussian (~pointcloud) as representation.
- Initialize with SFM sparse pcl.
- Properties to optimize: 3D position, opacity 𝛼, anisotropic covariance, and spherical harmonic (SH) coefficients following PlenOctrees to encode color. (see Spherical Harmonic Lighting: The Gritty Details to learn more about SH).
- Point-based 𝛼-blending enables fast rendering.
- It produces the best Nerf Results: test repo & result.

Gaussian Splatting Details

Anisotropic covariance: use scale vector and rotation to model (to ensure covariance being positive semi-definite).
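
The scale-and-rotation parameterization can be written as $\Sigma = R S S^{T} R^{T}$ with $S$ a diagonal scale matrix and $R$ a rotation, which is positive semi-definite by construction. A minimal sketch follows, assuming the rotation is stored as a quaternion (w, x, y, z); the helper names are made up for this example.

```python
import math

def quat_to_rot(w, x, y, z):
    """Rotation matrix of a quaternion (normalized first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rot(*quat)
    # M = R S scales the columns of R, so Sigma = M M^T.
    M = [[R[r][c] * scale[c] for c in range(3)] for r in range(3)]
    return [[sum(M[r][k] * M[c][k] for k in range(3)) for c in range(3)]
            for r in range(3)]

# A Gaussian stretched along its local y axis, rotated 90 degrees about z.
half = math.sqrt(0.5)
sigma = covariance(scale=(1.0, 2.0, 3.0), quat=(half, 0.0, 0.0, half))
```

Optimizing the scale vector and quaternion freely always yields a valid covariance, whereas optimizing the six entries of $\Sigma$ directly could leave the set of valid covariance matrices.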

Tile-based rasterizer for fast optimization, github code. Follows previous work : Pulsar.

Other work : point-based neural rendering (which projects neighboring views and uses an NN to optimize the fused MVS view).

D-SSIM: Structural SIMilarity (SSIM) image loss (introduced in "Structural Similarity-Based Object Tracking in Video Sequences").

Uses degree 3 SH, while PlenOctrees uses degree 2. From my tests, SH requires lots of parameters while producing little improvement.
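
The parameter cost of SH can be counted directly: bands 0 to L contain $(L+1)^{2}$ basis functions, and each color channel needs its own set of coefficients. The helper below is a hypothetical one-liner written for this note.

```python
def sh_coefficient_count(degree, channels=3):
    """Number of SH coefficients per point for bands 0..degree."""
    return (degree + 1) ** 2 * channels

degree3 = sh_coefficient_count(3)   # setting described above for 3DGS
degree2 = sh_coefficient_count(2)   # setting described above for PlenOctrees
degree0 = sh_coefficient_count(0)   # plain view-independent RGB
```

Going from degree 0 to degree 3 multiplies the per-point color storage by 16, which is why tools that reduce or simplify the SH save so much memory.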

Unity3D tool for Gaussian splatting rendering; it also simplifies the SH to reduce memory consumption.

Web visualization.

4D Gaussian Splatting. Adds an MLP for each point along time to produce video. 3DGS is completely explicit, while this paper makes it partly implicit; I don't like this downgrade.
4D4K 2023. 4D grids representations. I think a 4D pcl represented GS will be simply better.
- initialized by space carving for each timestamp from multi-view images.
- geometry: radius, density, position. (isotropic version of 3DGS)
- color: blending model + MLP SH. (Need MLP for each point.)
- render: (for each pixel) take k nearest points and merge by density. (No alpha blending.)
Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis, save point positions & rotations for each timestamp (while scale, and render parameters are shared) - rigid point motion.
- problems : need simultaneously multi-view collection, and save pcl for each timestamp is memory consuming.
A Survey of Progress in 3D Gaussian Splatting (三维高斯泼溅进展综述), CVMJ Spotlight 2024

2D Gaussian Splatting for Geometrically Accurate Radiance Fields. Uses 2D surfels rather than points. Achieves better results, but doesn’t have a mature ecosystem (UNITY, UE, WEBGL, etc).

City-Super work path:

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering 2024 using anchor point.
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians 2024 introducing Level-of-Detail (LOD) using octree.
HAC++: Towards 100X Compression of 3D Gaussian Splatting 2025. A great work of compression of GS in Scaffold format.

3. SDF

PermutoSDF: Fast Multi-View Reconstruction with Implicit Surfaces using Permutohedral Lattices 2023, github page, github. (1) use a hashed permutohedral encoding(following Instant NGP) to ensure fast training; (2) a novel RGB regularizer to encourage the network to predict high-frequency geometric detail.

NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction 2023, following work of Instant NGP.

multi-resolution hash tables of learnable feature vectors.
an incremental learning method for learning dynamic scenes.

Occupancy Network.

Improving neural implicit surfaces geometry with patch warping 2022, github.

VolSDF: Volume Rendering of Neural Implicit Surfaces 2021, github. define the volume density function as Laplace’s cumulative distribution function (CDF) applied to a signed distance function (SDF) representation. model the density:

\[\sigma(x) = \alpha \Phi_{\beta}(-d_{\Omega}(x))\]

\[\Phi_{\beta}(s) = \begin{cases} \frac{1}{2}\exp\left(\frac{s}{\beta}\right) & \text{if } s \le 0 \\ 1 - \frac{1}{2}\exp\left(-\frac{s}{\beta}\right) & \text{if } s > 0 \end{cases}\]

MLP1. sdf d and feature z: $f_{\phi}(x) = (d(x), z(x)) \in R^{1+256}$
MLP2. scene’s radiance field: $L_{\phi}(x, n, v, z) \in R^{3}$
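
The VolSDF density model above is easy to evaluate numerically. The sketch below implements the Laplace CDF and the resulting density for a signed distance that is positive outside the surface and negative inside; the function names and the values of alpha and beta are assumptions chosen for the example.

```python
import math

def laplace_cdf(s, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if s <= 0:
        return 0.5 * math.exp(s / beta)
    return 1.0 - 0.5 * math.exp(-s / beta)

def volsdf_density(sdf, alpha, beta):
    """sigma(x) = alpha * Phi_beta(-d(x))."""
    return alpha * laplace_cdf(-sdf, beta)

alpha, beta = 100.0, 0.01   # hypothetical values
on_surface = volsdf_density(0.0, alpha, beta)
deep_inside = volsdf_density(-0.1, alpha, beta)
far_outside = volsdf_density(0.1, alpha, beta)
```

The density is exactly alpha / 2 on the surface, approaches alpha inside the object, and decays to zero outside; a smaller beta makes the transition sharper.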

Implicit Neural Representations with Periodic Activation Functions 2020. A continuous implicit neural representation using periodic activation functions that fits complicated signals. Solve challenging boundary value problems.

\[F(x, \Phi(x), \nabla_{x}\Phi, \nabla_{x}^{2}\Phi, \dots) = 0\]

ReLU networks are piecewise linear and therefore incapable of modeling higher-order derivatives, while alternative activations are not well behaved.
SIREN: $\Phi(x) = W_{n}(\phi_{n-1} \circ \phi_{n-2} \circ \dots \circ \phi_{0})(x) + b_{n}$, $x_{i} \to \phi_{i}(x_{i}) = \sin(W_{i}x_{i} + b_{i})$. The activations of Siren always alternate between a standard normal distribution with standard deviation one, and an arcsine distribution.
With $\Phi(x)$ being a fully connected network, the loss is $\int_{\Omega} \sum_{i}I_{\Omega_{i}}(x)|F(x)| dx$ ($\Omega_{i}$ is a sampling).
Poisson Equation, SDF(+-1), Helmholtz and Wave Equation. github.
Compared with NeRF positional encoding in github.
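
A forward pass of the SIREN composition above fits in a few lines of pure Python. This is a sketch, not the reference implementation: the frequency scale `W0 = 30`, the uniform initialization bounds, and the zero biases are assumptions based on the commonly used setup, and the formula in the text corresponds to `W0 = 1`.

```python
import math
import random

W0 = 30.0  # assumed frequency scale applied inside every sine

def init_layer(n_in, n_out, rng, bound):
    """Weights drawn uniformly from [-bound, bound]; zero biases (a simplification)."""
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(x, weights, biases):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def siren_forward(x, sine_layers, final_layer):
    """Sine layers phi_0 .. phi_{n-1}, followed by the final affine map W_n, b_n."""
    for weights, biases in sine_layers:
        x = [math.sin(W0 * v) for v in affine(x, weights, biases)]
    return affine(x, *final_layer)

rng = random.Random(0)
hidden = 16
hidden_bound = math.sqrt(6.0 / hidden) / W0
sine_layers = [
    init_layer(2, hidden, rng, bound=1.0 / 2),          # first layer
    init_layer(hidden, hidden, rng, bound=hidden_bound),
]
final_layer = init_layer(hidden, 1, rng, bound=hidden_bound)
value = siren_forward([0.25, -0.5], sine_layers, final_layer)
```

Since every derivative of a sine is again a sine (up to a phase shift), the derivatives of such a network are themselves SIREN-like, which is what makes it suitable for the differential-equation losses listed above.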

DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation 2019 DeepSDF network outputs SDF value at a 3D query location. Shape completion (auto-decoding) takes considerably more time during inference. github.

4. Multi-View Geometry

Multi-View Stereo

PatchmatchNet: Learned Multi-View Patchmatch Stereo, github. Checked on a few scenes and ran point cloud fusion; the results are not ideal.

Multi-View Interpolation

IBRNet: Learning Multi-View Image-Based Rendering 2021. Starts from the target view and interpolates nearby source images (instead of encoding the whole model as NeRF does; similar to a traditional MVS pipeline): (1) select neighbor (source) images; (2) sample depths along each ray and project them to the source images; (3) aggregate the source 2D features; (4) synthesize with a Ray Transformer. (But quality is slightly worse than NeRF.)

Light Field Neural Rendering, google post. uses a lightfield parameterization for target pixel and its epipolar segments in nearby reference views.

2022

ACMMP : Multi-Scale Geometric Consistency Guided and Planar Prior Assisted Multi-View Stereo, following work of ACMM, using planar prior.

2021

Voxel Structure-based Mesh Reconstruction from a 3D Point Cloud, github code. It has a classification of meshing methods:

Approximation-based method: Poisson, MLS (moving least squares), Scale Space. Rebuilds the 2-manifold mesh to fit a point cloud directly, but may lose local details.
Delaunay-based method: point connection, require ideal point cloud distribution.
Point Resampling: resample a point cloud into an isotropic one (then we could apply Delaunay).
Pre- and Post-processing: mesh denoising, isotropic remesh and mesh repair.

This paper’s method contains the following steps:

pre-treatment : moving least squares to smooth the point cloud, then Delaunay-based interpolation to make the point cloud isotropic.
make voxel structure: based on geodesic distance.
make mesh: resample points in each voxel (using Farthest Point Sampling with a dynamic resample rate), then apply Delaunay.
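
The Farthest Point Sampling used in the resampling step can be sketched as a greedy loop: keep, for every point, its distance to the nearest already-selected point, and always select the point where that distance is largest. The version below is a generic O(n k) sketch with Euclidean distance, not the paper's code, and it ignores the dynamic resample rate.

```python
def farthest_point_sampling(points, k, start=0):
    """Return the indices of k points chosen greedily to be far apart."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    chosen = [start]
    # Squared distance from each point to the nearest chosen point.
    min_d = [sqdist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=min_d.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = sqdist(p, points[nxt])
            if d < min_d[i]:
                min_d[i] = d
    return chosen

# Two clusters on a line: FPS covers both ends before filling in the middle.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
          (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
chosen = farthest_point_sampling(points, 3)
```

Compared with uniform random subsampling, this spreads the selected points evenly, which suits the isotropic distribution that Delaunay-based meshing prefers.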

Efficiently Distributed Watertight Surface Reconstruction distributes all the steps (Delaunay + graph-cut).

Dense Surface Reconstruction from Monocular Vision and LiDAR LiDAR measurements are integrated into a multi-view stereo pipeline for point cloud densification and tetrahedralization. (the lidar mapping algorithm it used seems terrible, our algorithm is much much better)

2020

Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction replace traditional signed distance function with neural network.

Point2Mesh: A Self-Prior for Deformable Meshes using DL method (Neural Self-Priors) iteratively shrink-wrap the initial mesh, leading to a watertight reconstruction (fits the point cloud).

A 3D Surface Reconstruction Method for Large-Scale Point Cloud Data nothing new.

2019

Detail Preserved Surface Reconstruction from Point Cloud (noisy image-based point cloud) a new Visibility Model : $(1-e^{-d^{2}/2 \sigma^{2}})$.

ACMM Multi-Scale Geometric Consistency Guided Multi-View Stereo, github.

multi-scale process to handle low-texture areas.
Adaptive checkerboard propagation (~ more complicated DSO pattern).
used in our project; gives good and fast results, but performs poorly on close repeated patterns (close ground), which might be fixed by the plane prior in their follow-up work ACMMP.

2018

Reconstructing Thin Structures of Manifold Surfaces by Integrating Spatial Curves. use image based 3d curve reconstruction to enhance thin structures.

compute 3D curves based on the initialize-optimize-extend strategy.
Curve-conformed Delaunay Refinement to preserve thin structures: make sure Delaunay has kept all the segments of curves, and that nearby regions have finer triangles. Add a special energy to tetrahedra belonging to the same curve.

2017

Voxblox: Incremental 3D Euclidean Signed Distance Fields for On-Board MAV Planning, github code. State of the art: TSDF, ESDF, and meshing. Extremely efficient, a wonderful piece of engineering. (I have been using it for several years.)

2016

A Survey of Surface Reconstruction from Point Clouds. The Role of Priors :

Surface Smoothness :

Local smoothness: finding zero set of local scalar field.
- MLS planar assumption : MLS 2003, IMLS 2004, RIMLS 2009.
- MLS Spherical approximations : fits a gradient field of the algebraic sphere s to the input (oriented) normals. APSS 2007: normal difference, Non-oriented MLS 2013 : dot product.
- Hierarchical methods : multi-level partition of unity 2003 ~ dynamic radius.
- Locally Optimal Projection (LOP): ~ evenly distributed resample.
Global smoothness:
- Radial basis functions (RBFs) 2001, Hermite RBF 2005 : a high degree of smoothness through a linear combination of radially symmetric basis functions.
- Indicator functions : estimating a soft labeling that discriminates the interior from the exterior of a solid shape. Poisson 2006, Screened Poisson 2013 (adds positional constraints), Voronoi-based (uses covariance matrices instead of normals to represent unsigned orientations).
- Volumetric segmentation: a hard labeling(interior or exterior) of a volumetric discretization. Spectral surface reconstruction from noisy point clouds 2004 (a graph Laplacian from the Delaunay triangulation). Poisson 2006.
Piecewise smoothness.
- Partitioning-based approaches : segmenting the input points with respect to sharp features (locally or globally).
- Normal-field based approaches : decoupling the computation of a sharp normal-field from the computation of the surface location. L1-Sparse reconstruction 2011 (L1 norm for normal field to keep sharp boundary). Edge-aware point set resampling (smooth normals while separating them across sharp features).
- Direct meshing. Feature-preserving surface reconstruction and simplification from defect-laden point sets. 2013
- Robust surface reconstruction via dictionary learning 2014 : each point in the input cloud is approximated by a single point on the output triangle.

Visibility:

Scanner Visibility (ray casting from scanner), TSDF A volumetric method for building complex models from range images 1996, Robust and efficient surface reconstruction from range data 2009 (uses Delaunay triangulation to formulate as a graph cut problem using line of sight information: labeling interior/exterior), TV-L1 range image integration 2007 (use L1 norm to merge range scan)
Exterior Visibility (explicit information from the scanner. e.g. camera view-point). Occlusion culling, Cone carving.
Parity (assuming a closed surface)

Volume Smoothness : enforce that the local shape thickness of a surface (i.e. a measurement of its local volume) varies smoothly to fill incomplete point cloud.

Skeletal regularizers. Curve skeleton extraction from incomplete point cloud 2009 introduces ROSA (rotational symmetry axis) : the medial axis of a shape approximated by curves.
Man-made skeletal geometry. Medial priors. Organic skeletal geometry : Tubular components (to fit in particular trees). Leaf venation patterns 2005, 3d tree 2010.

Geometric Primitives: (scene geometry may be explained by a compact set of simple geometric shapes), good for indoor and CAD models (both have smaller range).

Detecting primitives. RANSAC shape detection 2007 : find planes, spheres, cylinders, cones, and tori through a local method. Model globally, match locally: Efficient and robust 3d object recognition 2010.
Primitive consolidation : Surface reconstruction from fitted shape primitives 2008 plane primitives and align and merge the boundaries of adjacent primitives. Extension of this method: 2009, 2014. Augmenting primitive information.
Volumetric primitives.
Hybrid methods. Surface reconstruction through point set structuring 2013 : shape primitives are used to resample the point cloud and enforce structural constraints in the output, looks good to reconstruct buildings. Watertight scenes from urban lidar and planar surfaces 2013
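
As an illustration of primitive detection, here is a basic RANSAC loop for a single plane: sample three points, build the plane through them, count the points within a distance threshold, and keep the best hypothesis. This is the textbook scheme rather than the 2007 paper's method (which adds normals, several shape types, and an optimized sampling strategy); all names and the toy data are assumptions.

```python
import random

def plane_from_points(p, q, r):
    """Unit normal n and offset d of the plane n.x + d = 0 through p, q, r."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:          # degenerate (collinear) sample
        return None
    n = [c / norm for c in n]
    return n, -sum(n[i] * p[i] for i in range(3))

def ransac_plane(points, threshold, iterations, rng):
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Toy data: a 5x5 patch of the plane z = 0 plus five off-plane outliers.
points = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
points += [(0.5, 0.5, 1.0), (1.5, 2.5, 2.0), (3.2, 0.7, 3.0), (2.1, 3.9, 4.0), (0.3, 3.3, 5.0)]
plane, inliers = ransac_plane(points, threshold=0.01, iterations=200, rng=random.Random(0))
```

To extract several primitives, the usual approach is to remove the inliers of the best shape and run the loop again on the remaining points.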

Global Regularities : CAD models, man-made shapes and architectural shapes – possess a certain level of regularity. Structure-aware shape processing 2013.

Symmetry : find transformations, that map a subset of the shape onto itself. Discovering structural regularity in 3d geometry 2008, Symmetry factored embedding and distance 2010, Shape analysis with subspace symmetries 2011
Structural Repetition (facades): Non-local scan consolidation for 3d urban scenes 2010 (need human), Adaptive partitioning of urban facades 2011 (no human needed), 2d-3d fusion for layer decomposition of urban facades 2011 (associate RGB images) : looks good to reconstruct buildings.
Canonical Relationships (regularity in orientations) :
- Manhattan-world (MW). Automatic extraction of manhattan-world building masses from 3d laser range scans 2012 : classifying points by shape type – wall, edge, convex corner, or concave corner – and clustering points of a similar type.
- Consolidating relationships. Globfit 2011, Planar Shape Detection and Regularization in Tandem 2015(enforcing parallel and orthogonality constraints in the detection of planes), RAPter method 2015 (take a user-prescribed set of angles)
- Canonical building relationships. 2.5D scans 2012

Data-driven priors (semantic objects) : using a collection of known shapes to help perform reconstruction.

Scene reconstruction by rigid/non-rigid retrieval.
Object reconstruction by part composition.
Reconstruction in shape spaces.

User-Driven Methods : Topology cues, Structural repetition cues, Primitive relationship cues, Interleaved scanning and reconstruction.

Evaluation of Surface Reconstruction: Geometric Accuracy, Topological Accuracy, Structure Recovery, Reproducibility

Earlier

Superpixel meshes for fast edge-preserving surface reconstruction 2015 superpixels and second-order smoothness constraints. based on Single-view 3D mesh reconstruction: 2D base mesh extraction, Depth reconstruction, then point cloud and mesh.

Planar Shape Detection and Regularization in Tandem 2015 automated detection and regularization of primitive shapes from unorganized point clouds. And enforcing parallel and orthogonality constraints in the detection of planes. repeating the following:

uniformly distributed seeds, region grow, detect primitive shapes.
regularization and adjust coplanarity.

Let There Be Color! Large-Scale Texturing of 3D Reconstructions 2014, github code. View selection, then projection to get the texture. It performs well on our image mapping mesh result (using Poisson), but it places high requirements on the mesh. Tested with some lidar mapping point cloud (made with TSDF + marching cubes, without further de-noising), the resulting mesh is terrible. I presume it is caused by loss of accuracy in TSDF and noise in the lidar data.

Surface Reconstruction through Point Set Structuring 2013.

structuring and resampling the planar components into planar, crease(to connect adjacent primitives), corner and clutter.
reconstructing the surface from both the consolidated components and the unstructured points.
surface is obtained through solving a graph-cut problem formulated on the 3D Delaunay triangulation (see this part for more details). Following by a Surface quality refinement and simplification.

Watertight Scenes from Urban LiDAR and Planar Surfaces 2013

make into small region.
conforming constrained Delaunay tetrahedralization (CCDT) to partition 3-dimensional space into tetrahedral cells. (details to read)
minimum-weight graph-cut. see more

Real-time 3d reconstruction at scale using voxel hashing.

2.5D Building Modeling by Discovering Global Regularities 2012: three fundamental types of relationships in buildings (for reconstruction from aerial imagery):

roof-roof relationships that consist of orientation and placement equalities.
roof-roof boundary relationships that consist of parallelism and orthogonality relationships.
boundary-boundary relationships that consist of height and position equality.

finding the relationships via clustering (i.e., clustering similar angles, equality, etc..), they are used to inform the primitive fitting method so that the primitives simultaneously fit to the data and to the relationships.

GlobFit: Consistently Fitting Primitives by Discovering Global Relations 2011 assuming man-made engineering object, RANSAC -> Find global relationship -> alignment (merge close elements in the orientation space). Starting from an initial set of detected primitives, parallel, orthogonal, angle-equality, and distance-equality relationships are individually detected and carefully selected so as to not cause any relationship conflicts.

Multi-view reconstruction preserving weakly-supported surfaces 2011. Some papers refer to this as state of the art.

baseline (the following paper) constant point weight.
point weight depends on the number of observations, make a filter strategy for the initial Delaunay. (closer to the colmap implementation)
a free-space-support weight function. compute all weights in the same way as the base-line approach. Then search for all large jumps and multiply the corresponding t-edge weights. (good for noisy data, need to test)

Robust and efficient surface reconstruction from range data 2009 formulates the surface reconstruction problem as an energy minimisation problem that explicitly models the scanning process. Uses Delaunay triangulation to formulate it as a graph cut problem using line of sight information: labeling interior/exterior. (colmap uses its implementation).

minimum cuts for optimal surface reconstruction :
- removing the edges connecting two sets of vertices, that is finding two disjoint sets S and T,
- with a cost: the sum of the capacities of the edges going from S to T.
- same as computing the maximum flow from the source s to the sink t.

\[c(S, T) = \sum_{\substack{v_{i} \in S \setminus \{s\} \\ v_{j} \in T \setminus \{t\}}} w_{ij} + \sum_{v_{i} \in S \setminus \{s\}} t_{i} + \sum_{v_{i} \in T \setminus \{t\}} s_{i}\]
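
The equivalence between the minimum cut and the maximum flow can be checked on a tiny graph. The sketch below is a plain Edmonds-Karp implementation on a dict-of-dicts capacity graph; the two cell nodes `a` and `b` stand in for tetrahedra and the capacities are made-up values, where the capacity of `s -> v` plays the role of $s_{i}$, `v -> t` of $t_{i}$, and `a -> b` of $w_{ij}$.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp. Returns (flow value, set of nodes on the source side of a min cut)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)   # reverse edges
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)   # nodes still reachable from s
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

capacity = {
    "s": {"a": 3.0, "b": 1.0},   # s_a, s_b
    "a": {"t": 1.0, "b": 1.0},   # t_a, w_ab
    "b": {"t": 4.0, "a": 1.0},   # t_b, w_ba
    "t": {},
}
flow, source_side = max_flow(capacity, "s", "t")
```

Here the cheapest labeling puts `a` with the source and `b` with the sink, and its cost $w_{ab} + t_{a} + s_{b} = 3$ equals the maximum flow.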

Surface visibility: soft visibility

Surface quality: the quality of surface triangle is evaluated as the ratio of the length of their longest edge over the length of their shortest edge (minus one). And Soft 3D beta–skeleton in graph-cut algorithm.
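
The triangle quality measure described above is straightforward to compute. The function name is an assumption; the formula is the longest-to-shortest edge ratio minus one, as stated in the text.

```python
import math

def edge_ratio_quality(a, b, c):
    """Longest edge over shortest edge, minus one.

    0 for an equilateral triangle; grows as the triangle becomes a sliver.
    """
    edges = [math.dist(a, b), math.dist(b, c), math.dist(c, a)]
    return max(edges) / min(edges) - 1.0

equilateral = edge_ratio_quality((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                                 (0.5, math.sqrt(3.0) / 2.0, 0.0))
right_345 = edge_ratio_quality((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0))
sliver = edge_ratio_quality((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 0.1, 0.0))
```

Penalizing this value in the energy discourages the elongated triangles that Delaunay-based surfaces tend to produce in sparsely sampled regions.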

Efficient multi-view reconstruction of large-scale scenes using interest points, Delaunay triangulation and graph cuts 2007. First to consider surface visibility! The paper above improved this method.

Poisson Surface Reconstruction 2006 State-of-art.

A mesh reconstruction algorithm driven by an intrinsic property of a point cloud 2004. It classifies meshing into the following methods :

sculpting-based approaches: Delaunay.
contour-tracing approaches: marching cubes, Hoppe’s.
region-growing approaches: (this paper) keep growing from an initial triangulation.

Mesh Optimization 1993, paper. This is a very important milestone paper for 3d reconstruction. Its first section - Mesh representation - is worth reading carefully. It treats the problem as an optimization, using a two-step greedy method to solve it.

Energy function is : $E = E_{distance} + E_{representation} + E_{spring}$ (see more details in the paper)
step 1. keep mesh structure, optimize vertices’ positions to best fit points.
step 2. keep vertices’ positions, update mesh structure by three types of update : edge collapse, edge split or edge swap, to simplify the mesh.

Surface Reconstruction from Unorganized Points 1992. This is a very important milestone paper for 3d reconstruction; you can find it is the basis of many modern methods. It finds a signed distance function, and uses its tangent space to generate the mesh.

using a local PCA to find local planes.
smoothen the planes (by the smoothness of the normals).
define signed distance by projecting points to local plane.
tracing its zero set, use a modified marching cubes to generate mesh.
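
The signed distance step can be sketched as follows, assuming the tangent planes (a center and a consistently oriented unit normal per plane) have already been estimated by the local PCA and the orientation pass. The function name and the two example planes are made up for illustration.

```python
def signed_distance(p, centers, normals):
    """Signed distance from p to the tangent plane whose center is nearest to p."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    i = min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
    # Project (p - center) onto the plane normal.
    return sum((p[k] - centers[i][k]) * normals[i][k] for k in range(3))

centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
above_first = signed_distance((0.2, 0.1, 0.5), centers, normals)
behind_second = signed_distance((4.0, 0.0, 0.3), centers, normals)
```

Evaluating this function at the corners of a voxel grid gives the scalar field whose zero set is then contoured.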

Delaunay triangulation 1934: maximize the minimum of all the angles of the triangles in the triangulation. The Delaunay triangulation of a discrete point set P in general position corresponds to the dual graph of the Voronoi diagram for P
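
An equivalent local characterization of the 2D Delaunay triangulation is the empty-circumcircle property: no input point lies strictly inside the circumcircle of any triangle. The standard determinant test below checks this for one triangle and one point; it assumes the triangle is given in counter-clockwise order and uses plain floating point, so it is not robust for nearly degenerate inputs.

```python
def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of CCW triangle (a, b, c)."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
inside = in_circumcircle(*triangle, (0.5, 0.5))    # the circumcenter itself
outside = in_circumcircle(*triangle, (2.0, 2.0))
```

Incremental Delaunay algorithms use this test to decide when an edge must be flipped after a point is inserted.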
