Table of Contents

  1. ICP Covariance
  2. Line Feature Mapping
  3. Omnidirectional Camera
  4. XR Hand
  5. Continuous-Time Batch Calibration
  6. Image-based Rendering - MPIs
  7. ICCV 23
  8. 3D Object Tracking

1. ICP Covariance

ICP error sources:

  • wrong convergence (to a local minimum), due to error in the initial pose estimate.
  • under-constrained situations: the problem is indeterminate.
  • mismatches.
  • sensor noise.

An accurate closed-form estimate of ICP’s covariance 2007. Uses the Hessian matrix as an estimate of the covariance (but this method in some cases greatly over-estimates the true covariance):

\[cov(\hat{x}) \approx 2\frac{residual}{K-3} [\frac{\partial^{2}}{\partial x^{2}}residual]^{-1}\]

The paper develops the following closed-form method (note the transpose sits on the right-hand factor, so the dimensions match):

\[cov(x) \approx [\frac{\partial^{2}}{\partial x^{2}}J]^{-1} [\frac{\partial^{2}}{\partial z\partial x}J] cov(z) [\frac{\partial^{2}}{\partial z\partial x}J]^{T} [\frac{\partial^{2}}{\partial x^{2}}J]^{-1}\]

A Closed-form Estimate of 3D ICP Covariance 2015. Builds on the above paper and solves the point-to-point case.

On the Covariance of ICP-based Scan-matching Techniques 2016. Analyzes the above Hessian-based method; finds that it fits point-to-plane ICP, but not point-to-point ICP.

A New Approach to 3D ICP Covariance Estimation 2019. Adds an additional term for the covariance coming from the initial pose estimate.
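
The closed-form estimate above can be sanity-checked numerically. Below is a minimal sketch (my illustration, not any of the papers' code), assuming a scalar objective J(x, z) and computing both Hessians by central finite differences; on a toy mean-estimation problem the formula recovers the familiar variance σ²/K.

```python
import numpy as np

def censi_covariance(J, x_opt, z, cov_z, eps=1e-5):
    """Closed-form covariance sketch: cov(x) ~ A^-1 B cov(z) B^T A^-1,
    with A = d2J/dx2 and B = d2J/dxdz, both estimated here by central
    finite differences. J(x, z) is the scalar objective, x_opt the
    minimizer, z the stacked measurement vector."""
    x_opt = np.atleast_1d(np.asarray(x_opt, dtype=float))
    z = np.asarray(z, dtype=float)
    n, m = x_opt.size, z.size

    def grad_x(x, z):
        # Gradient of J w.r.t. the parameters x.
        g = np.zeros(n)
        for i in range(n):
            dx = np.zeros(n); dx[i] = eps
            g[i] = (J(x + dx, z) - J(x - dx, z)) / (2 * eps)
        return g

    A = np.zeros((n, n))                      # d2J/dx2
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (grad_x(x_opt + dx, z) - grad_x(x_opt - dx, z)) / (2 * eps)
    B = np.zeros((n, m))                      # d2J/dxdz
    for j in range(m):
        dz = np.zeros(m); dz[j] = eps
        B[:, j] = (grad_x(x_opt, z + dz) - grad_x(x_opt, z - dz)) / (2 * eps)

    A_inv = np.linalg.inv(A)
    return A_inv @ B @ cov_z @ B.T @ A_inv
```

With J(x, z) = Σᵢ (x − zᵢ)², the minimizer is the sample mean, and the returned covariance matches σ²/K.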

2. Line Feature Mapping

3D Line Mapping Revisited 2023, github, my version with a COLMAP interface. ETH, state-of-the-art. Line mapping using SfM results (camera poses & world points).

  1. Line Proposal : line matching -> point-guided line triangulation (to overcome degenerate cases).
  2. Proposal Scoring & Track Association.
  3. Joint Optimization.
  4. Tested localization on our benchmark; no improvement seen (more details in my repo).

UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping 2021. Uses vanishing points for structural mapping, to avoid the degeneracy of the Plücker representation.

PL-SLAM: a Stereo SLAM System through the Combination of Points and Line Segments 2017. Uses the orthonormal representation for lines and the 3D point representation for points to run visual SLAM (basically the ORB-SLAM2 structure). Also the first paper to derive the line Jacobians in detail.

Impact of Landmark Parameterization on Monocular EKF-SLAM with Points and Lines 2010. Projects lines into camera image space.

Structure-from-Motion Using Lines: Representation, Triangulation and Bundle Adjustment 2005. Based on the Plücker representation of the line (by two points or two planes: the direction of the line, and the moment). The paper proposes an orthonormal representation of lines that takes only 4 DoF (three from SO(3) and one from SO(2)), making it easier to optimize.

  • Used this factorization in our project; it performs well. But in actual localization applications, point features are much more robust than this method.
  • It should fit better for traffic-lane mapping, with fixed poses.
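
The orthonormal factorization itself is short. A minimal NumPy sketch (my illustration, not the paper's code), assuming valid Plücker coordinates with moment n perpendicular to direction d: a QR decomposition of the stacked 3x2 matrix yields the SO(3) part, and the two diagonal entries of R yield the SO(2) part.

```python
import numpy as np

def plucker_to_orthonormal(n, d):
    """Convert Plücker coordinates (moment n, direction d) to the 4-DoF
    orthonormal representation (U in SO(3), W in SO(2))."""
    A = np.column_stack([n, d])
    U, R = np.linalg.qr(A, mode='complete')   # U: 3x3, R: 3x2 upper triangular
    for i in range(2):                        # make the diagonal of R non-negative
        if R[i, i] < 0:
            R[i, :] *= -1
            U[:, i] *= -1
    if np.linalg.det(U) < 0:                  # make U a proper rotation
        U[:, 2] *= -1
    s = np.hypot(R[0, 0], R[1, 1])
    W = np.array([[R[0, 0], -R[1, 1]],
                  [R[1, 1],  R[0, 0]]]) / s   # SO(2): encodes the ratio |n|/|d|
    return U, W

def orthonormal_to_plucker(U, W):
    """Recover (n, d) up to a common scale."""
    n = W[0, 0] * U[:, 0]
    d = W[1, 0] * U[:, 1]
    return n, d
```

The round trip recovers the line up to scale, which is exactly the projective ambiguity the 4-DoF parameterization removes.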

3. Omnidirectional Camera

3.1 Calibration

Single View Point Omnidirectional Camera Calibration from Planar Grids 2007 (the OpenCV omnidir model is based on this paper).

A Multiple-Camera System Calibration Toolbox Using A Feature Descriptor-Based Calibration Pattern (opencv calibration based on this paper).

3.2 Anti-Aliasing

Anti-Aliasing is important when converting panorama images to pinhole images.

Anti-aliasing techniques comparison. Spatial anti-aliasing.

  • SSAA (Supersampling anti-aliasing). For each pixel of the target image, pick several sub-pixel samples, project them back to the original image (the panorama in our case) to fetch colors, and average.
  • MSAA (Multisample anti-aliasing). Improves over SSAA by sharing samples among neighboring target pixels.
  • Post-process anti-aliasing: FXAA, SMAA, CMAA, etc.
  • Signal processing approach: low-pass filter to greatly reduce frequencies above a certain limit, known as the Nyquist frequency.
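
The SSAA bullet can be sketched in a few lines. This is my illustration, not production code; `render_pixel` is a hypothetical callback standing in for the panorama-to-pinhole back-projection and color lookup.

```python
import numpy as np

def supersample(render_pixel, width, height, n=4):
    """SSAA sketch: render each target pixel by averaging n*n sub-pixel
    samples. render_pixel(x, y) returns the color at continuous target
    coordinates (x, y), e.g. by projecting the corresponding ray back
    into the panorama (hypothetical helper)."""
    img = np.zeros((height, width))
    offsets = (np.arange(n) + 0.5) / n          # sub-pixel offsets in [0, 1)
    for y in range(height):
        for x in range(width):
            samples = [render_pixel(x + dx, y + dy)
                       for dy in offsets for dx in offsets]
            img[y, x] = np.mean(samples)
    return img
```

A hard edge ends up with intermediate values instead of a jagged staircase, which is the whole point of supersampling.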

3.3 Reconstruction

Egocentric Scene Reconstruction from an Omnidirectional Video, github. Fuse per-frame depth estimates into a novel spherical binoctree data structure that is specifically designed to tolerate spherical depth estimation errors.

4. XR Hand

4.1 Meta

META blogs 2019

  • Blob segmentation
    • Image pyramids to find blobs at different scales (not run on all frames); used to separate merged blobs, detect faint blobs, and locate the center of a close blob.
    • In noisy scenes (holiday lights and trees):
      • detect stationary 3D lights and reject them.
      • use a CNN to validate blobs.
  • LED Matching.
    • “brute matching” checks all hypotheses; “proximity matching” uses prior pose information.
    • all the blobs in the four images are collected for matching.
    • developed few-point (1-point, 2-point) matching algorithms.
  • No more blogs released after Dec 2019, but more hand tracking updates are available.
  • My implementation:

4.2 Apple

Apple Vision Pro 2023

  • Design for spatial input 2023.
    • eye tracking -> target. tap finger -> select. flick finger -> scroll.
    • could process complete hand tracking in some cases.

4.3 PICO

PICO Centaur (optical tracking + bare-hand recognition) 2023; LED + AI hand + IMU.

4.4 Infrared Papers

A comparative analysis of localization algorithms for visible light communication 2021.

Light-based indoor positioning systems: A review 2020

  • LED-based methods. Data packets are transmitted through the optical channel using a modulation method (e.g. On-Off Keying - high-frequency switching of the LEDs).
    • Multiplexing to distinguish different LEDs - Time/Frequency/Orthogonal Frequency/Wavelength.
    • Positioning : Proximity/Signal Strength/Angle of Arrival/Time of Arrival.
  • IR
    • Oculus Rift DK2 2014: LEDs transmit their own IDs by on-off keying as a 10-bit data packet at 60Hz.
  • Coded marker-based optical positioning systems.

Low-cost vision-based 6-DOF MAV localization using IR beacons 2013. Enumerates all possible 2D-3D matches, filters them with a plane prior (the order around the centroid is preserved), then solves the pose by PnP.

PS Move API: A Cross-Platform 6DoF Tracking Framework 2013, with a more detailed version Cross-Platform Tracking of a 6DoF Motion Controller 2012. developed for PS Move Motion Controller: single large LED blob tracking.

Kinectrack: Agile 6-DoF Tracking Using a Projected Dot Pattern 2012. Planar IR pattern: 4 points -> quads -> kites. Kites have a perspective-invariant signature, used to match and compute pose.

Affordable infrared-optical pose-tracking for virtual and augmented reality 2007. Multi-view reconstruction, then 3D model fitting (maximum-clique search) to get the pose.

4.5 Other Papers

Efficient 6-DoF Tracking of Handheld Objects from an Egocentric Viewpoint 2018. Image-based 3D position & 6-DoF pose.

  • dataset for handheld objects; the dataset might be useful.
  • Model based on the Single Shot Multibox Detector (SSD). Intuition : users’ hands and arms provide excellent context.

1 Euro Filter: A Simple Speed-based Low-pass Filter for Noisy Input in Interactive Systems 2012; see One Euro Filter for an implementation. Lower jitter at low speed, lower lag at high speed.

\[\alpha = \frac{1}{1 + \frac{\tau}{T_{e}}}, \quad \tau = \frac{1}{2\pi f_{c}}, \quad f_{c} = f_{c_{min}} + \beta \| \dot{\hat{X}}_{i} \|\] \[\hat{X}_{i} = (X_{i} + \frac{\tau}{T_{e}} \hat{X}_{i - 1}) \frac{1}{1 + \frac{\tau}{T_{e}}} = \alpha X_{i} + (1 - \alpha) \hat{X}_{i - 1}\]
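
The equations above fit in a small class. A sketch under the assumptions of a fixed sampling rate and the paper's parameter names (`min_cutoff`, `beta`, `d_cutoff`); not a drop-in replacement for the reference implementation.

```python
import math

class LowPass:
    """Exponential smoothing: x_hat = a*x + (1-a)*x_hat_prev."""
    def __init__(self):
        self.prev = None
    def __call__(self, x, alpha):
        if self.prev is None:
            self.prev = x
        self.prev = alpha * x + (1.0 - alpha) * self.prev
        return self.prev

class OneEuroFilter:
    """1 Euro filter sketch: the cutoff frequency rises with the
    (smoothed) speed, so jitter is damped at rest and lag shrinks
    during fast motion."""
    def __init__(self, freq, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.freq = freq                  # sampling rate, 1 / T_e
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.d_cutoff = d_cutoff
        self.x_filt = LowPass()
        self.dx_filt = LowPass()

    def alpha(self, cutoff):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.freq
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        prev = self.x_filt.prev
        dx = 0.0 if prev is None else (x - prev) * self.freq
        edx = self.dx_filt(dx, self.alpha(self.d_cutoff))
        cutoff = self.min_cutoff + self.beta * abs(edx)   # speed-adaptive f_c
        return self.x_filt(x, self.alpha(cutoff))
```

Tuning rule from the paper: raise `min_cutoff` to cut residual jitter at rest, raise `beta` to cut lag during fast motion.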

Monado’s hand tracking, stream app:

5. Continuous-Time Batch Calibration

Calibrating the Extrinsics of Multiple IMUs and of Individual Axes 2016. Adds multiple IMUs, based on the previous works.

Unified Temporal and Spatial Calibration for Multi-Sensor Systems 2013. Adds a timestamp parameter, based on the previous work.

Continuous-Time Batch Estimation using Temporal Basis Functions 2012. My Notes.

Use a series of B-splines to represent the trajectory; since the B-spline is continuous (smooth if the degree is high enough), we can differentiate it w.r.t. time to get acceleration and angular velocity. Form the optimization problem with:

  • map point observations.
  • imu measurements : 2nd derivative of position, and 1st derivative of rotation.
  • control input constraints.

General Matrix Representations for B-Splines 1998. Used in the above papers to evaluate B-splines.

6. Image-based Rendering - MPIs

Some References:

Implicit Representations (Light Field - Plenoptic Function) - using position & direction of each pixel (5-dim), to get its color, depth and other meta-information. My Neural Rendering Notes

Layered Representations:

Multi-Plane Images (MPIs):

Single-view view synthesis test with deepmirror office.
AdaMPI test with online image.

MPIs final choice: Single-View View Synthesis in the Wild with Learned Adaptive Multiplane Images 2022, our version (single-view view synthesis with RGBD, trained on COCO). Could run on VR & Phone.

  • Use RGBD as input; predict a density σ for each plane instead of an alpha α.
  • Plane Adjustment Network: arranges each MPI plane at an appropriate (pre-defined) depth to represent the scene.
  • Radiance Prediction Network: predicts the color c_i and density σ_i for each plane at depth d_i.
  • Train using a single image: supervised by RGBD warping + a hole-filling network.
  • TODO: supervision by YouTube videos.
  • TODO: single-view 3D Gaussian splatting might help?
  • Implementation (Phone version & Pico version) of an OpenGL ES shader-based MPI visualizer.
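
Rendering an MPI is plain over-compositing; with density planes, the densities are first converted to alphas NeRF-style using the plane spacing. A sketch of that step (my illustration, not the paper's renderer):

```python
import numpy as np

def composite_mpi(colors, sigmas, depths):
    """Front-to-back over-compositing of an MPI.
    colors: (N, H, W, 3) plane colors, sigmas: (N, H, W) densities,
    depths: (N,) plane depths sorted near-to-far. Densities become
    alphas via alpha_i = 1 - exp(-sigma_i * delta_i), where delta_i is
    the spacing to the next plane (last spacing is repeated)."""
    deltas = np.diff(depths, append=depths[-1] + (depths[-1] - depths[-2]))
    alphas = 1.0 - np.exp(-sigmas * deltas[:, None, None])
    out = np.zeros(colors.shape[1:])
    trans = np.ones(colors.shape[1:3])           # accumulated transmittance
    for c, a in zip(colors, alphas):
        out += (trans * a)[..., None] * c
        trans *= 1.0 - a
    return out
```

An opaque near plane hides everything behind it; a transparent near plane lets the far plane through, which is the behavior the shader-based visualizer reproduces on device.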

7. ICCV 23

ICCV’23 Robot Learning & SLAM Workshop

Marc Pollefeys: Visual Localization and Mapping From Classical to Modern SFM & Visual Localization. 3DV 2024.

Maurice Fallon: Robust Multi-Sensor SLAM with Learning and Sensor Fusion. 3 camera + lidar system.

  • Lidar-Visual Odometry:
  • InstaLoc 2023: localization through dense lidar semantic instance matching.
  • NavLive 2022
  • Lidar Vision NeRF.
    • Lidar-Camera Calibration - Extrinsic Calibration of Camera to LIDAR using a Differentiable Checkerboard Model 2023.
    • SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields 2023. NeRF + lidar depth + lidar normals.
  • SLAM + LLMs : Language-EXtended Indoor SLAM (LEXIS) 2023 building semantically rich visual maps with LLMs, based on CLIP.

Luca Carlone: From SLAM to Spatial Perception. hierarchical representations, certifiable algorithms, and self-supervised learning.

Chen Wang: Imperative SLAM and PyPose Library for Robot Learning, Imperative SLAM 2023. Take back-end optimization as a supervision signal for the front-end. PyPose.

Andrew Davison: Distributed Estimation and Learning for Robotics, see here for related lecture.

  • Reason for the thoughts: (1) Hardware: map the algorithm blocks to hardware; (2) Multi-robot systems.
  • Gaussian Belief Propagation.
  • Robot Web.
    • Multi-robot localization using Gaussian Belief Propagation.
    • Multi-robot planning using Gaussian Belief Propagation.

Daniel Cremers: From Monocular SLAM to 3D Dynamic Scene Understanding.

Tim Barfoot: Learning Perception Components for Long Term Path Following.

Shubham Tulsiani: Probabilistic Pose Prediction. Objective : 3D object reconstruction. Pose Estimation from few views. SFM (e.g. Colmap) not robust under sparse-views. Data-driven learning method.

  • Direct (end-to-end) pose prediction attempt: failed! I think the problem might be the pose representation, see Why NeRF works?.
  • RelPose++ 2023. Probabilistic pose prediction: predicts the distribution of poses through an energy-based model.

Ayoung Kim: Advancing SLAM with Learning. (1) Lines. Line Descriptor: LineRT 2021; (2) DL + Graph SLAM. Object SLAM : 6dof object pose estimation; (3) Thermal cameras.

Michael Kaess: Learning for Sonar and Radar SLAM. Cameras fail in underwater environments.

  • Sonar : projection without elevation. Acoustic SFM. Epipolar contour. Acoustic Bundle Adjustment.
    • Sonar Image Correspondence. DL method.
    • Imaging Sonar Dense Reconstruction.
  • Radar SLAM, provide Doppler velocity also.

8. 3D Object Tracking

8.1 Traditional Methods

Region-based method: region segmentation + optimization. Use color statistics to model the probability that a pixel belongs to the object or to the background. The object pose is then optimized to best explain the segmentation of the image.
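
The color-statistics model can be sketched as a pixel-wise foreground posterior (my illustration; `fg_hist`/`bg_hist` are normalized color histograms of the object and background, and the binning of pixels into histogram indices is a hypothetical preprocessing step):

```python
import numpy as np

def pixel_foreground_posterior(pixel_bins, fg_hist, bg_hist, prior_fg=0.5):
    """For each pixel (given as a histogram bin index), return the
    probability that it belongs to the object:
    P(fg | y) = P(y | fg) P(fg) / (P(y | fg) P(fg) + P(y | bg) P(bg))."""
    p_fg = fg_hist[pixel_bins] * prior_fg
    p_bg = bg_hist[pixel_bins] * (1.0 - prior_fg)
    return p_fg / (p_fg + p_bg + 1e-12)       # epsilon guards empty bins
```

The pose optimization then moves the projected silhouette so that pixels inside it have high posterior and pixels outside have low posterior.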

Depth-based method: minimize the distance between the surface of a 3D model and measurements from a depth camera.

  • Pros & Cons:
    • Cons: Depth sensor is required.
  • (1) point-to-plane ICP based. (2) SDF based. (3) Particle filter, Gaussian filters.
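One Gauss-Newton step of the point-to-plane variant can be sketched as follows (my illustration under the small-angle assumption, solving for the 6-vector [θ; t] via least squares):

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane ICP step: minimize
    sum_i ((p_i + theta x p_i + t - q_i) . n_i)^2 over theta, t.
    src, dst, normals: (N, 3) arrays of model points, matched
    measurement points, and their unit surface normals."""
    A = np.hstack([np.cross(src, normals), normals])   # (N, 6) Jacobian rows
    b = -np.einsum('ij,ij->i', src - dst, normals)     # (N,) residuals
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    theta, t = x[:3], x[3:]                            # rotation vector, translation
    return theta, t
```

For a pure translation between well-constrained point sets, a single step recovers the offset exactly; in the under-constrained cases listed in section 1, `A` loses rank, which is exactly where covariance estimation matters.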

Keypoint-based method: image feature extraction and matching.

  • Pros & Cons:
    • Cons: Need Texture. Heavy.
  • SIFT, BRISK, LIFT, SuperGlue, etc.

Edge-based method

Direct method

My implementation using direct method.

8.2 Deep Learning Methods

6DoF Pose Estimation.

With Tracking.