Table of Contents

  1. End-to-end Regression
  2. DL depth + DL flow -> Pose
  3. Match + (Relative) Pose
  4. Overhead Image localization

1. End-to-end Regression

Given an input image, the network directly returns the camera pose (3-DoF or 6-DoF).

Pose Regression:

Geo-Localization as Classification (GPS coordinate as label):
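A minimal sketch of the "GPS coordinate as label" idea, assuming the simplest possible partition: a regular latitude/longitude grid where each cell is one class. The function names and the 1-degree cell size are hypothetical; published methods may partition the globe differently (for example with adaptive cells).

```python
import math


def gps_to_class(lat, lon, cell_deg=1.0):
    """Map a GPS coordinate to an integer class id on a regular lat/lon grid.

    Assumption: a fixed grid with square cells of `cell_deg` degrees.
    """
    n_cols = int(math.ceil(360.0 / cell_deg))
    n_rows = int(math.ceil(180.0 / cell_deg))
    # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    row = min(int((lat + 90.0) / cell_deg), n_rows - 1)
    col = min(int((lon + 180.0) / cell_deg), n_cols - 1)
    return row * n_cols + col


def class_to_gps(class_id, cell_deg=1.0):
    """Return the centre of the cell, used as the predicted location."""
    n_cols = int(math.ceil(360.0 / cell_deg))
    row, col = divmod(class_id, n_cols)
    lat = -90.0 + (row + 0.5) * cell_deg
    lon = -180.0 + (col + 0.5) * cell_deg
    return lat, lon
```

A classifier trained on such labels predicts a cell id, and the cell centre is returned as the location estimate, so the localization error is bounded below by the cell size.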

2. DL depth + DL flow -> Pose

Why not train a network to regress the relative pose directly?

DiffPoseNet: Direct Differentiable Camera Pose Estimation 2022, project page. (1) Estimates the relative pose from dense optical flow and image depth. (2) Combines NFlowNet with a coarse PoseNet to obtain a refined pose.

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet 2020. Pipeline: learned (DL) optical flow -> fundamental matrix -> triangulated sparse point cloud -> rescale the learned (DL) depth prediction.
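The last step of that pipeline (rescaling the predicted depth to agree with the triangulated sparse points) can be illustrated with a small sketch. The median-ratio estimator and the function names below are assumptions chosen for illustration, not necessarily what the paper uses.

```python
from statistics import median


def depth_scale(pred_depths, tri_depths):
    """Estimate a single scale factor s such that s * pred ~= triangulated depth.

    pred_depths: predicted depths sampled at the sparse point locations.
    tri_depths:  depths of the same points obtained by triangulation.
    Assumption: a robust median of per-point ratios is good enough.
    """
    ratios = [t / p for p, t in zip(pred_depths, tri_depths) if p > 0 and t > 0]
    if not ratios:
        raise ValueError("no valid depth pairs to align")
    return median(ratios)


def rescale_depth(depth_map, scale):
    """Apply the scale to a dense depth map given as a list of rows."""
    return [[scale * d for d in row] for row in depth_map]
```

Using the median rather than the mean keeps a few badly triangulated points from dominating the scale estimate.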

Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction 2018.

3. Match + (Relative) Pose

3.1 Dense Image 2d Matching

  • Map: images with poses.
  • Query Pipeline: Retrieval + Match Features + Relative Poses + Pose Averaging -> Query Camera Pose.
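The last two steps of the query pipeline (relative poses, then pose averaging) might look like the sketch below. It is deliberately simplified to 3-DoF poses (x, y, yaw) and assumes each relative pose already has metric translation; two-view matching alone recovers translation only up to scale, so a real pipeline needs an extra step for that. All function names are hypothetical.

```python
import math


def compose(map_pose, rel_pose):
    """World pose of the query camera from one retrieved map image.

    map_pose: (x, y, yaw) of the map camera in the world frame.
    rel_pose: (dx, dy, dyaw) of the query camera in the map camera frame.
    """
    x, y, th = map_pose
    dx, dy, dth = rel_pose
    c, s = math.cos(th), math.sin(th)
    return (x + c * dx - s * dy, y + s * dx + c * dy, th + dth)


def average_poses(poses):
    """Average pose hypotheses: arithmetic mean for position, circular mean for yaw."""
    n = len(poses)
    x = sum(p[0] for p in poses) / n
    y = sum(p[1] for p in poses) / n
    yaw = math.atan2(sum(math.sin(p[2]) for p in poses),
                     sum(math.cos(p[2]) for p in poses))
    return x, y, yaw


def localize(map_poses, rel_poses):
    """One hypothesis per retrieved map image, then average them."""
    hypotheses = [compose(m, r) for m, r in zip(map_poses, rel_poses)]
    return average_poses(hypotheses)
```

The circular mean avoids the wrap-around problem of averaging angles near ±π directly. A robust variant would reject outlier hypotheses before averaging.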

DKM: Dense Kernelized Feature Matching for Geometry Estimation 2023. Directly outputs point matches for a pair of input images.

LoFTR: Detector-Free Local Feature Matching with Transformers 2021. Directly outputs point matches for a pair of input images.

3.2 Image-Scene 2d-3d Matching

Scene Coordinate Regression:

Accelerated Coordinate Encoding 2023. From Niantic: a DL method for small-scene localization that is fast and accurate. A good fit for Niantic’s Lightship (small-region visual localization around landmarks). The intuition is that each image patch corresponds to a 3D point in the scene.

  • Predicts a 3D scene coordinate for each input image patch, then applies PnP + RANSAC for pose estimation.
  • Accelerates the training loop compared to DSAC*.
  • For few-shot mapping: use the encoder to get a descriptor for each image patch, followed by nearest-neighbor (NN) matching + triangulation to get 3D positions. Then use NN search over the descriptors to find the corresponding 3D points. See: ace encoder test.
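The "scene coordinates, then PnP + RANSAC" step can be sketched with NumPy only. This is not ACE's actual solver: it swaps in a plain linear DLT pose fit inside a basic RANSAC loop, assumes a known pinhole intrinsic matrix `K`, and uses hypothetical function names, iteration count, and pixel threshold.

```python
import numpy as np


def pnp_dlt(pts3d, pts2d, K):
    """Linear (DLT) pose from >= 6 2D-3D matches; returns R, t with x_cam = R @ X + t."""
    n = len(pts3d)
    xn = (np.linalg.inv(K) @ np.c_[pts2d, np.ones(n)].T).T
    xn = xn[:, :2] / xn[:, 2:3]                  # normalized image coordinates
    Xh = np.c_[pts3d, np.ones(n)]                # homogeneous scene coordinates
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xn[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xn[:, [1]] * Xh
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)    # null vector = [R|t] up to scale/sign
    U, S, Vt = np.linalg.svd(P[:, :3])
    R, t = U @ Vt, P[:, 3] / S.mean()            # project onto a rotation, fix scale
    if np.linalg.det(R) < 0:                     # fix the overall sign
        R, t = -R, -t
    return R, t


def reproj_error(R, t, pts3d, pts2d, K):
    """Per-point reprojection error in pixels; points behind the camera get inf."""
    Xc = pts3d @ R.T + t
    uv = Xc @ K.T
    with np.errstate(divide="ignore", invalid="ignore"):
        err = np.linalg.norm(uv[:, :2] / uv[:, 2:3] - pts2d, axis=1)
    err[~(Xc[:, 2] > 0)] = np.inf
    return err


def pnp_ransac(pts3d, pts2d, K, iters=200, thresh_px=2.0, seed=0):
    """Basic RANSAC: fit on random 6-point samples, keep the largest inlier set, refit."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(pts3d), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pts3d), size=6, replace=False)
        R, t = pnp_dlt(pts3d[idx], pts2d[idx], K)
        inliers = reproj_error(R, t, pts3d, pts2d, K) < thresh_px
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        raise ValueError("RANSAC did not find enough inliers")
    R, t = pnp_dlt(pts3d[best], pts2d[best], K)
    return R, t, best
```

Here `pts2d` would be the patch centres in the query image and `pts3d` the scene coordinates predicted for those patches. A production system would more likely use a minimal solver (such as P3P) with nonlinear refinement instead of the 6-point DLT.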

Learning to Detect Scene Landmarks for Camera Localization 2022, github. Predicts the 2D location of each predefined scene landmark (one heat map per scene landmark).

Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC 2021. Predicts scene coordinates, i.e. dense correspondences between the input image and the 3D scene space of the environment.

4. Overhead Image localization

Global image descriptor-based (limited by the sampling density of the satellite images; mostly needs a panorama image as input):

Dense pixel-level feature-based (uses the geometric relationship, posed as a camera pose optimization):

Can we assign an encoded ID to each building in a city, and use those IDs to localize a query image?