Table of Contents

  1. HD-Map
  2. Learning to Drive

TUM AI Lecture Series

1. HD-Map

LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation 2023. Constructs a semantic HD map from LiDAR, using BEV camera image features for online distillation.

High-Definition Map Generation Technologies For Autonomous Driving 2022

  • Data collection methods.
  • Point cloud map generation methods. For details, see papers on LiDAR mapping algorithms.
  • Feature extraction methods for HD maps.
    • Road Network Extraction:
      • 2D Aerial Images : segmentation-based, iterative graph growing, and graph-generation methods.
      • 3D Point Clouds (using segmentation).
      • Sensor Fusion Methods : combine point clouds, (aerial/car) images, and GPS trajectories.
    • Road Markings Extraction : 2D (aerial/car) images or 3D point clouds (bottom-up and top-down methods).
    • Pole-like Objects Extraction: usually based on segmentation and classification of MLS (mobile laser scanning) 3D point clouds.
  • Framework for HD maps:
    • Lanelet2 : physical layer (points and lines), relational layer, and topological layer.
    • OpenDRIVE : reference line/road (various geometric primitives), lane, and features.
    • Apollo Maps : represented with points. Elements: road, intersection, traffic signal, logical relationship, and others.
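
The three Lanelet2 layers listed above can be pictured as a small layered data structure. The sketch below is only an illustration under stated assumptions: the class names (`Point`, `LineString`, `Lanelet`, `LayeredMap`) and the helper `build_toy_map` are hypothetical and are not the Lanelet2 API. The physical layer holds geometry, the relational layer groups lines into drivable cells, and the topological layer records which cell can be entered from which.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Point:
    """Physical layer: a surveyed 2D point."""
    id: int
    x: float
    y: float


@dataclass
class LineString:
    """Physical layer: an ordered list of point ids (e.g. a lane boundary)."""
    id: int
    point_ids: List[int]


@dataclass
class Lanelet:
    """Relational layer: a drivable cell bounded by a left and a right line."""
    id: int
    left_line_id: int
    right_line_id: int


@dataclass
class LayeredMap:
    points: Dict[int, Point] = field(default_factory=dict)
    lines: Dict[int, LineString] = field(default_factory=dict)
    lanelets: Dict[int, Lanelet] = field(default_factory=dict)
    # Topological layer: lanelet id -> ids of lanelets that can be entered next.
    successors: Dict[int, List[int]] = field(default_factory=dict)

    def reachable_from(self, start_id: int) -> Set[int]:
        """Depth-first walk over the topological layer."""
        seen: Set[int] = set()
        stack = [start_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.successors.get(current, []))
        return seen


def build_toy_map() -> LayeredMap:
    """Three lanelets in a row (1 -> 2 -> 3), each 10 m long and 3 m wide."""
    m = LayeredMap()
    for i in range(4):  # four cross-sections at x = 0, 10, 20, 30
        m.points[2 * i] = Point(2 * i, 10.0 * i, 0.0)          # right side
        m.points[2 * i + 1] = Point(2 * i + 1, 10.0 * i, 3.0)  # left side
    for i in range(3):
        right = LineString(2 * i, [2 * i, 2 * i + 2])
        left = LineString(2 * i + 1, [2 * i + 1, 2 * i + 3])
        m.lines[right.id], m.lines[left.id] = right, left
        m.lanelets[i + 1] = Lanelet(i + 1, left.id, right.id)
    m.successors = {1: [2], 2: [3], 3: []}
    return m
```

Calling `build_toy_map().reachable_from(1)` walks only the topological layer, which is the point of the separation: routing never has to touch the geometry.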

Localization using HD map:

Computing Systems for Autonomous Driving: State-of-the-Art and Challenges 2020. Focuses on the hardware side.

Towards End-to-End Lane Detection: an Instance Segmentation Approach 2018, github lane segmentation.

Computer Recognition of Roads from Satellite Pictures 1976

2. Learning to Drive

Take advantage of Transformers.

  • Traditional CV tasks (classification, segmentation, etc.) are not a good fit for the autonomous driving task.
  • Compared to ChatGPT, these models are small. There is no large model for general computer vision yet.
    • Or we might not be able to mine vision data from the internet as NLP did: no easy ‘gt’ (ground truth) can be found.
    • The driving task is still too simple and does not require high-level understanding. (We need a better task to develop vision-based AI; text-image tasks might be good.)

Making a large dataset from online videos:

  • Online videos: no calibration, vision only, on real scale.
  • SLAM-mapped dataset (requires an online video mapping algorithm).

Planning-oriented Autonomous Driving 2023. A large model for autonomous driving: an end-to-end paradigm that unites perception, prediction, and planning modules, combining different models and optimizing them jointly. A good starting point for further work.

PPGeo: Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling 2023.

  • In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
  • In the second stage, the visual encoder learns a driving policy representation by predicting future ego-motion from the current visual observation only, optimized with the photometric error.
  • Decision Intelligence Platform for Autonomous Driving simulation.
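
The photometric error mentioned in the two PPGeo stages is usually written as a view-synthesis loss. The form below is the common one from self-supervised depth and ego-motion work, given here as an assumption and not necessarily the exact loss in the paper (which may, for example, add an SSIM term). Here $I_t$ is the target frame, $I_s$ the source frame, $D_t$ the predicted depth, $T_{t \to s}$ the predicted relative pose, $K$ the camera intrinsics, and $\Omega$ the set of valid pixels.

```latex
% Project a target pixel p_t into the source frame using depth and pose,
% then compare the target image with the source image sampled at p_s.
p_s \sim K \, T_{t \to s} \, D_t(p_t) \, K^{-1} p_t ,
\qquad
L_{pe} = \frac{1}{|\Omega|} \sum_{p_t \in \Omega}
         \left| I_t(p_t) - I_s(p_s) \right|
```

A correct depth and ego-motion make the warped source frame match the target frame, so minimizing $L_{pe}$ supervises both predictions without labels.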

ACO: Learning to Drive by Watching YouTube videos: Action-Conditioned Contrastive Policy Pretraining 2022. Uses ‘pseudo action labels’ (produced by a supervised inverse dynamics model) so that the model ‘learns the features that matter to the output action’, which can then be transferred to other tasks.

  • data set list, data set drive.
  • Trained with: Instance Contrastive Pair (ICP) and Action Contrastive Pair (ACP).
  • Inverse dynamics model: deep-learning dense optical flow (RAFT).
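
As a rough illustration of how pseudo action labels could turn into Action Contrastive Pairs, the sketch below treats two frames as a positive pair when their pseudo actions are close. This is an assumption about the pairing rule, not the paper's code: the function name `action_positive_pairs`, the scalar steering-like action, and the threshold `eps` are all hypothetical.

```python
from typing import List, Tuple


def action_positive_pairs(pseudo_actions: List[float],
                          eps: float = 0.05) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose pseudo actions differ by less than eps.

    pseudo_actions[i] is a scalar label (e.g. steering) that an inverse
    dynamics model assigned to frame i. Hypothetical pairing rule.
    """
    pairs = []
    for i in range(len(pseudo_actions)):
        for j in range(i + 1, len(pseudo_actions)):
            if abs(pseudo_actions[i] - pseudo_actions[j]) < eps:
                pairs.append((i, j))
    return pairs
```

An Instance Contrastive Pair, by contrast, needs no labels at all: the positive is simply another augmented view of the same frame.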

TCP - Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline 2022. Two branches, for trajectory planning and direct control respectively.

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos 2022, openai page. Learns to act by watching Minecraft gameplay videos. Fun! Gets pseudo action labels from a trained Inverse Dynamics Model.

Momentum Contrast for Unsupervised Visual Representation Learning 2020, github page. Contrastive learning creates supervisory labels by treating each image (instance) in the dataset as its own category and applying the learning objective of instance discrimination.
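
The instance discrimination objective can be sketched as an InfoNCE loss: the query should score high against its own positive key and low against every other key. The code below is a minimal pure-Python sketch under stated assumptions. It covers only the loss and leaves out the momentum encoder and the key queue, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def info_nce(query: Sequence[float],
             positive_key: Sequence[float],
             negative_keys: Sequence[Sequence[float]],
             temperature: float = 0.07) -> float:
    """Cross-entropy over similarities, with the positive key as the target class."""
    q = _normalize(query)
    # Logit 0 is the positive; the rest are negatives (other instances).
    logits = [_dot(q, _normalize(positive_key)) / temperature]
    logits += [_dot(q, _normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]
```

The loss is near zero when the query matches its positive and is far from the negatives, and grows when a negative is more similar to the query than the positive is.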