Auto-labeling & VQA Fine-tuning Pipeline for E2E Autonomous Driving

Auto-labeling lanes/trajectories/actors from surround-view + LiDAR + HD maps and generating VQA fine-tuning JSONL

Overview

During my research internship, I developed an auto-labeling tool that extracts lanes, trajectories, actors, and work-zones from surround-view cameras + LiDAR + HD maps, and converts them into VQA fine-tuning JSONL for training vision-language models.

What I Built

  • Auto-labeling tool (multi-modal)
    • Extracted structured supervision signals:
      • lanes / trajectories / actors / work-zones
    • Inputs:
      • surround-view images
      • LiDAR
      • HD maps
    • Outputs:
      • VQA fine-tuning dataset in JSONL format
  • VLM-based scene understanding & decision outputs
    • Used vision-fine-tuned InternVL 3.0 and GPT-4o/5 (API) to produce:
      • scene-understanding
      • action-decision style outputs
    • Target: improving E2E autonomous driving reasoning quality
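
To illustrate the label-to-dataset step, here is a minimal sketch of how one auto-labeled frame could be turned into VQA JSONL records. The FrameLabels container, the field names, the question templates, and the record schema are all hypothetical; the actual schema used in the project is not described here.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FrameLabels:
    """Hypothetical container for the labels extracted from one frame."""
    frame_id: str
    image_paths: List[str]
    lane_count: int
    actors: List[Dict] = field(default_factory=list)
    in_work_zone: bool = False


def frame_to_vqa_records(labels: FrameLabels) -> List[Dict]:
    """Build question/answer pairs from structured labels (illustrative templates)."""
    actor_types = sorted({a["type"] for a in labels.actors})
    qa_pairs = [
        ("How many lanes are visible?", str(labels.lane_count)),
        ("Is the ego vehicle in a work zone?", "yes" if labels.in_work_zone else "no"),
        ("Which actor types are present?", ", ".join(actor_types) or "none"),
    ]
    return [
        {
            "id": f"{labels.frame_id}_{i}",
            "images": labels.image_paths,
            "question": question,
            "answer": answer,
        }
        for i, (question, answer) in enumerate(qa_pairs)
    ]


def to_jsonl(records: List[Dict]) -> str:
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


example = FrameLabels(
    frame_id="frame_000001",
    image_paths=["cam_front.jpg", "cam_rear.jpg"],
    lane_count=2,
    actors=[{"type": "car"}, {"type": "pedestrian"}, {"type": "car"}],
    in_work_zone=True,
)
jsonl_text = to_jsonl(frame_to_vqa_records(example))
```

Each line of jsonl_text is an independent JSON object, so the dataset can be streamed, sharded, or appended to without re-parsing the whole file.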

My Role

  • Built the end-to-end labeling-to-dataset pipeline
  • Implemented dataset generation flow and VQA JSONL formatting
  • Integrated VLM inference outputs into training-ready data generation
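
As a rough illustration of the last point, the sketch below filters raw VLM responses before they enter the training set. It assumes the model is prompted to reply with a JSON object containing "scene" and "action" keys, and that actions come from a fixed vocabulary; both the response format and the ALLOWED_ACTIONS set are hypothetical, not the project's actual interface.

```python
import json
from typing import Dict, List, Optional

# Hypothetical action vocabulary; the real label set is not specified in this write-up.
ALLOWED_ACTIONS = {"keep_lane", "change_lane_left", "change_lane_right", "slow_down", "stop"}


def parse_vlm_output(raw: str) -> Optional[Dict[str, str]]:
    """Return a cleaned {'scene', 'action'} dict, or None if the response is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    scene = data.get("scene")
    action = data.get("action")
    if not isinstance(scene, str) or not scene.strip():
        return None
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    return {"scene": scene.strip(), "action": action}


def build_training_records(frame_id: str, raw_outputs: List[str]) -> List[Dict[str, str]]:
    """Keep only well-formed responses and attach the frame id to each."""
    records = []
    for raw in raw_outputs:
        parsed = parse_vlm_output(raw)
        if parsed is not None:
            records.append({"frame_id": frame_id, **parsed})
    return records


raw_outputs = [
    '{"scene": "Two-lane road with cones on the right.", "action": "slow_down"}',
    '{"scene": "Clear highway.", "action": "fly"}',  # action outside the vocabulary
    "not json at all",                               # malformed response
]
training_records = build_training_records("frame_000001", raw_outputs)
```

Rejecting malformed or out-of-vocabulary responses at this stage keeps free-form model text from leaking into the fine-tuning data.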

Tech Stack

  • Sensors / Mapping: Surround-view cameras, LiDAR, HD maps
  • Dataset: VQA fine-tuning JSONL
  • Models: InternVL 3.0 (vision fine-tuned), GPT-4o/5 (API)

Media

Replace the images below with (1) dataset generation diagrams, (2) example labeled frames, (3) JSONL samples (blur sensitive info).

Recommended visuals: label overlays, track/trajectory extraction, and JSONL schema screenshots.

Key Takeaway

This project demonstrates my ability to:

  • build scalable data pipelines for autonomy,
  • bridge multi-modal perception into training data,
  • and connect VLM outputs to E2E driving model development.