Task1: Pedestrian & Vehicle Detection

Deep-learning-based computer vision algorithms have surpassed human-level performance on many CV tasks, such as object recognition and face verification. Object detection is a fundamental task for human-centric visual analysis. The extremely high resolution of PANDA makes it possible to detect objects from a long distance. However, the significant variance in scale, posture, and occlusion severely degrades detection performance.

This task is designed to push forward the state of the art in object detection on giga-pixel images. Teams are required to predict the bounding boxes of pedestrians and vehicles, each with a real-valued confidence.

Challenge participants are required to detect two types of targets: pedestrians and vehicles. For each pedestrian, three bounding boxes should be submitted: a visible body bbox, a full body bbox, and a head bbox. For each vehicle, a visible part bbox needs to be submitted. Some special regions (e.g., fake persons, extremely crowded regions, and heavily occluded persons) are ignored in evaluation.
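As a rough illustration of the outputs described above, the following Python sketch models one record per detected target. The class names, the field names, and the (x, y, width, height) box layout are assumptions made for this example only; they are not the official submission format, which is defined on the Evaluation page and in the toolkit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical layout: top-left corner plus size, in pixels.
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class PedestrianDetection:
    # Three boxes per pedestrian, as the task requires.
    visible_body: Box
    full_body: Box
    head: Box
    confidence: float  # real-valued confidence


@dataclass(frozen=True)
class VehicleDetection:
    # One box per vehicle: its visible part.
    visible_part: Box
    confidence: float


def boxes_of(det):
    """Return every box carried by a detection record."""
    if isinstance(det, PedestrianDetection):
        return [det.visible_body, det.full_body, det.head]
    return [det.visible_part]


def is_well_formed(det) -> bool:
    """Check for a finite confidence and boxes with positive area."""
    return math.isfinite(det.confidence) and all(
        b.width > 0 and b.height > 0 for b in boxes_of(det)
    )
```

A sanity check such as `is_well_formed` can catch degenerate boxes before results are written out; it does not replace the official evaluation rules.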

The challenge is based on the PANDA-Image dataset, which contains 555 static giga-pixel images (390 for training, 165 for testing) captured by a giga-pixel camera at different places and heights. We manually annotate the bounding boxes of different categories of objects in each image. Specifically, each person is annotated with three boxes: a visible body box, a full body box, and a head box. All data and annotations for the training set are publicly available. Please see the Download page for more details about the annotations.

Task2: Multi-Pedestrian Tracking

Object tracking aims to associate objects across different spatial positions and temporal frames. The superior properties of PANDA make it naturally suitable for long-term multi-object tracking. Yet complex scenes with crowded pedestrians pose various challenges as well.

Given an input video sequence, this task requires participating algorithms to recover the trajectories of pedestrians in the video (i.e., submit bounding boxes with track IDs).
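To make the relation between per-frame boxes and trajectories concrete, here is a minimal Python sketch that groups tracked boxes by track ID and orders each group by frame. The `TrackedBox` record and its field names are hypothetical and chosen for illustration; the required submission format is specified on the Evaluation page.

```python
from collections import defaultdict
from typing import NamedTuple


class TrackedBox(NamedTuple):
    # Hypothetical record: one bounding box of one pedestrian in one frame.
    frame: int
    track_id: int
    x: float
    y: float
    width: float
    height: float


def build_trajectories(records):
    """Group boxes by track ID and sort each trajectory by frame index."""
    trajectories = defaultdict(list)
    for rec in records:
        trajectories[rec.track_id].append(rec)
    for boxes in trajectories.values():
        boxes.sort(key=lambda r: r.frame)
    return dict(trajectories)
```

Under these assumptions, each value in the returned dictionary is the time-ordered sequence of boxes for a single pedestrian.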

The challenge is based on the PANDA-Video dataset, which contains 15 video sequences: 10 for training and 5 for testing. We manually annotate the bounding boxes of pedestrians in each video frame. In addition, we provide two other useful kinds of annotations: the occlusion degree and the face orientation of each person. Please see the Download page for more details about the annotations.

Data and Annotations

For PANDA-Image and PANDA-Video, all data and annotations for the training sets are available on the Download page.

Tools and Instructions

We provide an extensive toolkit for PANDA, with APIs for data visualization, splitting, merging, and result evaluation. Please visit our GitHub repository page. Submitted results will be evaluated according to the rules described on the Evaluation page. For additional questions, please see the FAQ or contact us.