Task 1: Pedestrian & Vehicle Detection

Part 1: Crowd Counting

This task evaluates the ability of algorithms to estimate crowd density maps in complex scenarios. Participants will use our Gigapixel Video Dataset, a new resource offering both high spatial resolution and a wide field of view (FOV) for computer vision challenges.
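A ground-truth crowd density map is commonly built by stamping a normalized Gaussian at each annotated head position, so the map integrates to the head count. The sketch below is an illustrative assumption, not the challenge's official procedure; `sigma` and `radius` are hypothetical parameters.

```python
import numpy as np

def density_map(head_points, height, width, sigma=4.0, radius=12):
    """Build a crowd density map by stamping a normalized Gaussian
    at each annotated head position (1-indexed, as in the dataset)."""
    dmap = np.zeros((height, width), dtype=np.float64)
    # Precompute one Gaussian kernel shared by all heads.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    kernel /= kernel.sum()  # each head contributes exactly 1 to the map's sum
    for x, y in head_points:
        cx, cy = int(x) - 1, int(y) - 1  # convert to 0-indexed pixels
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, width)
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, height)
        kx0, ky0 = x0 - (cx - radius), y0 - (cy - radius)
        patch = kernel[ky0:ky0 + (y1 - y0), kx0:kx0 + (x1 - x0)]
        dmap[y0:y1, x0:x1] += patch
    return dmap
```

Summing the resulting map over the whole image recovers the number of annotated heads (minus a small loss for heads whose kernel is clipped at the image border).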

Dataset Download:

The Gigapixel Video Dataset 0.1alpha will be used for this task. The dataset consists of 65 representative images from the train station and Shanghai Marathon sequences, saved in JPEG format and containing more than 200K annotated heads. We will release more labeled data in the future.

Invalid Area

Limited by the resolution, sometimes even humans cannot clearly count the exact number of people in distant regions. Therefore, we have delineated invalid areas that are considered humanly unrecognizable and carry no groundtruth labels.

Dataset             Image Size       Invalid Area
shanghai_marathon   26908 × 15024    1 ≤ x ≤ 26908, 1 ≤ y ≤ 6670
train_station       26558 × 14828    1 ≤ x ≤ 26558, 1 ≤ y ≤ 5130

Note: The top left pixel is set to the origin of the coordinates (x = 1, y = 1).
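Under the 1-indexed convention above, checking whether a label falls inside an invalid area is a simple coordinate test. The snippet below is a minimal sketch; the dictionary keys mirror the sequence names in the table, and the function names are hypothetical.

```python
# Invalid areas from the table above: (x_max, y_max) of the unlabeled strip.
# Coordinates are 1-indexed, with the top left pixel at (1, 1).
INVALID_AREA = {
    "shanghai_marathon": (26908, 6670),
    "train_station": (26558, 5130),
}

def in_invalid_area(seq, x, y):
    """True if a 1-indexed point (x, y) lies in the sequence's invalid area."""
    x_max, y_max = INVALID_AREA[seq]
    return 1 <= x <= x_max and 1 <= y <= y_max

def filter_heads(seq, heads):
    """Drop head annotations that fall inside the invalid (unlabeled) area."""
    return [(x, y) for x, y in heads if not in_invalid_area(seq, x, y)]
```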


The groundtruth labels are saved in .mat and .txt files. The first two lines indicate the total number of people in the image. After that, each line represents a head position: the first number is the x coordinate, the second the y coordinate. The top left pixel is (1, 1).

<x1 y1>
<x2 y2>
...
<xN yN>
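A minimal parser for the .txt format can treat single-number lines as the header (total head count) and two-number lines as head positions, which tolerates either one or two header lines. This is a sketch under that assumption, not the official PANDA API loader.

```python
def parse_labels(text):
    """Parse the .txt groundtruth format: header line(s) holding the total
    head count, then one '<x y>' pair per line (1-indexed coordinates)."""
    count = None
    heads = []
    for line in text.splitlines():
        # Strip any angle-bracket delimiters before tokenizing.
        tokens = line.replace("<", " ").replace(">", " ").split()
        if not tokens:
            continue
        if len(tokens) == 1:   # header: total number of people
            count = int(tokens[0])
        else:                  # head position: x then y
            heads.append((float(tokens[0]), float(tokens[1])))
    return count, heads
```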

Deep learning based computer vision algorithms have surpassed human-level performance on many CV tasks, such as object recognition and face verification. Object detection is a fundamental task for human-centric visual analysis. The extremely high resolution of PANDA makes it possible to detect objects from a long distance. However, the significant variance in scale, posture, and occlusion severely degrades detection performance.

This task is designed to push forward the state of the art in object detection on gigapixel images. Teams are required to predict bounding boxes of pedestrians and vehicles with real-valued confidences. Some special regions (e.g., fake persons, extremely crowded regions, heavily occluded persons) are ignored in evaluation.

There will be two tracks: pedestrian detection and vehicle detection. For the pedestrian detection track, there will be three sub-tasks: visible body, full body, and head detection.

The challenge is based on the PANDA-Image dataset, which contains 555 static gigapixel images (390 for training, 165 for testing) captured by a gigapixel camera in different places and at different heights. We manually annotated the bounding boxes of different categories of objects in each image. Specifically, each person is annotated with three boxes: a visible body box, a full body box, and a head box. It is worth mentioning that a target is skipped during evaluation if its IoU with special regions is larger than 0.5. All data and annotations on the training set are publicly available.
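The skip rule above (drop a target whose IoU with a special region exceeds 0.5) can be sketched as follows. This is an illustrative implementation, not the official evaluation code; boxes are assumed to be in [x1, y1, x2, y2] format.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def keep_for_eval(target, special_regions, thresh=0.5):
    """A target is skipped if its IoU with any special region exceeds thresh."""
    return all(iou(target, r) <= thresh for r in special_regions)
```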

Challenge Guidelines

The object detection evaluation page lists detailed information regarding how submissions will be scored. To limit overfitting while giving researchers more flexibility to test their algorithms, we have divided the test set into two splits: test-challenge and test-dev. Test-dev (60 images) is designed for debugging and validation experiments and allows unlimited submissions. The up-to-date results on the test-dev set can be viewed on the leaderboard.

We encourage participants to use the provided training data, but also allow them to use additional training data. The use of external data must be indicated during submission.

The training set images and corresponding annotations are available on the download page. Submitted results will be evaluated according to the rules described on the evaluation page; please refer to it for a detailed explanation.

Tools and Instructions

We provide extensive API support for the PANDA images, annotations, and evaluation code. Please visit our GitHub repository to download the PANDA API. For additional questions, please consult the FAQ or contact us.


When using our datasets in your research, please cite:

@inproceedings{yuan2017multiscale,
  title={Multiscale gigapixel video: A cross resolution image matching and warping approach},
  author={Yuan, Xiaoyun and Fang, Lu and Dai, Qionghai and Brady, David J and Liu, Yebin},
  booktitle={Computational Photography (ICCP), 2017 IEEE International Conference on},
  year={2017}
}


This dataset is for non-commercial use only. If you find yourself or your personal belongings in the data, please contact us, and we will immediately remove the respective images from our servers.