Overview

Depending on the data type, the PANDA dataset is split into two subsets: PANDA-Image and PANDA-Video. PANDA-Image is composed of 555 static gigapixel images (390 for training, 165 for testing), and PANDA-Video is composed of 15 gigapixel video sequences (10 for training, 5 for testing). Please refer to our paper for detailed information on the images and videos in each scene.

Since existing video compression formats such as H.264 cannot handle the extremely high resolution of the PANDA dataset, the videos in PANDA-Video are stored as individual image frames (.jpg format). Moreover, to keep the total data volume manageable for storage and download, we temporally subsample the videos to a frame rate of 2 FPS.

Before downloading the data, you need to read the copyright terms of the dataset, agree to our license, and submit an application for using the dataset to us by email. Please refer to the Copyright & Download section below for details.

Copyright

The PANDA dataset is available for academic purposes only. Any researcher who uses the PANDA dataset must obey the license below:

All of the PANDA dataset (data, annotations, and software) is copyrighted by Smart Imaging Laboratory, Tsinghua-Berkeley Shenzhen Institute and published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes, and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.

This dataset is for non-commercial use only. However, if you find yourself or your personal belongings in the data, please contact us, and we will immediately remove the respective images from our servers.

Download

At present, we provide the images and video sequences of the PANDA dataset, as well as the bounding-box (and corresponding label) annotations for the training set. Group and interaction annotations are in preparation; please stay tuned.

To download the PANDA dataset, please agree to the license and provide the information below via email. We will only accept applications from organization email addresses (please DO NOT use addresses from gmail/163/qq). Anyone who uses the PANDA dataset should obey the license and send us an email for registration.

Please use the following email template:

-----------------------------------------------------------
To: zhang-xy18@mails.tsinghua.edu.cn
Subject: Apply for Using PANDA Dataset

I am aware of the PANDA Terms of Use and I confirm that I will comply with them.

Name:
Organization:
Email:
Tel:
-----------------------------------------------------------

Annotation Description

File structure

For PANDA-Image, the training images and test images are stored in two separate compressed packages. After decompression, the directory contains one folder per scene, named after the scene, and each folder contains the images belonging to that scene.

For PANDA-Video, each video sequence is stored in a separate compressed package. The decompressed folder is named after the scene and contains the frame images of the video sequence.

Annotation Formats

PANDA-Image

The two files human_bbox_train.json and vehicle_bbox_train.json contain the annotations of the pedestrians and vehicles, respectively, in the training-set images. human_bbox_test.json and vehicle_bbox_test.json contain only the image_filepath, image id, and image size for the test set. Please note that when submitting results on the test set, the image id must be the same as in the annotation file.

human_bbox_train.json

JSON{
        image_filepath : image_dict,
        ...
}
image_dict{
        "image id" : int,
        "image size" : image_size,
        "objects list" : [object_dict],
}
image_size{
        "height" : int,
        "width" : int,
}
If the object is a person:
object_dict{
        "category" : "person",
        "pose" : "standing" or "walking" or "sitting" or "riding" or "held" (a baby in the arms) or "unsure",
        "riding type" : "bicycle rider" or "motorcycle rider" or "tricycle rider" or "null" (when "pose" is not "riding"),
        "age" : "adult" or "child" or "unsure",
        "rects" : rects,
}
rects{
        "head" : rect_dict,
        "visible body" : rect_dict,
        "full body" : rect_dict,
}
If the box is a crowd / reflection / person-like object / ... and needs to be ignored:
object_dict{
        "category" : "ignore" (someone who is heavily occluded) or "fake person" or "crowd" (extremely dense crowd),
        "rect" : rect_dict,
}
rect_dict{
        "tl" : {
                "x" : float,
                "y" : float,
        },
        "br" : {
                "x" : float,
                "y" : float,
        }
}
  • image_filepath is the relative path of the image
  • "category" is the key that determines whether the box is a pedestrian or a special region that needs to be ignored. A pedestrian's "category" can only be "person"
  • "riding type" is not "null" only when "pose" is "riding"
  • "x" and "y" are floating point numbers between 0 and 1, representing the ratio of the coordinates to the width and height of the image, respectively
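As a quick sanity check, the schema above can be parsed with a few lines of standard-library Python. The helper below is an illustrative sketch (the function name and return layout are our own choices, not part of the dataset tooling); it collects pixel-space full-body boxes for "person" objects and skips the ignore-type regions:

```python
import json

def load_person_boxes(anno_path):
    """Load human_bbox_train.json and return, for each image, the
    pixel-space full-body boxes of "person" objects.

    Returns a dict: image_filepath -> list of (x1, y1, x2, y2) tuples.
    """
    with open(anno_path) as f:
        annos = json.load(f)

    boxes = {}
    for image_path, image_dict in annos.items():
        w = image_dict["image size"]["width"]
        h = image_dict["image size"]["height"]
        pixel_rects = []
        for obj in image_dict["objects list"]:
            if obj["category"] != "person":
                continue  # skip "ignore" / "fake person" / "crowd" regions
            rect = obj["rects"]["full body"]
            # "x" / "y" are ratios in [0, 1]; scale to pixel coordinates
            pixel_rects.append((rect["tl"]["x"] * w, rect["tl"]["y"] * h,
                                rect["br"]["x"] * w, rect["br"]["y"] * h))
        boxes[image_path] = pixel_rects
    return boxes
```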

vehicle_bbox_train.json

JSON{
        image_filepath : image_dict,
        ...
}
image_dict{
        "image id" : int,
        "image size" : image_size,
        "objects list" : [object_dict],
}
image_size{
        "height" : int,
        "width" : int,
}
object_dict{
        "category" : "small car" or "midsize car" or "large car" or "bicycle" or "motorcycle" or "tricycle" or "electric car" or "baby carriage" or "vehicles" or "unsure",
        "rect" : rect_dict,
}
rect_dict{
        "tl" : {
                "x" : float,
                "y" : float,
        },
        "br" : {
                "x" : float,
                "y" : float,
        }
}
  • image_filepath is the relative path of the image
  • "vehicles" refers to a dense vehicle group and should be ignored
  • "small car", "midsize car" and "large car" belong to motor vehicles with four or more wheels and are distinguished by vehicle size. "electric car" refers to an electric sightseeing car or patrol car, etc.
  • "x" and "y" are floating point numbers between 0 and 1, representing the ratio of the coordinates to the width and height of the image, respectively
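The vehicle file shares the same top-level layout, so a short sketch (function name is ours, not part of the dataset tooling) can tally boxes per category while skipping the dense "vehicles" regions that should be ignored:

```python
import json
from collections import Counter

def count_vehicle_categories(anno_path):
    """Count annotated boxes per vehicle category in vehicle_bbox_train.json,
    skipping "vehicles" (dense vehicle groups, to be ignored)."""
    with open(anno_path) as f:
        annos = json.load(f)

    counts = Counter()
    for image_dict in annos.values():
        for obj in image_dict["objects list"]:
            if obj["category"] == "vehicles":
                continue  # dense vehicle group: an ignore region, not a target
            counts[obj["category"]] += 1
    return counts
```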

PANDA-Video

Each video sequence in PANDA-Video has two annotation files: tracks.json, which contains the pedestrian trajectory annotations, and seqinfo.json, which contains the basic information of the video sequence. The annotation files for each sequence are stored in a folder named after the scene.

tracks.json

JSON[
        track_dict,
        ...
]
track_dict{
        "track id" : int,
        "frames" : [frame_dict],
}
frame_dict{
        "frame id" : int,
        "rect" : rect_dict,
        "face orientation" : "back" or "front" or "left" or "left back" or "left front" or "right" or "right back" or "right front" or "unsure",
        "occlusion" : "normal" or "hide" or "serious hide" or "disappear",
}
rect_dict{
        "tl" : {
                "x" : float,
                "y" : float,
        },
        "br" : {
                "x" : float,
                "y" : float,
        }
}
  • Both "frame id" and "track id" count from 1
  • In "face orientation", "front" means facing the camera
  • In "occlusion", "normal" means the occlusion rate is less than 10%, "hide" means the occlusion rate is between 10% and 50%, "serious hide" means the occlusion rate is greater than 50%, and "disappear" means the object completely disappears
  • "x" and "y" are floating point numbers between 0 and 1, representing the ratio of the coordinates to the width and height of the image, respectively
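Since tracks.json is grouped by track rather than by frame, trackers that expect per-frame detections need to regroup it. A minimal sketch, with an illustrative function name and output layout of our own choosing (the image size comes from the sequence's seqinfo.json):

```python
import json

def tracks_to_frames(tracks_path, im_width, im_height):
    """Regroup tracks.json by frame.

    Returns a dict: frame id -> list of (track id, (x1, y1, x2, y2)) with
    boxes in pixel coordinates. "disappear" entries are dropped, since the
    object is completely occluded in that frame.
    """
    with open(tracks_path) as f:
        tracks = json.load(f)

    frames = {}
    for track in tracks:
        for fr in track["frames"]:
            if fr["occlusion"] == "disappear":
                continue
            rect = fr["rect"]
            box = (rect["tl"]["x"] * im_width, rect["tl"]["y"] * im_height,
                   rect["br"]["x"] * im_width, rect["br"]["y"] * im_height)
            frames.setdefault(fr["frame id"], []).append((track["track id"], box))
    return frames
```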

seqinfo.json

JSON{
        "name" : scene_name,
        "frameRate" : int,
        "seqLength" : int,
        "imWidth" : int,
        "imHeight" : int,
        "imExt" : file_extension,
        "imUrls" : [image_url]
}
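For example, because the sequences are sampled at 2 FPS, the wall-clock duration of a clip follows directly from seqinfo.json as seqLength / frameRate. A minimal sketch (function name ours):

```python
import json

def sequence_duration_seconds(seqinfo_path):
    """Return the clip duration in seconds from a seqinfo.json:
    seqLength frames at frameRate FPS (2 FPS for PANDA-Video)."""
    with open(seqinfo_path) as f:
        info = json.load(f)
    return info["seqLength"] / info["frameRate"]
```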

Citation

All technical papers, documents, and reports that use the PANDA dataset should acknowledge the use of the dataset and cite:

@inproceedings{wang2020panda,
        title={PANDA: A Gigapixel-level Human-centric Video Dataset},
        author={Wang, Xueyang and Zhang, Xiya and Zhu, Yinheng and Guo, Yuchen and Yuan, Xiaoyun and Xiang, Liuyu and Wang, Zerun and Ding, Guiguang and Brady, David J and Dai, Qionghai and Fang, Lu},
        booktitle={Computer Vision and Pattern Recognition (CVPR), 2020 IEEE International Conference on},
        year={2020},
        organization={IEEE}
}
