Task 1: Pedestrian & Vehicle Detection

Results Format

The format of the result file is the same as that of the COCO Challenge. Participants are required to submit their results as a single det_results.json file (saved via gason in MATLAB or json.dump in Python). This .json file should contain a list in which each element is a dictionary describing one result box, with the following format:

[{ "image_id": int, "category_id": int, "bbox": [bbox_left, bbox_top, bbox_width, bbox_height], "score": float }]

The meaning of each value is listed as follows:

Key Description
image_id The serial number of the image, which shall be consistent with the annotation file
category_id The category of the detected bounding box, which shall be consistent with the table below
bbox_left The x coordinate of the top-left corner of the predicted bounding box
bbox_top The y coordinate of the top-left corner of the predicted object bounding box
bbox_width The width in pixels of the predicted object bounding box
bbox_height The height in pixels of the predicted object bounding box
score The confidence of the predicted bounding box enclosing an object instance

Object category_id
person visible body 1
person full body 2
person head 3
vehicle visible part 4
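
As an illustration, the following minimal Python sketch writes a det_results.json file in the required format. The image IDs, category IDs, boxes, and scores shown are placeholders, not real results:

    import json

    # Each entry describes one detected box: bbox = [bbox_left, bbox_top, bbox_width, bbox_height].
    # The values below are placeholders for illustration only.
    det_results = [
        {
            "image_id": 1,                        # must match the ID used in the annotation file
            "category_id": 1,                     # 1 = person visible body (see table above)
            "bbox": [794.3, 247.6, 71.2, 174.9],  # [x, y, width, height] in pixels
            "score": 0.87,                        # detection confidence
        },
        {
            "image_id": 1,
            "category_id": 4,                     # 4 = vehicle visible part
            "bbox": [1648.1, 119.6, 66.5, 163.2],
            "score": 0.52,
        },
    ]

    # Save all detections for the whole test set as a single JSON file.
    with open("det_results.json", "w") as f:
        json.dump(det_results, f)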

Evaluation Metrics

We require each evaluated algorithm to output a list of detected bounding boxes with confidence scores for each test image in the predefined format. Please see the results format above for more detail. Similar to the evaluation protocol of the COCO Challenge [1], we use the AP, AP^{IoU=0.50}, AP^{IoU=0.75}, AR^{max=10}, AR^{max=100}, and AR^{max=500} metrics to evaluate the results of detection algorithms. Unless otherwise specified, the AP and AR metrics are averaged over multiple intersection-over-union (IoU) thresholds; specifically, we use the ten IoU thresholds [0.50:0.05:0.95]. All metrics are computed allowing at most 500 top-scoring detections per image (across all categories). These criteria penalize both missed objects and duplicate detections (two detection results for the same object instance). The AP metric is used as the primary metric for ranking the algorithms. The metrics are described in the table below.

The above metrics are calculated over object categories of interest. For comprehensive evaluation, we will report the performance of each object category. Some special regions (e.g., fake persons, extremely crowded regions, heavily occluded persons, etc.) are ignored in evaluation. Please also see the Download page for more details about annotation. The evaluation code for object detection in images is available on the PANDA-Toolkit.

Measure Perfect Description
AP 100% The average precision over all 10 IoU thresholds (i.e., [0.50:0.05:0.95]) of all object categories
AP^{IoU=0.50} 100% The average precision over all object categories when the IoU overlap with ground truth is larger than 0.50
AP^{IoU=0.75} 100% The average precision over all object categories when the IoU overlap with ground truth is larger than 0.75
AR^{max=10} 100% The maximum recall given 10 detections per image
AR^{max=100} 100% The maximum recall given 100 detections per image
AR^{max=500} 100% The maximum recall given 500 detections per image
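
The official evaluation code is part of the PANDA-Toolkit. Purely as an illustration, the sketch below shows how a COCO-style evaluation with a 500-detection limit can be run with pycocotools, assuming a COCO-format annotation file named panda_test_anno.json (a hypothetical file name) and the det_results.json described above:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Hypothetical file names used for illustration.
    coco_gt = COCO("panda_test_anno.json")         # ground-truth annotations in COCO format
    coco_dt = coco_gt.loadRes("det_results.json")  # detections in the results format above

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.params.maxDets = [10, 100, 500]      # AR^{max=10}, AR^{max=100}, AR^{max=500}
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()                          # prints AP / AR averaged over IoU in [0.50:0.05:0.95]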

Task 2: Multi-Pedestrian Tracking

Results Format

Please submit your results as a single mot_results.zip file. The results for each sequence must be stored in a separate .txt file in the archive's root folder. The file name must exactly match the sequence name (e.g., 01_University_Canteen.txt, case sensitive).

The format of the result file is the same as that of MOTChallenge. Each file is a CSV text file containing one object instance per line. Each line must contain 10 values:

<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>

For the ground truth, the conf value acts as a flag indicating whether the entry is to be considered. A value of 0 means that this particular instance is ignored in the evaluation, while any other value marks it as active. For submitted results, all lines in the .txt file are considered. The world coordinates x, y, z are ignored for this 2D challenge and can be filled with -1; however, each line is still required to contain 10 values.

All frame numbers, target IDs and bounding boxes are 1-based. Here is an example:

                               1, 3, 794.27, 247.59, 71.245, 174.88, -1, -1, -1, -1
                               1, 6, 1648.1, 119.61, 66.504, 163.24, -1, -1, -1, -1
                               1, 8, 875.49, 399.98, 95.303, 233.93, -1, -1, -1, -1
                               ...
                               

The meaning of each value is listed as follows:

Position Name Description
1 frame The frame number, starting from 1
2 id The identity of the target, which provides the temporal correspondence between bounding boxes in different frames
3 bb_left The x coordinate of the top-left corner of the predicted bounding box
4 bb_top The y coordinate of the top-left corner of the predicted bounding box
5 bb_width The width in pixels of the predicted bounding box
6 bb_height The height in pixels of the predicted bounding box
7 conf The detection confidence (for ground truth, a flag indicating whether the entry is considered)
8-10 x, y, z The world coordinates, which are ignored for this 2D challenge and can be filled with -1
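
As a minimal sketch (with made-up track data and the example sequence name from above), the following Python code writes one .txt file per sequence in this format and packages them into mot_results.zip:

    import zipfile

    # Hypothetical tracking output: {sequence_name: [(frame, id, left, top, width, height), ...]}.
    results = {
        "01_University_Canteen": [
            (1, 3, 794.27, 247.59, 71.245, 174.88),
            (1, 6, 1648.10, 119.61, 66.504, 163.24),
        ],
    }

    with zipfile.ZipFile("mot_results.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for seq_name, tracks in results.items():
            lines = []
            for frame, track_id, left, top, width, height in tracks:
                # conf is set to -1 for submitted results; x, y, z are -1 for the 2D challenge.
                lines.append(f"{frame},{track_id},{left},{top},{width},{height},-1,-1,-1,-1")
            # One .txt file per sequence, stored in the archive's root folder.
            zf.writestr(f"{seq_name}.txt", "\n".join(lines) + "\n")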

Evaluation Metrics

To evaluate the performance of multi-pedestrian tracking algorithms, we adopt the metrics of MOTChallenge [2], including MOTA, MOTP, IDF1, FAR, MT, and Hz. MOTA (Multiple Object Tracking Accuracy) computes the accuracy considering three error sources: false positives, false negatives (missed targets), and identity switches. MOTP (Multiple Object Tracking Precision) takes into account the misalignment between the ground-truth and the predicted bounding boxes. IDF1 (ID F1 score) measures the ratio of correctly identified detections over the average number of ground-truth and computed detections. FAR (false alarm rate) measures the average number of false alarms per frame. MT (mostly tracked targets) measures the ratio of ground-truth trajectories that are covered by a track hypothesis for at least 80% of their respective life span. Hz indicates the processing speed of the algorithm. For all evaluation metrics except FAR, higher is better. The evaluation code for Task 2 is available in the PANDA-Toolkit.
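
The official evaluation code is provided in the PANDA-Toolkit. Purely as an illustration of how such metrics can be computed, the sketch below uses the third-party py-motmetrics package with made-up ground-truth and hypothesis boxes for a single frame:

    import motmetrics as mm

    acc = mm.MOTAccumulator(auto_id=True)  # frame index is assigned automatically

    # Made-up data for one frame: boxes are [left, top, width, height].
    gt_ids = [1, 2]
    gt_boxes = [[794.0, 247.0, 71.0, 175.0], [1648.0, 119.0, 66.0, 163.0]]
    hyp_ids = [3, 6]
    hyp_boxes = [[794.3, 247.6, 71.2, 174.9], [1648.1, 119.6, 66.5, 163.2]]

    # IoU-based distance matrix between ground-truth and hypothesis boxes.
    dists = mm.distances.iou_matrix(gt_boxes, hyp_boxes, max_iou=0.5)
    acc.update(gt_ids, hyp_ids, dists)

    mh = mm.metrics.create()
    summary = mh.compute(acc, metrics=["mota", "motp", "idf1", "mostly_tracked"], name="demo")
    print(summary)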

Tools and Instructions

We provide extensive toolkit support for PANDA, with APIs for data visualization, splitting, merging, and result evaluation. Please visit our GitHub repository page. For additional questions, please check the FAQ or contact us.

References

[1] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: common objects in context,” in Proceedings of the European Conference on Computer Vision, 2014, pp. 740–755.

[2] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, “MOT16: A benchmark for multi-object tracking,” arXiv preprint arXiv:1603.00831, 2016.
