The format of the result file is the same as that of the COCO Challenge. We require participants to submit their results as a single det_results.json file (saved via gason in MATLAB or json.dump in Python). This .json file should contain a list in which each element is a dictionary describing one detected box, in the following format:
{
    "image_id": int,
    "category_id": int,
    "bbox": [bbox_left, bbox_top, bbox_width, bbox_height],
    "score": float
}
The meaning of each value is listed as follows:
|Name|Description|
|---|---|
|image_id|The serial number of the image, which must be consistent with the annotation file|
|category_id|The category of the detected box, which must be consistent with the table below|
|bbox_left|The x coordinate of the top-left corner of the predicted bounding box|
|bbox_top|The y coordinate of the top-left corner of the predicted bounding box|
|bbox_width|The width, in pixels, of the predicted bounding box|
|bbox_height|The height, in pixels, of the predicted bounding box|
|score|The confidence score of the predicted bounding box enclosing an object instance|
|Category|category_id|
|---|---|
|person visible body|1|
|person full body|2|
|vehicle visible part|4|
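A minimal Python sketch of writing such a file with json.dump (the detections below are made-up placeholder values, not real results):

```python
import json

# Hypothetical detection results; image_id and category_id must be
# consistent with the annotation file and the category table above.
results = [
    {
        "image_id": 1,
        "category_id": 1,                     # person visible body
        "bbox": [100.0, 50.0, 40.0, 120.0],   # [left, top, width, height]
        "score": 0.93,
    },
    {
        "image_id": 1,
        "category_id": 4,                     # vehicle visible part
        "bbox": [300.0, 200.0, 180.0, 90.0],
        "score": 0.81,
    },
]

# A single JSON file containing the list of all detection dictionaries.
with open("det_results.json", "w") as f:
    json.dump(results, f)
```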
We require each evaluated algorithm to output a list of detected bounding boxes with confidence scores for each test image in the predefined format described above. Similar to the evaluation protocol of the COCO Challenge, we use the AP, APIoU=0.50, APIoU=0.75, ARmax=10, ARmax=100, and ARmax=500 metrics to evaluate the results of detection algorithms. Unless otherwise specified, the AP and AR metrics are averaged over multiple intersection-over-union (IoU) values; specifically, we use ten IoU thresholds of [0.50:0.05:0.95]. All metrics are computed allowing at most 500 top-scoring detections per image (across all categories). These criteria penalize both missed detections and duplicate detections (two detection results for the same object instance). The AP metric is used as the primary metric for ranking the algorithms. The metrics are described in the following table.
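For concreteness, the IoU between two boxes in [left, top, width, height] format can be computed as in this sketch (the helper function is our own illustration, not part of the official evaluation code):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [left, top, width, height]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A detection matches a ground-truth box at threshold t when iou(...) >= t.
print(iou([0, 0, 10, 10], [5, 0, 10, 10]))  # 50/150 ≈ 0.333
```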
The above metrics are calculated over object categories of interest. For comprehensive evaluation, we will report the performance of each object category. Some special regions (e.g., fake persons, extremely crowded regions, heavily occluded persons, etc.) are ignored in evaluation. Please also see the Download page for more details about annotation. The evaluation code for object detection in images is available on the PANDA-Toolkit.
|Metric|Description|
|---|---|
|AP|The average precision over all 10 IoU thresholds (i.e., [0.5:0.05:0.95]) of all object categories|
|APIoU=0.50|The average precision over all object categories when the IoU overlap with ground truth is larger than 0.50|
|APIoU=0.75|The average precision over all object categories when the IoU overlap with ground truth is larger than 0.75|
|ARmax=10|The maximum recall given 10 detections per image|
|ARmax=100|The maximum recall given 100 detections per image|
|ARmax=500|The maximum recall given 500 detections per image|
Please submit your results as a single mot_results.zip file. The results for each sequence must be stored in a separate .txt file in the archive's root folder. The file name must exactly match the sequence name (e.g., 01_University_Canteen.txt; case sensitive).
The format of the result file is the same as that of the MOTChallenge: a CSV text file containing one object instance per line. Each line must contain 10 values:
<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>
For the ground truth, the conf value acts as a flag indicating whether the entry is to be considered: a value of 0 means that this particular instance is ignored in the evaluation, while any other value marks it as active. For submitted results, all lines in the .txt file are considered. The world coordinates <x>, <y>, <z> are ignored for this 2D challenge and can be filled with -1; however, each line is still required to contain 10 values.
All frame numbers, target IDs and bounding boxes are 1-based. Here is an example:
1, 3, 794.27, 247.59, 71.245, 174.88, -1, -1, -1, -1
1, 6, 1648.1, 119.61, 66.504, 163.24, -1, -1, -1, -1
1, 8, 875.49, 399.98, 95.303, 233.93, -1, -1, -1, -1
...
The meaning of each value is listed as follows:
|Position|Name|Description|
|---|---|---|
|1|frame|The number of the frame in which the box appears|
|2|id|The identity of the target, which provides the temporal correspondence of the bounding boxes across frames|
|3|bb_left|The x coordinate of the top-left corner of the predicted bounding box|
|4|bb_top|The y coordinate of the top-left corner of the predicted bounding box|
|5|bb_width|The width, in pixels, of the predicted bounding box|
|6|bb_height|The height, in pixels, of the predicted bounding box|
|7|conf|The confidence score of the detection (in the ground truth, a flag: 0 means the instance is ignored in evaluation)|
|8-10|x, y, z|World coordinates, ignored for this 2D challenge (fill with -1)|
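Putting the submission format together, here is a minimal Python sketch (the rows and sequence name below are placeholder values) that writes one result file per sequence and packs the files into mot_results.zip:

```python
import zipfile

# Hypothetical tracks: one tuple per box, matching the 10-value format
# <frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>
rows = [
    (1, 3, 794.27, 247.59, 71.245, 174.88, 1, -1, -1, -1),
    (1, 6, 1648.1, 119.61, 66.504, 163.24, 1, -1, -1, -1),
    (2, 3, 796.02, 248.11, 71.245, 174.88, 1, -1, -1, -1),
]

seq_name = "01_University_Canteen"  # must match the sequence name exactly
with open(seq_name + ".txt", "w") as f:
    for row in rows:
        f.write(",".join(str(v) for v in row) + "\n")

# All per-sequence .txt files go into the root folder of the archive.
with zipfile.ZipFile("mot_results.zip", "w") as zf:
    zf.write(seq_name + ".txt")
```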
To evaluate the performance of multiple-pedestrian-tracking algorithms, we adopt the metrics of MOTChallenge, including:

- MOTA (Multiple Object Tracking Accuracy) computes the accuracy considering three error sources: false positives, false negatives/missed targets, and identity switches.
- MOTP (Multiple Object Tracking Precision) takes into account the misalignment between the ground-truth and the predicted bounding boxes.
- IDF1 (ID F1 score) measures the ratio of correctly identified detections over the average number of ground-truth and computed detections.
- FAR (False alarm rate) measures the average number of false alarms per frame.
- MT (Mostly tracked targets) measures the ratio of ground-truth trajectories that are covered by a track hypothesis for at least 80% of their respective life span.
- Hz indicates the processing speed of the algorithm.

For all evaluation metrics except FAR, higher is better. The evaluation code for Task 2 is available on the PANDA-Toolkit.
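As an illustration of the MOTA metric above, the standard MOTChallenge formula is MOTA = 1 − (FP + FN + IDSW) / GT, where GT is the total number of ground-truth boxes; the counts below are made up:

```python
def mota(false_positives, false_negatives, id_switches, num_gt_boxes):
    """MOTA = 1 - (FP + FN + IDSW) / GT, per the MOTChallenge definition."""
    return 1.0 - (false_positives + false_negatives + id_switches) / num_gt_boxes

# Example: 20 false positives, 50 missed targets, 5 identity switches,
# over 1000 ground-truth boxes in the sequence.
print(mota(20, 50, 5, 1000))  # 0.925
```

Note that MOTA can be negative when the summed errors exceed the number of ground-truth boxes.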
We provide extensive toolkit support for PANDA, including APIs for data visualization, splitting, merging, and result evaluation. Please visit our GitHub repository page. For additional questions, please see the FAQ or contact us.
T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proceedings of European Conference on Computer Vision, 2014, pp. 740–755.
A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, "MOT16: A benchmark for multi-object tracking," arXiv preprint arXiv:1603.00831, 2016.