We require each participant to submit the results as a single .zip file. Each .txt file in the archive contains the results of the corresponding image. Note that the result files must be placed in the archive's root folder.
The results file for each task should be stored in the SAME format as the provided ground-truth file, i.e., a text file containing one object instance per line. If there are no detection results for an image, please provide an empty file. We suggest participants review the ground-truth format before proceeding. The content of each line differs between tasks. The format of each line is as follows:
| Position | Name | Description |
|:--------:|:------------------|:------------|
| 1 | [frame_index] | The frame index of the video frame |
| 2 | [target_id] | In the DETECTION result file, the identity of the target should be set to the constant -1. In the GROUNDTRUTH file, the identity of the target provides the temporal correspondence of the bounding boxes across frames |
| 3 | [bbox_left] | The x coordinate of the top-left corner of the predicted bounding box |
| 4 | [bbox_top] | The y coordinate of the top-left corner of the predicted object bounding box |
| 5 | [bbox_width] | The width in pixels of the predicted object bounding box |
| 6 | [bbox_height] | The height in pixels of the predicted object bounding box |
| 7 | [score] | The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing an object instance. The score in the GROUNDTRUTH file is set to 1 or 0: 1 indicates the bounding box is considered in evaluation, while 0 indicates it is ignored |
| 8 | [object_category] | The object category indicates the type of the annotated object |
| 9 | [truncation] | In the DETECTION file this value should be set to the constant -1. In the GROUNDTRUTH file it indicates the degree to which the object appears outside the frame |
| 10 | [occlusion] | In the DETECTION file this value should be set to the constant -1. In the GROUNDTRUTH file it indicates the fraction of the object that is occluded |
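As an illustration, the steps above can be sketched in a few lines of Python that write one result file and package it into a submission archive. The sequence name `seq_01` and the detections are hypothetical, and comma-separated fields are assumed:

```python
import zipfile

# Hypothetical tracking results for one sequence. Each tuple follows the
# field order: frame_index, target_id, bbox_left, bbox_top, bbox_width,
# bbox_height, score, object_category, truncation, occlusion.
results = [
    (1, 5, 100, 200, 50, 80, 0.92, 1, -1, -1),
    (2, 5, 104, 198, 50, 80, 0.9, 1, -1, -1),
]

# One object instance per line, fields separated by commas.
lines = ["{},{},{},{},{},{},{},{},{},{}".format(*r) for r in results]

with zipfile.ZipFile("submission.zip", "w") as zf:
    # Result files must sit in the archive's root folder,
    # so no directory prefix is used in the archive name.
    zf.writestr("seq_01.txt", "\n".join(lines) + "\n")
```

An image with no detections would be written the same way, with an empty string as the file body.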
Given an input video sequence, multi-object tracking aims to recover the trajectories of the objects in the video. An evaluated algorithm is required to recover these trajectories either with or without object detection results as input.
To evaluate the performance of multiple person tracking algorithms, we adopt the metrics of MOTChallenge [1], including MOTA, MOTP, IDF1, FAR, MT, and Hz. Multiple Object Tracking Accuracy (MOTA) measures accuracy with respect to three error sources: false positives, false negatives (missed targets), and identity switches. Multiple Object Tracking Precision (MOTP) accounts for the misalignment between the ground-truth and predicted bounding boxes. The ID F1 score (IDF1) is the ratio of correctly identified detections to the average number of ground-truth and computed detections. The false alarm rate (FAR) is the average number of false alarms per frame. Mostly tracked targets (MT) is the ratio of ground-truth trajectories that are covered by a track hypothesis for at least 80% of their life span. Hz indicates the processing speed of the algorithm. For all evaluation metrics except FAR, higher is better. The evaluation code for Task 2 is available on the PANDA GitHub.
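As a rough illustration of how the counting-based metrics relate to the underlying error counts, a minimal sketch follows. The function names and signatures are ours for illustration only; the official evaluation code on the PANDA GitHub is authoritative:

```python
def mota(fn, fp, idsw, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT,
    where GT is the total number of ground-truth boxes over all frames."""
    return 1.0 - (fn + fp + idsw) / num_gt

def far(num_false_alarms, num_frames):
    """False alarm rate: average number of false alarms per frame."""
    return num_false_alarms / num_frames

def mostly_tracked(coverage_ratios, threshold=0.8):
    """MT: fraction of ground-truth trajectories covered by a track
    hypothesis for at least 80% of their life span."""
    covered = sum(r >= threshold for r in coverage_ratios)
    return covered / len(coverage_ratios)
```

For example, a tracker with 10 missed targets, 5 false positives, and 1 identity switch over 100 ground-truth boxes scores a MOTA of 0.84, and the penalty structure makes MOTA negative when the total error count exceeds the number of ground-truth boxes.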
[1] Milan, A., Leal-Taixé, L., Reid, I., et al. MOT16: A benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831, 2016.