  • Participants are allowed to use any object detection approach, such as YOLOv4, Scaled YOLOv4, or DETR, for this challenge.
  • Pre-trained weights (e.g., from ImageNet or COCO) may be used. Note, however, that participants may not use any real-world vehicle images with orientation annotations.
  • Additional images or datasets may be used only if they are obtained from driving simulators such as CARLA, AirSim, or GTA.
  • Any form of image augmentation may be applied to synthetic images.
  • Participants are expected to adhere to the spirit of the competition and not gain an unfair advantage by training on real-world images. Manual labeling of the test dataset for submission is strictly prohibited.
  • There is no limit to the number of members on a team.
  • Up to five submissions per day are allowed.
  • Rankings will be determined solely by the weighted mAP metric; the quality of the technical reports is not taken into account.
  • We will invite the top 10 teams to submit workshop papers.
  • Participants are required to submit the source code and final model in executable form before the deadline. Moreover, if any additional images are used to improve the results, they should also be submitted. The organizers will determine the final winners after verifying the reproducibility of the algorithm.

Submission Format

For each image in the test dataset, participants are required to predict, for every vehicle present in the image, its bounding box coordinates (xmin, ymin, xmax, ymax), a confidence value (confidence), and a vehicle label (e.g., car_front, truck_side).

The output should contain the following two columns:
Image ID: Name of the corresponding test image with a .txt extension, e.g., 20220304125507.txt
Prediction String: For each predicted vehicle, the vehicle label followed by the confidence and the four integer bounding box coordinates (label confidence xmin ymin xmax ymax), all separated by a space; consecutive predictions are also separated by a space. For example, car_front 0.95 245 174 270 250 describes a car_front vehicle predicted with 95% confidence and a bounding box with coordinates (xmin = 245, ymin = 174, xmax = 270, ymax = 250). Up to 50 predictions per image are accepted.
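The prediction string format can be produced with a short helper. The following is a minimal Python sketch, not an official tool: `format_submission_line` and the tuple layout of a prediction are hypothetical, and it assumes that confidences are written with two decimals (as in the examples), that pixel coordinates are rounded to the nearest integer, and that when a detector returns more than 50 boxes only the 50 most confident are kept.

```python
from typing import List, Tuple

# Hypothetical layout of one detection: (label, confidence, xmin, ymin, xmax, ymax).
Prediction = Tuple[str, float, float, float, float, float]

MAX_PREDICTIONS = 50  # at most 50 predictions per image are accepted


def format_submission_line(image_id: str, predictions: List[Prediction]) -> str:
    """Return '<image_id> <label conf xmin ymin xmax ymax> ...' for one test image."""
    # Assumption: if there are more than 50 boxes, keep the 50 most confident.
    kept = sorted(predictions, key=lambda p: p[1], reverse=True)[:MAX_PREDICTIONS]
    parts = [image_id]
    for label, conf, xmin, ymin, xmax, ymax in kept:
        # Coordinates must be integers; round each one to the nearest pixel.
        coords = " ".join(str(int(round(v))) for v in (xmin, ymin, xmax, ymax))
        # Assumption: two decimal places for the confidence, as in the examples.
        parts.append(f"{label} {conf:.2f} {coords}")
    return " ".join(parts)


# Usage: float boxes from a detector, given in arbitrary order.
line = format_submission_line(
    "20220304125507.txt",
    [
        ("truck_side", 0.78, 345.2, 275.0, 374.6, 351.4),
        ("car_front", 0.87, 245, 174, 270, 250),
    ],
)
print(line)
# 20220304125507.txt car_front 0.87 245 174 270 250 truck_side 0.78 345 275 375 351
```

Sorting by confidence before truncating means the 50-prediction cap discards the least confident detections rather than arbitrary ones.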

A sample submission for two images, each containing two vehicles, is shown below:
20220304125507.txt car_front 0.87 245 174 270 250 truck_side 0.78 345 275 375 351
20210606112501.txt car_back 0.90 347 276 373 353 truck_front 0.85 446 379 472 454
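Before uploading, it may be worth checking each line against the format described above. The sketch below is a hypothetical validator, not the organizers' checker: `parse_submission_line` is an illustrative name, and the checks that confidences lie in [0, 1] and that xmin < xmax and ymin < ymax are assumptions rather than stated requirements.

```python
from typing import List, Tuple

MAX_PREDICTIONS = 50
FIELDS_PER_PREDICTION = 6  # label, confidence, xmin, ymin, xmax, ymax

Prediction = Tuple[str, float, int, int, int, int]


def parse_submission_line(line: str) -> Tuple[str, List[Prediction]]:
    """Split one submission line into its image ID and predictions; raise ValueError on format errors."""
    tokens = line.split()
    if not tokens or not tokens[0].endswith(".txt"):
        raise ValueError("image ID must be the test image name with a .txt extension")
    image_id, fields = tokens[0], tokens[1:]
    if len(fields) % FIELDS_PER_PREDICTION != 0:
        raise ValueError("each prediction needs a label, a confidence and four coordinates")
    predictions: List[Prediction] = []
    for i in range(0, len(fields), FIELDS_PER_PREDICTION):
        label, conf_text, *coord_texts = fields[i:i + FIELDS_PER_PREDICTION]
        confidence = float(conf_text)
        # Assumption: confidences are probabilities in [0, 1].
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {conf_text}")
        # int() rejects strings such as "245.5", so non-integer coordinates raise ValueError.
        xmin, ymin, xmax, ymax = (int(t) for t in coord_texts)
        # Assumption: boxes must have positive width and height.
        if xmin >= xmax or ymin >= ymax:
            raise ValueError(f"degenerate box for {label}: {coord_texts}")
        predictions.append((label, confidence, xmin, ymin, xmax, ymax))
    if len(predictions) > MAX_PREDICTIONS:
        raise ValueError(f"{len(predictions)} predictions exceed the limit of {MAX_PREDICTIONS}")
    return image_id, predictions


# Usage: validate the two sample lines shown above.
sample = """20220304125507.txt car_front 0.87 245 174 270 250 truck_side 0.78 345 275 375 351
20210606112501.txt car_back 0.90 347 276 373 353 truck_front 0.85 446 379 472 454"""

for row in sample.splitlines():
    image_id, preds = parse_submission_line(row)
    print(image_id, len(preds))
# 20220304125507.txt 2
# 20210606112501.txt 2
```

Because `int()` rejects strings such as `245.5`, non-integer coordinates are reported as errors rather than silently truncated.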

A sample submission file containing detection results for seven images can be found here: