Detection Loss

This module contains the loss function for object detection. The loss function is computed in two steps:

  1. A match \(\hat{\mathbf{P}}\), with entries \(\hat{P}_{ij}\), is determined between predicted and ground truth boxes. The match is computed by the matching_method module. See the Matching Strategies section for more details.

  2. The loss between each of the \(N_p\) predictions \(\hat{\mathbf{y}}_i\) and each of the \(N_g\) targets \(\mathbf{y}_j\) is calculated as a weighted sum of the classification and bounding box regression losses: \(\mathcal{L}_{\text{train}}(\hat{\mathbf{y}}_i,\mathbf{y}_j) = \mathcal{L}_{\text{classification}}(\hat{\mathbf{c}}_i,\mathbf{c}_j) + \mathcal{L}_{\text{localization}}(\hat{\mathbf{b}}_i,\mathbf{b}_j)\). For the background ground truth \(\varnothing\), the training loss includes only a classification term: \(\mathcal{L}_{\text{train}}(\hat{\mathbf{y}}_i,\varnothing) = \mathcal{L}_{\text{classification}}(\hat{\mathbf{c}}_i,\varnothing)\).

\begin{align} \text{loss} = N_p \sum_{i=1}^{N_p} \sum_{j=1}^{N_g+1} \hat{P}_{ij} \mathcal{L}_{\text{train}}(\hat{\mathbf{y}}_i,\mathbf{y}_j) \end{align}

Here the index \(j = N_g + 1\) denotes the background ground truth \(\varnothing\).

  • If reduction is set to 'mean', the localization loss is divided by the number of matched predictions \(N_{\text{pos}} = N_p \sum_{i=1}^{N_p} \sum_{j=1}^{N_g} \hat{P}_{ij}\) and the classification loss is divided by the total number of predictions \(N_{\text{tot}} = N_p\sum_{i=1}^{N_p} \sum_{j=1}^{N_g+1} \hat{P}_{ij} \approx N_p\).

  • If reduction is set to 'sum', the loss is not divided by any factor.
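
The two reduction modes can be illustrated with a small framework-free sketch. It is not the library's implementation: the function name reduce_detection_loss, the list-of-lists layout, the choice of the last column as the background, and the guard for the case without any matched prediction are all assumptions made for illustration.

```python
def reduce_detection_loss(match, cls_losses, loc_losses, reduction="mean"):
    """Illustrative reduction of a detection loss (hypothetical helper).

    match:      N_p x (N_g + 1) match matrix; the last column is assumed
                to be the background.
    cls_losses: N_p x (N_g + 1) pairwise classification losses.
    loc_losses: N_p x N_g pairwise localization losses.
    """
    n_pred = len(match)
    n_gt = len(match[0]) - 1

    # Classification term: summed over all targets, background included.
    cls_total = n_pred * sum(
        match[i][j] * cls_losses[i][j]
        for i in range(n_pred) for j in range(n_gt + 1)
    )
    # Localization term: summed over the real targets only.
    loc_total = n_pred * sum(
        match[i][j] * loc_losses[i][j]
        for i in range(n_pred) for j in range(n_gt)
    )

    if reduction == "sum":
        return cls_total + loc_total
    if reduction == "mean":
        n_pos = n_pred * sum(
            match[i][j] for i in range(n_pred) for j in range(n_gt)
        )
        n_tot = n_pred * sum(sum(row) for row in match)
        # Assumption: the localization term is dropped when nothing is matched.
        loc_mean = loc_total / n_pos if n_pos > 0 else 0.0
        return cls_total / n_tot + loc_mean
    raise ValueError(f"unknown reduction: {reduction!r}")


# Four predictions, one target; prediction 0 is matched to the target,
# the other three are matched to the background. Each row sums to 1 / N_p.
match = [[0.25, 0.0], [0.0, 0.25], [0.0, 0.25], [0.0, 0.25]]
cls_losses = [[2.0, 9.0], [9.0, 1.0], [9.0, 1.0], [9.0, 0.0]]
loc_losses = [[0.5], [7.0], [7.0], [7.0]]

print(reduce_detection_loss(match, cls_losses, loc_losses, "sum"))   # 4.5
print(reduce_detection_loss(match, cls_losses, loc_losses, "mean"))  # 1.5
```

In this example \(N_{\text{pos}} = 1\) and \(N_{\text{tot}} = 4\), so the 'mean' result is 4.0 / 4 + 0.5 / 1 = 1.5.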

The loss is implemented as a PyTorch module, so it can be used as a loss function in a PyTorch training loop.

Class

class uotod.loss.DetectionLoss(cls_loss_module: MultipleObjectiveLoss | _Loss, loc_loss_module: MultipleObjectiveLoss | _Loss, matching_method: _Match, bg_class_position: str = 'first', use_hard_negative_mining: bool = False, neg_to_pos_ratio: float | None = None, size_average=None, reduce=None, reduction: str = 'mean')

Loss function for object detection.

Parameters:
  • cls_loss_module (MultipleObjectiveLoss | _Loss) – Classification loss.

  • loc_loss_module (MultipleObjectiveLoss | _Loss) – Localization loss.

  • matching_method (_Match) – Matching method used to compute the matching.

  • bg_class_position (str, optional) – Position of the background class among the class indices: “first”, “last” or “none” (no background class).

  • use_hard_negative_mining (bool, optional) – Whether to use hard negative mining.

  • neg_to_pos_ratio (float, optional) – Ratio of negative to positive samples to keep when hard negative mining is enabled.

  • size_average (bool, optional) – Deprecated.

  • reduce (bool, optional) – Deprecated.

  • reduction (str, optional) – Type of reduction to apply to the final loss. “mean” or “sum”.

Returns:

loss

Return type:

Tensor (float)

Note

To use multiple localization or classification loss terms, use the class uotod.loss.MultipleObjectiveLoss.

Note

The classification loss is averaged over the number of positive and negative samples, weighted by the class weights. This follows the implementation of the cross-entropy loss with class weights in PyTorch.

Note that the number of negative samples is zero when using the focal loss, since there is no explicit background class.

When using Hard Negative Mining, the number of negative samples is not taken into account in the averaging, for consistency with the SSD implementation.
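
As a rough illustration of the hard negative mining note above, the sketch below follows the usual SSD-style recipe: keep every positive sample, keep only the negatives with the highest classification loss up to neg_to_pos_ratio times the number of positives, and average over the positives alone. The helper names hard_negative_mask and mined_classification_loss are hypothetical, and the library's actual selection logic may differ.

```python
def hard_negative_mask(cls_losses, is_positive, neg_to_pos_ratio=3.0):
    """Return a keep-mask: all positives plus the hardest negatives."""
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    num_neg = min(int(neg_to_pos_ratio * num_pos), len(negatives))
    # Hardest negatives first: highest classification loss.
    negatives.sort(key=lambda i: cls_losses[i], reverse=True)
    kept = set(negatives[:num_neg])
    return [pos or (i in kept) for i, pos in enumerate(is_positive)]


def mined_classification_loss(cls_losses, is_positive, neg_to_pos_ratio=3.0):
    """Sum the kept losses and divide by the number of positives only."""
    keep = hard_negative_mask(cls_losses, is_positive, neg_to_pos_ratio)
    total = sum(loss for loss, k in zip(cls_losses, keep) if k)
    # Assumption: divide by at least 1 so an image without positives is safe.
    return total / max(sum(is_positive), 1)


# One positive and five negatives; with a ratio of 2, only the two
# hardest negatives (losses 2.0 and 1.5) are kept.
cls_losses = [0.2, 1.5, 0.1, 0.9, 0.05, 2.0]
is_positive = [True, False, False, False, False, False]

print(hard_negative_mask(cls_losses, is_positive, neg_to_pos_ratio=2.0))
# [True, True, False, False, False, True]
```

Dividing by the number of positives rather than by the number of kept samples is what "not taken into account in the averaging" refers to in this sketch.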

forward(input: Dict[str, Tensor] | List[Dict[str, Tensor]], target: Dict[str, Tensor] | List[Dict[str, Tensor]], anchors: Tensor | None = None) -> Tensor

Computes the matching between the predicted and target boxes, and the corresponding loss.

Parameters:
  • input (dictionary) –

    Input containing the predicted logits and boxes.

    “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes).

    “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).

  • target (dictionary) –

    Target containing the target classes, boxes and mask.

    “labels”: Ground-truth labels. Tensor of shape (batch_size, num_targets).

    “boxes”: Ground-truth bounding boxes. Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2).

    “mask”: Padding mask. Tensor of shape (batch_size, num_targets).

  • anchors (Tensor, optional) – Anchors used to compute the predicted boxes. Tensor of shape (batch_size, num_pred, 4) or (num_pred, 4), where the last dimension is (x1, y1, x2, y2).
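
Because images usually contain different numbers of objects, the batched target has to be padded to a common num_targets. The sketch below shows one way to build the “labels”, “boxes” and “mask” entries with plain Python lists, which could then be converted with torch.tensor. The helpers cxcywh_to_xyxy and pad_targets are hypothetical, and the sketch assumes that True in the mask marks a real target, as the all-ones mask in the examples suggests.

```python
def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


def pad_targets(labels_per_image, boxes_per_image, pad_label=0):
    """Pad per-image annotations to a common length and build the mask."""
    max_targets = max((len(labels) for labels in labels_per_image), default=0)
    labels, boxes, mask = [], [], []
    for img_labels, img_boxes in zip(labels_per_image, boxes_per_image):
        n_real = len(img_labels)
        n_pad = max_targets - n_real
        labels.append(list(img_labels) + [pad_label] * n_pad)
        boxes.append([list(b) for b in img_boxes] + [[0.0] * 4 for _ in range(n_pad)])
        # Assumption: True marks a real target, False marks padding.
        mask.append([True] * n_real + [False] * n_pad)
    return {"labels": labels, "boxes": boxes, "mask": mask}


# Two images: the first has two objects, the second has one.
labels_per_image = [[3, 7], [5]]
boxes_per_image = [
    [cxcywh_to_xyxy([0.5, 0.5, 0.2, 0.4]), cxcywh_to_xyxy([0.25, 0.25, 0.5, 0.5])],
    [cxcywh_to_xyxy([0.5, 0.5, 1.0, 1.0])],
]
target = pad_targets(labels_per_image, boxes_per_image)

print(target["labels"])  # [[3, 7], [5, 0]]
print(target["mask"])    # [[True, True], [True, False]]
```

The padded boxes are filled with zeros here; any value works as long as the mask excludes those rows from matching.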

Returns:

loss

Return type:

Tensor (float)

Examples:

>>> import torch
>>> from uotod.loss import DetectionLoss, NegativeProbLoss, GIoULoss, IoULoss
>>> from uotod.match import Hungarian, UnbalancedSinkhorn
>>> from uotod.utils import box_cxcywh_to_xyxy
>>> # Define the matching method
>>> matching_method = Hungarian(
...     cls_match_module=NegativeProbLoss(reduction="none"),
...     loc_match_module=torch.nn.L1Loss(reduction="none")
... )
>>> # Define the loss function
>>> loss_fn = DetectionLoss(
...     cls_loss_module=torch.nn.CrossEntropyLoss(reduction="none"),
...     loc_loss_module=GIoULoss(reduction="none"),
...     matching_method=matching_method
... )
>>> # Define the input
>>> pred = {"pred_logits": torch.randn(2, 100, 21),
...         "pred_boxes": box_cxcywh_to_xyxy(torch.rand(2, 100, 4))}
>>> # Define the target
>>> target = {"labels": torch.randint(1, 21, (2, 10)),            # 0 is the background class
...           "boxes": box_cxcywh_to_xyxy(torch.rand(2, 10, 4)),  # (x1, y1, x2, y2)
...           "mask": torch.ones(2, 10, dtype=torch.bool)}        # Padding mask
>>> # Compute the loss
>>> loss_fn(pred, target)

>>> # With anchors
>>> anchors = box_cxcywh_to_xyxy(torch.rand(100, 4))
>>> # Define the matching method
>>> matching_method = Hungarian(
...     cls_match_module=None,                              # No classification cost for matching anchors
...     loc_match_module=IoULoss(reduction="none"),
...     is_anchor_based=True                                # Use anchor-based matching
... )
>>> # Define the loss function
>>> loss_fn = DetectionLoss(
...     cls_loss_module=torch.nn.CrossEntropyLoss(reduction="none"),
...     loc_loss_module=GIoULoss(reduction="none"),
...     matching_method=matching_method
... )
>>> # Compute the loss
>>> loss_fn(pred, target, anchors)

>>> # Using Unbalanced Optimal Transport and hard negative mining
>>> # Define the matching method
>>> matching_method = UnbalancedSinkhorn(
...     cls_match_module=None,
...     loc_match_module=IoULoss(reduction="none"),
...     reg_pred=1e+3,                                      # Regularization parameter for the predicted boxes
...     reg_target=1e-3,                                    # Regularization parameter for the target boxes
...     background_cost=0.5,                                # Threshold (IoU) for matching to background
...     is_anchor_based=True                                # Use anchor-based matching
... )
>>> # Define the loss function
>>> loss_fn = DetectionLoss(
...     cls_loss_module=torch.nn.CrossEntropyLoss(reduction="none"),
...     loc_loss_module=GIoULoss(reduction="none"),
...     matching_method=matching_method,
...     use_hard_negative_mining=True,                      # Use hard negative mining
...     neg_to_pos_ratio=3.                                 # Use 3 negative samples for 1 positive sample
... )
>>> # Compute the loss
>>> loss_fn(pred, target, anchors)