Hungarian

We consider the matching cost \(\mathcal{L}_{\text{match}}\) = cls_match_module + loc_match_module between the \(N_p\) predictions \(\hat{\mathbf{y}}_i\) and \(N_t\) targets \(\mathbf{y}_j\). In particular, the cost of the background \(\mathbf{y}_{N_t+1} = \varnothing\) is given by \(\mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \varnothing\right)\) = bg_cost.

The Hungarian matching [Kuh55, Mun57] minimizes the cost of the pairwise matches between the \(N_t\) ground truth objects \(\left\{\mathbf{y}_j\right\}_{i=1}\) with the \(N_p\) predictions:

\[\hat{\sigma} = \mathrm{arg\, min} \left\{\sum_{j=1}^{N_t} \mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_{\sigma(j)}, \mathbf{y}_{j} \right): \sigma \in \mathcal{P}_{N_t}(\left[\!\left[ N_p \right]\!\right])\right\},\]

where \(\mathcal{P}_{N_t}(\left[\!\left[ N_p\right]\!\right]) = \big\{\sigma \in \mathcal{P}(\left[\!\left[ N_p \right]\!\right]) : |\sigma| = N_t \big\}\) is the set of possible combinations of \(N_t\) in \(N_p\), with \(\mathcal{P}(\left[\!\left[ N_p \right]\!\right])\) the power set of \(\left[\!\left[ N_p\right]\!\right]\) (the set of all subsets). The match in matrix form \(\mathbf{P}\) itself is given by \(\hat{\sigma}(j) = \left\{ i : P_{i,j} \neq 0 \right\}\), or equivalently \(\hat{\sigma}(j) = \left\{ i : P_{i,j} = 1/N_p \right\}\). The unmatched predictions are matched to the background if applicable.

As an example, this matching strategy is used in the DETR family [CMS+20, ZSL+21]. The matching is one-to-one, minimizing the total cost. Every ground truth object is matched to a unique prediction, thus reducing the number of predictions needed. A downside is that the one-to-one matches may vary from one epoch to the next, again slowing down convergence [DPDPS+23, LZL+22].

Note

Due its sequential nature, this matching strategy cannot make good use parallelization. In its SciPy implementation (default), it only runs on the CPU, moving back and forth the data if necessary. If regularization is not an issue (it often even helps with the convergence), we encourage to use uotod.match.BalancedSinkhorn instead.

Class

class uotod.match.Hungarian(**kwargs)

Parameters:

scipy (bool, optional) – Uses SciPy’s implementation if True (default). Otherwise, uses the implementation of this repo, following from [Ber92].
cls_match_module (_Loss) – Classification loss used to compute the matching, if any.
loc_match_module (_Loss) – Localization loss used to compute the matching, if any.
background (bool, optional) – Indicates whether there is a background. Defaults to True.
background_cost (float, optional) – Cost of the background class. Defaults to 10.
is_anchor_based (bool, optional) – If True, the matching is performed between the anchor boxes and the target boxes.

compute_cost_matrix(input: Dict[str, Tensor] | List[Dict[str, Tensor]], target: Dict[str, Tensor] | List[Dict[str, Tensor]], anchors: Tensor | None = None) → Tensor

Computes a batch of cost matrices between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
background (bool, optional) – Indicated whether the background has to be added.

Returns:

the matching between the predicted and target boxes: Tensor of shape (batch_size, num_pred, num_targets + 1) or (batch_size, num_pred, num_targets) if background is False.

Return type:

Tensor (float)

compute_matching(cost_matrix: Tensor, target_mask: Tensor | None) → Tensor

Computes the matching.

Parameters:

cost_matrix (Tensor) – Cost matrix of shape (batch_size, num_pred, num_targets + 1).
target_mask (BoolTensor, optional) – Target mask of shape (batch_size, num_targets).

Returns:

The matching \(\mathbf{P}\) for each element of the batch. Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension [:, :, num_target+1] is the background.

Computes a batch of matchings between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).

Returns:

the matching between the predicted and target boxes, and the cost matrix if returns_cost is True: Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension is the background.

Return type:

Tensor (float) or Tuple(Tensor, Tensor)

plot(idx=0, img: Tensor | ndarray | None = None, plot_cost: bool = True, plot_match: bool = True, max_background_match: float | int = 1.0, background: bool = True, erase: bool = False)

Plots from the last batch # TODO: extensive description

Parameters:

idx (int, optional) – Index of the image to be plotted.
img (Tensor or ndarray, optional) – Image to be plotted. If it is not None, the boxes plot is computed.
plot_cost (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
plot_match (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
max_background_match (float, optional) – A threshold to only plot relevant matched predictions. The predictions are only plotted if the value matched to the background does not exceed max_background_match. Defaults to 1.

Returns:

Matplotlib figures

Return type:

Tuple(fig, fig, fig)

Example

import uotod
from uotod.sample import input, target, imgs

L = uotod.loss.GIoULoss(reduction='none')
H = uotod.match.Hungarian(loc_match_module=L, background_cost=0.8)
H(input, target)

fig_img, fig_cost, fig_match = H.plot(idx=0, img=imgs)
fig_match.show()
fig_img.show()
fig_cost.show()

(Source code)

../_images/hungarian_00.png — (`png`, `hires.png`, `pdf`)

../_images/hungarian_01.png — (`png`, `hires.png`, `pdf`)

../_images/hungarian_02.png — (`png`, `hires.png`, `pdf`)