Unbalanced Sinkhorn from the Python Optimal Transport Library

We consider the matching cost $\mathcal{L}_{\text{match}}$ = cls_match_module + loc_match_module between the $N_p$ predictions $\hat{\mathbf{y}}_i$ and $N_t$ targets $\mathbf{y}_j$. In particular, the cost of the background $\mathbf{y}_{N_t+1} = \varnothing$ is given by $\mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \varnothing\right)$ = bg_cost.

\[\begin{split}N_p * \underset{\mathbf{P} \in \mathbb{R}_{\geq 0}^{N_p\times N_t+1}}{\mathrm{arg\, min}} \bigg\{ \mathtt{reg}*\mathrm{KL}\left(\mathbf{P}|\mathbf{K}_{\mathtt{reg}}\right)\, + &\,\mathtt{reg\_pred\_target}*\mathrm{KL}\left(\mathbf{P}\mathbf{1}_{N_t+1}|\mathbf{\alpha}\right) \\ + &\,\mathtt{reg\_pred\_target}*\mathrm{KL}\left(\mathbf{1}_{N_p}^\top \mathbf{P}|\mathbf{\beta}\right)\bigg\},\end{split}\]

where $\mathrm{KL}: \mathbb{R}^{N\times M}_{\geq 0} \times \mathbb{R}^{N\times M}_{> 0} \rightarrow \mathbb{R}_{\geq 0}^{\phantom{N}} : (\mathbf{U},\mathbf{V}) \mapsto \sum_{i,j=1}^{N \times M} U_{i,j} \log(U_{i,j} / V_{i,j}) - U_{i,j} + V_{i,j}$ is the Kullback-Leibler divergence – also called relative entropy – between matrices or vectors when $M=1$, with $0 \ln (0) = 0$ by definition. The Gibbs kernel $\mathbf{K}_{\mathtt{reg}}$ is given by $\left(K_{\mathtt{reg}}\right)_{i,j} = \exp\left(- \mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \mathbf{y}_j \right)/ \mathtt{reg} \right)$.

The marginals for the soft constraints are given by

\begin{eqnarray} \mathbf{\alpha} &=& \frac{1}{N_p}[\; \underbrace{1, \; 1, \;1,\; \ldots, \; 1}_{\text{$N_p$ predictions}}\;], \\ \mathbf{\beta} &=& \frac{1}{N_p}[\; \underbrace{1, \; 1, \; \ldots, \; 1}_{\text{$N_t$ targets}}, \; \underbrace{(N_p-N_t)}_{\text{background }\varnothing} \;]. \end{eqnarray}

In the particular case where no background is used, the problem remains the same but the last column of $\mathbf{P}$ is just unnused.

Note

Unlike uotod.match.UnbalancedSinkhorn, the same regularization parameter is used for both the predictions and the targets.

Warning

If the formulation converges to the Hungarian algorithm in the limit of reg (or reg_dimless) to 0, it becomes more and more unstable if solved using Sinkhorn’s algorithm.

Class

class uotod.match.UnbalancedPOT(**kwargs)

Parameters:

method
reg_pred_target (float, optional) – Defaults to 1.
normalize_cost_matrix (bool, optional) – Normalizes the cost matrix, defaults to True.
reg_dimless (float, optional) – Dimensionless regularization parameter for the OT algorithm. Defaults to 0.12. This argument automatically sets the regularization reg depending on the problem size if the latter is not set. We refer to [DPDPS+23] for more information.
reg (float, optional) – Regularization parameter for the OT algorithm. Defaults to None. It is dependent of the problem size and we recommend leaving it blank and using the argument reg_dimless instead.
cls_match_module (_Loss) – Classification loss used to compute the matching, if any.
loc_match_module (_Loss) – Localization loss used to compute the matching, if any.
background (bool, optional) – Indicates whether there is a background. Defaults to True.
background_cost (float, optional) – Cost of the background class. Defaults to 10.
is_anchor_based (bool, optional) – If True, the matching is performed between the anchor boxes and the target boxes.

available_methods = ['sinkhorn', 'sinkhorn_stabilized', 'sinkhorn_reg_scaling']

compute_cost_matrix(input: Dict[str, Tensor] | List[Dict[str, Tensor]], target: Dict[str, Tensor] | List[Dict[str, Tensor]], anchors: Tensor | None = None) → Tensor

Computes a batch of cost matrices between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
background (bool, optional) – Indicated whether the background has to be added.

Returns:

the matching between the predicted and target boxes: Tensor of shape (batch_size, num_pred, num_targets + 1) or (batch_size, num_pred, num_targets) if background is False.

Return type:

Tensor (float)

compute_matching(cost_matrix: Tensor, target_mask: Tensor) → Tensor

Computes the matching between the predicted and target boxes. The optimal transport problem is solved using the Sinkhorn algorithm. :param cost_matrix: the cost matrix. Tensor of shape (batch_size, num_pred, num_tgt + 1). :param target_mask: the target mask. Tensor of shape (batch_size, num_tgt). :return: the matching. Tensor of shape (batch_size, num_pred, num_tgt + 1). The last entry of the last dimension is the background.

Computes the matching.

Parameters:

cost_matrix (Tensor) – Cost matrix of shape (batch_size, num_pred, num_targets + 1).
target_mask (BoolTensor, optional) – Target mask of shape (batch_size, num_targets).

Returns:

The matching $\mathbf{P}$ for each element of the batch. Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension [:, :, num_target+1] is the background.

Computes a batch of matchings between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).

Returns:

the matching between the predicted and target boxes, and the cost matrix if returns_cost is True: Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension is the background.

Return type:

Tensor (float) or Tuple(Tensor, Tensor)

plot(idx=0, img: Tensor | ndarray | None = None, plot_cost: bool = True, plot_match: bool = True, max_background_match: float | int = 1.0, background: bool = True, erase: bool = False)

Plots from the last batch # TODO: extensive description

Parameters:

idx (int, optional) – Index of the image to be plotted.
img (Tensor or ndarray, optional) – Image to be plotted. If it is not None, the boxes plot is computed.
plot_cost (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
plot_match (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
max_background_match (float, optional) – A threshold to only plot relevant matched predictions. The predictions are only plotted if the value matched to the background does not exceed max_background_match. Defaults to 1.

Returns:

Matplotlib figures

Return type:

Tuple(fig, fig, fig)

Available Methods

Value	Description	Extra Optional Arguments	Reference