Unbalanced Sinkhorn

We consider the matching cost $\mathcal{L}_{\text{match}}$ = cls_match_module + loc_match_module between the $N_p$ predictions $\hat{\mathbf{y}}_i$ and $N_t$ targets $\mathbf{y}_j$. In particular, the cost of the background $\mathbf{y}_{N_t+1} = \varnothing$ is given by $\mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \varnothing\right)$ = bg_cost.

\[\begin{split}N_p * \underset{\mathbf{P} \in \mathbb{R}_{\geq 0}^{N_p\times N_t+1}}{\mathrm{arg\, min}} \bigg\{ \mathtt{reg}*\mathrm{KL}\left(\mathbf{P}|\mathbf{K}_{\mathtt{reg}}\right)\, + &\,\mathtt{reg\_pred}*\mathrm{KL}\left(\mathbf{P}\mathbf{1}_{N_t+1}|\mathbf{\alpha}\right) \\ + &\,\mathtt{reg\_target}*\mathrm{KL}\left(\mathbf{1}_{N_p}^\top \mathbf{P}|\mathbf{\beta}\right)\bigg\},\end{split}\]

where $\mathrm{KL}: \mathbb{R}^{N\times M}_{\geq 0} \times \mathbb{R}^{N\times M}_{> 0} \rightarrow \mathbb{R}_{\geq 0}^{\phantom{N}} : (\mathbf{U},\mathbf{V}) \mapsto \sum_{i,j=1}^{N \times M} U_{i,j} \log(U_{i,j} / V_{i,j}) - U_{i,j} + V_{i,j}$ is the Kullback-Leibler divergence – also called relative entropy – between matrices or vectors when $M=1$, with $0 \ln (0) = 0$ by definition. The Gibbs kernel $\mathbf{K}_{\mathtt{reg}}$ is given by $\left(K_{\mathtt{reg}}\right)_{i,j} = \exp\left(- \mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \mathbf{y}_j \right)/ \mathtt{reg} \right)$.

The marginals for the soft constraints are given by

\begin{eqnarray} \mathbf{\alpha} &=& \frac{1}{N_p}[\; \underbrace{1, \; 1, \;1,\; \ldots, \; 1}_{\text{$N_p$ predictions}}\;], \\ \mathbf{\beta} &=& \frac{1}{N_p}[\; \underbrace{1, \; 1, \; \ldots, \; 1}_{\text{$N_t$ targets}}, \; \underbrace{(N_p-N_t)}_{\text{background }\varnothing} \;]. \end{eqnarray}

In the particular case where no background is used, the problem remains the same but the last column of $\mathbf{P}$ is just unnused.

Warning

If the formulation converges to the Hungarian algorithm in the limit of reg (or reg_dimless) to 0, it becomes more and more unstable if solved using Sinkhorn’s algorithm.

A variation of Sinkhorn’s algorithm is used to solve the problem. It is however not run until convergence, but with a fixed number of iterations. A compiled version is also available through the boolean compiled. The fixed number of iterations (and the non-verification of convergence) has the advantage of boosting the computation time and taking a vast advantage of the Tensor parallelization capabilities of modern GPUs. If a more stable or defensive, but also slower implementation is required, we refer to the uotod.match.UnbalancedPOT class. It however uses the same parameter for both soft constraints. In other words, it only works for the cases where reg_pred = reg_target.

Class

class uotod.match.UnbalancedSinkhorn(**kwargs)

Parameters:

reg_pred (float, optional) – Prediction constraint regularization parameter for the OT algorithm. Defaults to 1.0.
reg_target (float, optional) – Ground truth constraint regularization parameter for the OT algorithm. Defaults to 1.0.
compiled (bool, optional) – Indicates whether to use a compiled version of the algorithm or not. Defaults to False.
num_iter (int, optional) – Fixed number of iterations in Sinkhorn’s algorithm. Defaults to 50 (can easily be lowered without hindering the global convergence).
normalize_cost_matrix (bool, optional) – Normalizes the cost matrix, defaults to True.
reg_dimless (float, optional) – Dimensionless regularization parameter for the OT algorithm. Defaults to 0.12. This argument automatically sets the regularization reg depending on the problem size if the latter is not set. We refer to [DPDPS+23] for more information.
reg (float, optional) – Regularization parameter for the OT algorithm. Defaults to None. It is dependent of the problem size and we recommend leaving it blank and using the argument reg_dimless instead.
cls_match_module (_Loss) – Classification loss used to compute the matching, if any.
loc_match_module (_Loss) – Localization loss used to compute the matching, if any.
background (bool, optional) – Indicates whether there is a background. Defaults to True.
background_cost (float, optional) – Cost of the background class. Defaults to 10.
is_anchor_based (bool, optional) – If True, the matching is performed between the anchor boxes and the target boxes.

property compiled: bool: Boolean indicating whether to use a compiled version of the algorithm or not.

compute_cost_matrix(input: Dict[str, Tensor] | List[Dict[str, Tensor]], target: Dict[str, Tensor] | List[Dict[str, Tensor]], anchors: Tensor | None = None) → Tensor

Computes a batch of cost matrices between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
background (bool, optional) – Indicated whether the background has to be added.

Returns:

the matching between the predicted and target boxes: Tensor of shape (batch_size, num_pred, num_targets + 1) or (batch_size, num_pred, num_targets) if background is False.

Return type:

Tensor (float)

compute_matching(cost_matrix: Tensor, target_mask: Tensor) → Tensor

Computes the matching between the predicted and target boxes. The optimal transport problem is solved using the Sinkhorn algorithm. :param cost_matrix: the cost matrix. Tensor of shape (batch_size, num_pred, num_tgt + 1). :param target_mask: the target mask. Tensor of shape (batch_size, num_tgt). :return: the matching. Tensor of shape (batch_size, num_pred, num_tgt + 1). The last entry of the last dimension is the background.

Computes the matching.

Parameters:

cost_matrix (Tensor) – Cost matrix of shape (batch_size, num_pred, num_targets + 1).
target_mask (BoolTensor, optional) – Target mask of shape (batch_size, num_targets).

Returns:

The matching $\mathbf{P}$ for each element of the batch. Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension [:, :, num_target+1] is the background.

Computes a batch of matchings between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).

Returns:

the matching between the predicted and target boxes, and the cost matrix if returns_cost is True: Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension is the background.

Return type:

Tensor (float) or Tuple(Tensor, Tensor)

property num_iter: int: Number of iterations in Sinkhorn’s algorithm

plot(idx=0, img: Tensor | ndarray | None = None, plot_cost: bool = True, plot_match: bool = True, max_background_match: float | int = 1.0, background: bool = True, erase: bool = False)

Plots from the last batch # TODO: extensive description

Parameters:

idx (int, optional) – Index of the image to be plotted.
img (Tensor or ndarray, optional) – Image to be plotted. If it is not None, the boxes plot is computed.
plot_cost (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
plot_match (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
max_background_match (float, optional) – A threshold to only plot relevant matched predictions. The predictions are only plotted if the value matched to the background does not exceed max_background_match. Defaults to 1.

Returns:

Matplotlib figures

Return type:

Tuple(fig, fig, fig)

Example

The following shows the limit cases of Unbalanced Sinkhorn. As each limit case comes with its own algorithm, it is preferrable to use that one for efficiency. They are provided here as an illustration to better understand the role of each parameter. The goal of this algorithm is compute hybrid cases between the limit cases that are unattainable otherwise.

import uotod
from uotod.sample import input, target, imgs

L = uotod.loss.GIoULoss(reduction='none')
H = uotod.match.UnbalancedSinkhorn(loc_match_module=L, background_cost=0.8)
H(input, target)

fig_img, fig_cost, fig_match = H.plot(idx=0, img=imgs)
fig_img.show()
fig_cost.show()
fig_match.show()

(Source code)

../_images/unbalanced_00.png — (`png`, `hires.png`, `pdf`)

../_images/unbalanced_01.png — (`png`, `hires.png`, `pdf`)

../_images/unbalanced_02.png — (`png`, `hires.png`, `pdf`)

Unbalanced Sinkhorn

Class

Example

Limit Cases