Balanced Sinkhorn

The Optimal Transport problem aims at minimizing the transport cost between a source distribution (the predictions) and a target distribution [Kan58, Mon81, Vil09]. This particular problem solves a regularization of the original problem with Sinkhorn’s algorithm, well suited for GPU parallelization [Cut13, PeyreC+19]. In the limit of reg (or reg_dimless) to 0, this becomes the Hungarian algorithm. We refer to Proposition 1 of [DPDPS+23] for further information. By consequence, this matching can be seen as a regularized version of the Hungarian one.

We consider the matching cost \(\mathcal{L}_{\text{match}}\) = cls_match_module + loc_match_module between the \(N_p\) predictions \(\hat{\mathbf{y}}_i\) and \(N_t\) targets \(\mathbf{y}_j\). In particular, the cost of the background \(\mathbf{y}_{N_t+1} = \varnothing\) is given by \(\mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \varnothing\right)\) = bg_cost.

For each element in the batch, the following problem is solved and the match \(\mathbf{P}\) is retrieved.

\begin{align} N_p * \underset{\mathbf{P}\in \mathbb{R}^{N_p \times N_t+1}}{\mathrm{arg\,min}} &\sum_{i,j=1}^{N_p,N_t+1} P_{i,j}\mathcal{L}_{\text{match}}\left(\hat{\mathbf{y}}_i, \mathbf{y}_j\right) - \mathtt{reg} * \,\mathrm{H}(\mathbf{P}), &\\ \mathrm{s.t.} & \sum_{j=1}^{N_t+1} P_{i,j} = 1/N_p, & \forall\; 0 \leq i \leq N_p \;\text{(predictions)},\\ & \sum_{i=1}^{N_p} P_{i,j} = 1/N_p, & \forall\; 0 \leq j \leq N_t \;\text{(targets)},\\ & \sum_{i=1}^{N_p} P_{i,j} = (N_p - N_t)/N_p, & j = N_t+1\;\text{(background)}. \end{align}

with \(\mathrm{H}: \Delta^{N \times M} \rightarrow \mathbb{R}_{\geq 0} : \mathbf{P} \mapsto -\sum_{i,j} P_{i,j}(\log(P_{i,j})-1)\) the entropy of the match \(\mathbf{P}\), with \(0 \ln(0) = 0\) by definition.

In the particular case where no background is used, the problem remains the same but the last column of \(\mathbf{P}\) is just unnused.

Warning

If the formulation converges to the Hungarian algorithm in the limit of reg (or reg_dimless) to 0, it becomes more and more unstable if solved using Sinkhorn’s algorithm. We encourage to use uotod.match.Hungarian if no regularization at all is explicitly seeked, at the cost of loosing parallelization.

Sinkhorn’s algorithm is used to solve the problem. It is however not run until convergence, but with a fixed number of iterations. A compiled version is also available through the boolean compiled. The fixed number of iterations (and the non-verification of convergence) has the advantage of boosting the computation time and taking a vast advantage of the Tensor parallelization capabilities of modern GPUs. If a more stable or defensive, but also slower implementation is required, we refer to the uotod.match.BalancedPOT class.

Class

class uotod.match.BalancedSinkhorn(**kwargs)

Parameters:

compiled (bool, optional) – Indicates whether to use a compiled version of the algorithm or not. Defaults to False.
num_iter (int, optional) – Fixed number of iterations in Sinkhorn’s algorithm. Defaults to 50 (can easily be lowered without hindering the global convergence).
normalize_cost_matrix (bool, optional) – Normalizes the cost matrix, defaults to True.
reg_dimless (float, optional) – Dimensionless regularization parameter for the OT algorithm. Defaults to 0.12. This argument automatically sets the regularization reg depending on the problem size if the latter is not set. We refer to [DPDPS+23] for more information.
reg (float, optional) – Regularization parameter for the OT algorithm. Defaults to None. It is dependent of the problem size and we recommend leaving it blank and using the argument reg_dimless instead.
cls_match_module (_Loss) – Classification loss used to compute the matching, if any.
loc_match_module (_Loss) – Localization loss used to compute the matching, if any.
background (bool, optional) – Indicates whether there is a background. Defaults to True.
background_cost (float, optional) – Cost of the background class. Defaults to 10.
is_anchor_based (bool, optional) – If True, the matching is performed between the anchor boxes and the target boxes.

property compiled: bool: Boolean indicating whether to use a compiled version of the algorithm or not.

compute_cost_matrix(input: Dict[str, Tensor] | List[Dict[str, Tensor]], target: Dict[str, Tensor] | List[Dict[str, Tensor]], anchors: Tensor | None = None) → Tensor

Computes a batch of cost matrices between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
background (bool, optional) – Indicated whether the background has to be added.

Returns:

the matching between the predicted and target boxes: Tensor of shape (batch_size, num_pred, num_targets + 1) or (batch_size, num_pred, num_targets) if background is False.

Return type:

Tensor (float)

compute_matching(cost_matrix: Tensor, target_mask: Tensor) → Tensor

Computes the matching between the predicted and target boxes. The optimal transport problem is solved using the Sinkhorn algorithm. :param cost_matrix: the cost matrix. Tensor of shape (batch_size, num_pred, num_tgt + 1). :param target_mask: the target mask. Tensor of shape (batch_size, num_tgt). :return: the matching. Tensor of shape (batch_size, num_pred, num_tgt + 1). The last entry of the last dimension is the background.

Computes the matching.

Parameters:

cost_matrix (Tensor) – Cost matrix of shape (batch_size, num_pred, num_targets + 1).
target_mask (BoolTensor, optional) – Target mask of shape (batch_size, num_targets).

Returns:

The matching \(\mathbf{P}\) for each element of the batch. Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension [:, :, num_target+1] is the background.

Computes a batch of matchings between the predicted and target boxes.

Parameters:

input (dictionary) – Input containing the predicted logits and boxes. “pred_logits”: Tensor of shape (batch_size, num_pred, num_classes). “pred_boxes”: Tensor of shape (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).
target (dictionary) – Target containing the target classes, boxes and mask. “labels”: Tensor of shape (batch_size, num_targets). “boxes”: Tensor of shape (batch_size, num_targets, 4), where the last dimension is (x1, y1, x2, y2). “mask”: Tensor of shape (batch_size, num_targets).
anchors (Tensor) – the anchors used to compute the predicted boxes. (batch_size, num_pred, 4), where the last dimension is (x1, y1, x2, y2).

Returns:

the matching between the predicted and target boxes, and the cost matrix if returns_cost is True: Tensor of shape (batch_size, num_pred, num_targets + 1). The last entry of the last dimension is the background.

Return type:

Tensor (float) or Tuple(Tensor, Tensor)

property num_iter: int: Number of iterations in Sinkhorn’s algorithm

plot(idx=0, img: Tensor | ndarray | None = None, plot_cost: bool = True, plot_match: bool = True, max_background_match: float | int = 1.0, background: bool = True, erase: bool = False)

Plots from the last batch # TODO: extensive description

Parameters:

idx (int, optional) – Index of the image to be plotted.
img (Tensor or ndarray, optional) – Image to be plotted. If it is not None, the boxes plot is computed.
plot_cost (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
plot_match (bool, optional) – Plots the cost matrix between the predictions and the targets, including background.
max_background_match (float, optional) – A threshold to only plot relevant matched predictions. The predictions are only plotted if the value matched to the background does not exceed max_background_match. Defaults to 1.

Returns:

Matplotlib figures

Return type:

Tuple(fig, fig, fig)

Example

import uotod
from uotod.sample import input, target, imgs

L = uotod.loss.GIoULoss(reduction='none')
H = uotod.match.BalancedSinkhorn(loc_match_module=L, background_cost=0.8)
H(input, target)

fig_img, fig_cost, fig_match = H.plot(idx=0, img=imgs)
fig_img.show()
fig_cost.show()
fig_match.show()

(Source code)

../_images/balanced_00.png — (`png`, `hires.png`, `pdf`)

../_images/balanced_01.png — (`png`, `hires.png`, `pdf`)

../_images/balanced_02.png — (`png`, `hires.png`, `pdf`)